NewsBreak is looking to build and optimize its data acquisition infrastructure to support critical business needs.
Requirements
- Proficiency in mainstream web scraping frameworks and browser automation tools, such as Scrapy, Selenium, and Puppeteer.
- Strong coding skills in at least one programming language, such as Python, Java, Go, or C++.
- Solid understanding of web technologies, including HTML, CSS, JavaScript, and web protocols (HTTP/HTTPS).
- Experience with distributed systems, data pipelines, and storage solutions is a plus.
- Familiarity with frontend development and dynamic content rendering techniques is preferred.
Responsibilities
- Design, develop, and maintain distributed web crawler systems, ensuring efficient data scheduling, scraping, parsing, and storage.
- Collect and process data from the internet and partner sources in compliance with website policies and legal regulations.
- Optimize crawler performance to handle large-scale data extraction with high efficiency and reliability.
- Solve complex technical challenges related to web crawling, including anti-crawling mechanisms, dynamic content rendering, and data quality assurance.
- Collaborate with cross-functional teams, including data scientists, backend engineers, and product managers, to meet diverse business data requirements.
- Stay updated with the latest web technologies and crawling frameworks to continuously improve system capabilities.
Additional Qualifications
- Bachelor’s degree or higher in Computer Science, Engineering, or a related field, with at least 2 years of experience in web crawling and data collection.
- Strong problem-solving skills, attention to detail, and the ability to work independently or as part of a team.
Benefits
- Health, dental, and vision coverage for you and your family (100% coverage for employees)
- Top-tier 401(k) plan with company matching
- Paid time off and paid holidays
- FSA, HSA, and commuter benefit programs
- Team activity budget
- Annual base pay range: $125,000 to $221,000 USD