The company is looking to accelerate its efforts to build standalone data products that enable data teams and independent developers to create innovative solutions at massive scale.
Requirements
- Strong grasp of software architecture and development fundamentals for backend applications
- Solid understanding of the browser rendering pipeline and web application architecture (auth, cookies, HTTP request/response)
- Solid programming experience: strong grasp of object-oriented design and experience building applications using asynchronous programming paradigms (e.g., async/await, event loops, or concurrency libraries)
- Experience building crawlers
- Proficient in Linux/Unix command-line utilities, Linux system administration, architecture, and resource management
- Experience evaluating data quality and maintaining consistently high data standards across new feature releases (e.g., consistency, accuracy, validity, completeness)
Responsibilities
- Use and develop web crawling technologies to capture and catalog data on the internet
- Support and improve our web crawling infrastructure
- Structure, define, and model captured data, provide semantic data definitions, and automate data quality monitoring for the data we crawl
- Develop new techniques to increase speed, efficiency, scalability, and reliability of web crawls
- Use big data processing platforms to build data pipelines, publish data, and ensure the reliable availability of the data we crawl
- Work with our data product and engineering team to design and implement new data products with captured data, and to enhance existing products
Other
- Must thrive in a fast-paced environment and be able to work independently
- Can work effectively remotely (proactive about managing blockers, reaching out with questions, and participating in team activities)
- Strong written communication skills on Slack/Chat and in documents
- Experienced in writing data design docs (pipeline design, dataflow, schema design)
- Can scope and break down projects, and communicate progress and blockers effectively while collaborating with your manager, team, and stakeholders
- Degree in a quantitative discipline such as computer science, mathematics, statistics, or engineering