Apollo.io needs to maintain and operate its data platform to support machine learning workflows, analytics, and product offerings.
Requirements
- Experience in data modeling, data warehousing, APIs, and building data pipelines
- Deep knowledge of databases and data warehousing with an ability to collaborate cross-functionally
- Experience using the Python data stack
- Experience deploying and managing data pipelines in the cloud
- Experience working with technologies such as Airflow, Hadoop, FastAPI, and Spark
- Understanding of streaming technologies like Kafka and Spark Streaming
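Orchestration tools like the Airflow mentioned above model a pipeline as a DAG of dependent tasks. A stdlib-only toy sketch of that idea (the task names and dependency graph are illustrative, not a real pipeline):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks: each name maps to the tasks it depends on,
# mirroring how Airflow-style orchestrators represent a pipeline as a DAG.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

def run_pipeline(dependencies):
    """Execute tasks in dependency order and return the execution sequence."""
    order = list(TopologicalSorter(dependencies).static_order())
    for task in order:
        print(f"running {task}")  # a real task would do actual work here
    return order

run_pipeline(deps)  # runs extract, then transform, then load, then notify
```

A real orchestrator adds scheduling, retries, and distributed execution on top of this core dependency-ordering idea.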
Responsibilities
- Develop and maintain scalable data pipelines and build new integrations to support continuing increases in data volume and complexity
- Develop and improve Data APIs used in machine learning/AI product offerings
- Implement automated monitoring, alerting, and self-healing (restartable jobs, graceful failure handling) while building the consumption pipelines
- Implement processes and systems to monitor data quality, ensuring production data is always accurate and available
- Write unit/integration tests, contribute to the engineering wiki, and document work
- Define company data models and write jobs to populate data models in our data warehouse
- Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture
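One concrete form the self-healing responsibility above often takes is retry-with-backoff around flaky pipeline steps. A minimal stdlib-only sketch; the decorator, its parameters, and the flaky step are hypothetical illustrations, not an actual implementation:

```python
import time
from functools import wraps

def self_healing(max_attempts=3, base_delay=0.1):
    """Retry a flaky pipeline step with exponential backoff.

    If every attempt fails, re-raise the last error so the orchestrator
    can mark the task failed and fire an alert.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # graceful failure: surface the error, don't swallow it
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

# Hypothetical flaky step: fails twice with a transient error, then succeeds.
calls = {"n": 0}

@self_healing(max_attempts=3, base_delay=0.0)
def load_batch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient source outage")
    return "loaded"

result = load_batch()  # succeeds on the third attempt
```

Re-raising after the final attempt is the key design choice: swallowing the error would hide failures from the monitoring and alerting layer.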
Other
- Customer-driven: Attentive to our internal customers’ needs, striving to deliver a seamless and delightful experience in data processing, analytics, and visualization
- High impact: Understand the most important customer metrics and make the data platform and its datasets enablers for other teams to improve them
- Ownership: Take ownership of team-level projects/platforms from start to finish, ensure high-quality implementation, and move fast to find the most efficient ways to iterate
- Team mentorship and sharing: Share knowledge and best practices with the engineering team to help up-level the team
- Agility: Organized and able to effectively plan and break down large projects into smaller tasks that are easier to estimate and deliver
- Speak and act courageously: Not afraid to fail, challenge the status quo, or speak up for a contrarian view
- Focus and move with urgency: Prioritize for impact and move quickly to deliver experiments and features that create customer value
- Intelligence: Learns quickly and efficiently absorbs new codebases, frameworks, and technologies
- Bachelor's degree in a quantitative field (Physical/Computer Science, Engineering, Mathematics, or Statistics)
- 8+ years of experience as a data platform engineer, big data engineer, or software engineer working in data