Klaviyo is looking for engineers who can build, own, and scale features end-to-end from scratch and break through any obstacle or technical challenge in their way, specifically on its real-time and offline data analytics platform.
Requirements
- Hands-on experience with Python and SQL, plus a background in backend development
- Experience with distributed data processing frameworks such as Apache Spark and Flink
- Proven track record of designing and implementing scalable ETL/ELT pipelines, ideally using AWS services such as EMR (see the batch sketch after this list)
- Strong knowledge of cloud platforms, particularly AWS (e.g., EMR, S3, Redshift), and optimizing data workflows in the cloud
- Experience with data pipeline orchestration tools such as Airflow (see the orchestration sketch after this list)
- Familiarity with real-time data streaming technologies such as Kafka or Pulsar
- Understanding of data modeling, database design, and data governance best practices
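
To make the Spark and AWS expectations above concrete, here is a minimal PySpark batch sketch, referenced from the ETL/ELT bullet. The bucket, paths, and column names (event_ts, account_id) are hypothetical; it assumes raw JSON events on S3 and writes a date-partitioned Parquet rollup, partitioning being one of the simpler cost and performance levers on large datasets.

```python
# Minimal batch ETL sketch; all paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-rollup")  # hypothetical job name
    .getOrCreate()
)

# Read raw JSON events from S3 (schema inference kept simple for the sketch).
events = spark.read.json("s3://example-bucket/raw/events/")

# Aggregate daily event counts per account.
daily_counts = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("account_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Partition output by date so downstream readers scan only what they need.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/daily_event_counts/")
)
```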
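And a minimal orchestration sketch for the same job, assuming Airflow 2.4 or newer. The DAG id, schedule, and spark-submit command are illustrative; a real deployment on EMR would likely submit via an EMR-specific operator instead of a plain BashOperator.

```python
# Minimal Airflow DAG sketch; names and commands are illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_rollup",  # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Submit the batch job; retries give basic recovery from transient failures.
    run_rollup = BashOperator(
        task_id="run_rollup",
        bash_command="spark-submit daily_events_rollup.py",  # hypothetical script
    )
```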
Responsibilities
- Implement scalable, fault-tolerant data pipelines using distributed processing frameworks like Apache Spark and Flink on AWS EMR, optimizing for throughput and latency
- Design batch and real-time, event-driven data workflows that process billions of data points daily, leveraging streaming technologies like Kafka and Flink (see the streaming sketch after this list)
- Optimize distributed compute clusters and storage systems (e.g., S3, HDFS) to handle petabyte-scale datasets efficiently, balancing performance and cost
- Develop robust failure recovery mechanisms, including checkpointing, replication, and automated failover, to ensure high availability in distributed environments
- Collaborate with cross-functional teams to deliver actionable datasets that power analytics and AI capabilities
- Implement data governance policies and security measures to maintain data quality and compliance
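
As referenced in the streaming bullet above, here is a minimal PyFlink sketch of a fault-tolerant Kafka consumer with checkpointing enabled, the mechanism behind the failure-recovery responsibility. It assumes Flink 1.16+ with the Kafka connector jar on the classpath; brokers, topic, and group id are hypothetical.

```python
# Minimal PyFlink streaming sketch; connection details are hypothetical.
from pyflink.common import WatermarkStrategy
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer

env = StreamExecutionEnvironment.get_execution_environment()

# Checkpoint every 60s; with Flink's default exactly-once mode, checkpoints
# are the backbone of automated failover and recovery.
env.enable_checkpointing(60_000)

source = (
    KafkaSource.builder()
    .set_bootstrap_servers("kafka:9092")  # hypothetical brokers
    .set_topics("events")                 # hypothetical topic
    .set_group_id("events-rollup")
    .set_starting_offsets(KafkaOffsetsInitializer.earliest())
    .set_value_only_deserializer(SimpleStringSchema())
    .build()
)

stream = env.from_source(source, WatermarkStrategy.no_watermarks(), "kafka-events")

# Stand-in transformation; real jobs would parse, key, window, and aggregate.
stream.map(lambda record: record.upper()).print()

env.execute("streaming-events-sketch")
```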
Other
- 4+ years of experience in software development, with at least 2 years focused on data engineering and distributed systems
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
- Strong communication skills with experience mentoring or leading engineering teams
- Excellent problem-solving skills and the ability to thrive in a fast-paced, collaborative environment
- You've already experimented with AI in work or personal projects, and you're excited to dive in and learn fast