Design, build, and maintain robust data pipelines on AWS, leveraging Python, SQL, Apache Spark, and Databricks to support autonomy-related features in manufacturing and vehicle systems. Focus on scalability, cost efficiency, and enabling downstream ML and analytics workflows.
Requirements
- 3+ years building and maintaining cloud-based data infrastructure, preferably on AWS (S3, Lambda, Glue, IAM).
- Proficient in Python, SQL, Apache Spark, and Databricks.
- Strong understanding of distributed data processing patterns — batch, streaming, event-driven.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Familiarity with data governance, privacy compliance, and geospatial/telemetry data is a plus.
Responsibilities
- Develop low-latency, high-throughput pipelines to ingest telemetry and logs from vehicle and manufacturing systems.
- Collaborate cross-functionally to deliver clean, validated datasets for ML models, analytics, and safety evaluations.
- Optimize ETL processes, data quality checks, and monitoring for large-scale structured and semi-structured data.
- Implement observability, automation, and CI/CD workflows for data infrastructure (Airflow, Kubernetes, Terraform).
- Build Spark-based jobs on Databricks and scale pipelines for fleet and production growth (see the illustrative sketches after this list).
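To give a flavor of the day-to-day work, here is a minimal sketch of the kind of ingestion job this role would own, assuming Spark Structured Streaming with Delta Lake on Databricks. The bucket paths, schema fields, and table locations are hypothetical placeholders, not a prescribed design.

```python
# Minimal sketch: stream raw vehicle telemetry from S3 into a Delta table.
# All paths, the schema, and the bucket names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import (
    DoubleType, StringType, StructField, StructType, TimestampType,
)

spark = SparkSession.builder.appName("telemetry-ingest").getOrCreate()

# Hypothetical telemetry record: vehicle id, signal name, value, event time.
schema = StructType([
    StructField("vehicle_id", StringType()),
    StructField("signal", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream
    .format("json")
    .schema(schema)
    .load("s3://example-telemetry-landing/")  # hypothetical landing bucket
)

# Basic data-quality gate before anything lands in the bronze layer.
clean = (
    raw.dropna(subset=["vehicle_id", "event_time"])
    .filter(col("value").isNotNull())
)

(
    clean.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-telemetry-checkpoints/ingest/")
    .start("s3://example-telemetry-bronze/")  # hypothetical Delta target
)
```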
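And a minimal sketch of how such a job might be orchestrated, assuming Airflow 2.x with the Databricks provider installed; the DAG id, connection id, and job id are hypothetical.

```python
# Minimal sketch: hourly orchestration of a Databricks ingestion job.
# DAG id, connection id, and Databricks job id are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
)

with DAG(
    dag_id="telemetry_etl",           # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",               # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    # Trigger a pre-defined Databricks job that runs the Spark pipeline.
    run_ingest = DatabricksRunNowOperator(
        task_id="run_telemetry_ingest",
        databricks_conn_id="databricks_default",
        job_id=12345,                 # hypothetical Databricks job id
    )
```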
Other
- Remote work is also an option.
- 6-12 month contract.
- We are looking for a very strong candidate with excellent technical skills.