Waymo's Data Primary team builds and operates the data infrastructure that powers the perception foundation models behind its autonomous driving technology. The team ensures high-quality, diverse data for model training, evaluation, and production, with the goal of improving mobility and safety.
Requirements
- C++ programming skills (required), plus Python experience
- 5+ years of experience with large-scale data systems (e.g., Spark, Beam, or Dataflow), or a PhD and 3+ years of such experience
- Background in ML data engineering, data curation, or active learning
- Experience building and maintaining end-to-end ML data pipelines
- Experience with vector search (Faiss, ScaNN) or RAG systems
- Knowledge of ML infra for foundation model training and evaluation
- Familiarity with data-centric AI (few-shot learning, fine-tuning, pre-training)
Responsibilities
- Design and improve large-scale data pipelines that serve ML and research teams
- Deploy and maintain data flows for reliable, high-quality dataset delivery
- Apply ML techniques (e.g., embeddings, active learning) for data search and curation
- Build tools for scalable data mining and offboard inference
- Accelerate the deployment of new models across cities and platforms
Other
- In this hybrid role, you will report to an engineering manager.
- Ability to collaborate with model, infra, and product teams
- Background in autonomous systems or robotics
- MS/PhD or published work in ML or large-scale data systems