Waymo's Data Primary team builds and operates the data infrastructure that powers Waymo's perception foundation models. This involves designing scalable pipelines that process petabytes of sensor data, enabling rare-event mining and large-scale offline model deployment, and ensuring high-quality, diverse data for perception model training, evaluation, and production.
Requirements
- Strong C++ programming skills (required), plus Python experience
- 4+ years of experience with large-scale data systems (e.g., Spark, Beam, or Dataflow); a minimal pipeline sketch follows this list
- Background in ML data engineering, data curation, or active learning
- Experience building and maintaining end-to-end ML data pipelines
- Experience with vector search (Faiss, ScaNN) or RAG systems
- Knowledge of ML infra for foundation model training and evaluation
- Familiarity with data-centric AI (few-shot, fine-tuning, pre-training)
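To illustrate the kind of large-scale data pipeline work referenced above, here is a minimal Apache Beam sketch (Python SDK, since Beam has no C++ SDK) that reads serialized sensor records, keeps those flagged by a hypothetical rarity score, and writes the curated subset back out. The bucket paths, record schema, parse_record and is_rare helpers, and threshold are illustrative assumptions, not Waymo's actual pipeline.

```python
# Minimal Apache Beam sketch: filter sensor records by a hypothetical
# precomputed "rarity" score and write the curated subset back out.
# Paths, schema, and scoring logic are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one JSON-encoded sensor record (assumed schema)."""
    return json.loads(line)


def is_rare(record: dict, threshold: float = 0.9) -> bool:
    """Keep records whose rarity score exceeds the threshold."""
    return record.get("rarity_score", 0.0) > threshold


def run() -> None:
    # On GCP the same pipeline would target the Dataflow runner via options.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadRecords" >> beam.io.ReadFromText("gs://example-bucket/records/*.jsonl")
            | "Parse" >> beam.Map(parse_record)
            | "FilterRare" >> beam.Filter(is_rare)
            | "Serialize" >> beam.Map(json.dumps)
            | "WriteCurated" >> beam.io.WriteToText("gs://example-bucket/curated/records")
        )


if __name__ == "__main__":
    run()
```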
Responsibilities
- Design and improve large-scale data pipelines that serve ML and research teams
- Deploy and maintain data flows for reliable, high-quality dataset delivery
- Apply ML techniques (e.g., embeddings, active learning) for data search and curation; see the embedding-search sketch after this list
- Build tools for scalable data mining and offboard inference
- Accelerate the deployment of new models across cities and platforms
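As a sketch of the embedding-based data search and curation mentioned above, the snippet below builds a Faiss index over example embeddings and retrieves the nearest neighbors of a query vector, the core operation behind similarity-driven rare-event mining. The embedding dimension, random data, and exact (IndexFlatL2) index are assumptions for illustration; at production scale an approximate index or ScaNN would typically be used instead.

```python
# Minimal Faiss sketch: index dataset embeddings and retrieve the nearest
# neighbors of a query embedding (e.g., one example of a rare scenario).
# Embedding dimension, data, and index choice are illustrative assumptions.
import faiss
import numpy as np

dim = 128                       # assumed embedding dimensionality
rng = np.random.default_rng(0)

# Stand-in for embeddings produced by a perception/foundation model.
dataset_embeddings = rng.standard_normal((10_000, dim)).astype("float32")
query_embedding = rng.standard_normal((1, dim)).astype("float32")

# Exact L2 index; swap in an approximate index (IVF/HNSW) for larger corpora.
index = faiss.IndexFlatL2(dim)
index.add(dataset_embeddings)

k = 5
distances, indices = index.search(query_embedding, k)
print("nearest-neighbor ids:", indices[0])
print("distances:", distances[0])
```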
Other
- Strong collaboration skills across model, infra, and product teams
- Background in autonomous systems or robotics
- MS/PhD or published work in ML or large-scale data systems