Waymo is looking to train and improve pre-trained models for its autonomous driving software, specifically for Perception and Planning, for deployment in the Waymo Driver and potential future products. The work involves tackling challenges in large-scale reinforcement learning (RL) and building scalable systems for compute, data, and environments to improve model intelligence and alignment with human drivers.
Requirements
- Proficiency in distributed systems design, with an understanding of ML efficiency.
- Experience with ML frameworks, including TensorFlow, JAX, and XLA.
- Solid programming skills in Python and C++.
- Practical familiarity with profiling tools to uncover performance bottlenecks.
- Familiarity with post-training frameworks like TS/REX, Tunix, TorchRL, TRL, etc.
Responsibilities
- Develop the core training system for adapting RL techniques to unprecedented scales and heterogeneous environments (e.g., CPU/GPU/TPU).
- Collaborate with cross-functional teams to integrate cutting-edge rollout strategies, policies, and RL algorithms (e.g., REINFORCE, DPO, PPO) into the system.
- Optimize the end-to-end RL training pipeline for efficient and scalable learners/actors, and for low-latency distributed replay buffers that persist the data produced by rollouts.
- Build robust evaluations, analyze experimental results, and iterate quickly to improve model performance and training workflows.
- Stay current with the latest research in RL, Vision-Language-Action (VLA) models, and world models to inform and inspire new initiatives.
Other
- B.S. in Computer Science or Math, or 8+ years of equivalent real-world experience.
- M.S. in Computer Science or Math.