Waymo is looking to scale its autonomous driving technology by optimizing neural model inference and training, and by ensuring these advancements generalize across multiple hardware platforms.
Requirements
Master’s degree or PhD in Computer Science, Engineering, or a related technical field
3+ years of experience in software development for neural model inference or neural model training, and 1+ years of experience with neural model inference and training optimization on modern GPU/TPU architectures
5+ years of experience in software development for real-time systems, ideally with systems running on-device (e.g., Waymo’s onboard system)
Proficiency in C++, Python, and modern deep learning toolkits like PyTorch or JAX
Passion for low-level neural net optimization and a willingness to learn new architectures and tools
Deep understanding of latency/quality tradeoffs as they apply to neural network architectures, and practical experience making those tradeoffs
Responsibilities
Optimize neural model architectures and systems for high performance on multiple GPU and TPU platforms (e.g., onboard vs. simulation platforms)
Optimize neural model and overall system performance under hard real-time constraints (e.g., Waymo’s onboard system)
Develop post-training algorithms (e.g., quantization; see the first sketch after this list), low-level optimizations (e.g., kernel optimization), and related techniques for improving inference speed and reducing inference memory consumption on modern GPU and TPU architectures
Develop new neural model architectures (e.g., sparse architectures), decoding strategies (e.g., speculative decoding; see the second sketch after this list), and related techniques for improving inference performance on modern GPU and TPU architectures
Optimize model training speed and efficiency for large models (often memory-bound) and for fine-tuning (often I/O-bound)
Collaborate with ML infrastructure teams (inference and training frameworks), the onboard hardware and simulation teams, and Alphabet’s research teams
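As a concrete illustration of the post-training optimization work above, here is a minimal sketch of dynamic int8 quantization in PyTorch. The tiny nn.Sequential model is a hypothetical stand-in, not one of Waymo's networks; torch.ao.quantization.quantize_dynamic is the standard PyTorch entry point for this technique.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model for illustration; not one of Waymo's networks.
model = nn.Sequential(
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, 64),
).eval()

# Post-training dynamic quantization: Linear weights are stored as int8 and
# activations are quantized on the fly, reducing memory footprint and often
# improving inference latency without any retraining.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
with torch.inference_mode():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 64])
```

In practice this is the simplest point on a spectrum that runs through static quantization and quantization-aware training, each traded off against model quality on a per-model basis.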
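Similarly, here is a minimal sketch of greedy speculative decoding, the decoding strategy named above: a cheap draft model proposes k tokens, and the target model verifies them in a single batched forward pass, so accepted tokens cost roughly one target step instead of k. The *_logits_fn interfaces and the toy_logits table are assumptions for illustration; a real system would use two transformer models with KV caches.

```python
import torch

def speculative_decode(target_logits_fn, draft_logits_fn, prompt, k=4, max_new=16):
    """Greedy speculative decoding sketch. Each *_logits_fn maps a non-empty
    token list to per-position logits of shape [len(tokens), vocab]
    (a hypothetical interface for illustration)."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        base = len(tokens)
        # 1) The cheap draft model proposes k tokens autoregressively.
        draft = list(tokens)
        for _ in range(k):
            draft.append(int(torch.argmax(draft_logits_fn(draft)[-1])))
        proposed = draft[base:]
        # 2) The target model scores every proposal in one forward pass.
        logits = target_logits_fn(draft[:-1])
        # 3) Keep the agreeing prefix; on the first mismatch, take the
        #    target's own greedy token, which preserves exact greedy output.
        for i, tok in enumerate(proposed):
            target_tok = int(torch.argmax(logits[base - 1 + i]))
            tokens.append(target_tok)
            if target_tok != tok:
                break
    return tokens[: len(prompt) + max_new]

# Toy demo: the same random table acts as both draft and target model, so
# every proposal is accepted (a hypothetical stand-in for two real models).
torch.manual_seed(0)
table = torch.randn(100, 100)  # vocab size 100

def toy_logits(tokens):
    return table[torch.tensor(tokens)]

print(speculative_decode(toy_logits, toy_logits, prompt=[1, 2, 3]))
```

The speedup depends on how often the draft's proposals are accepted; by construction the output is identical to plain greedy decoding of the target model.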