Waymo is looking for engineers with ML frameworks or ML systems expertise to help improve the compute efficiency of its autonomous driving software, both in the cloud and in the car.
Requirements
- Proficient in distributed systems design with an understanding of ML efficiency.
- Experience with ML frameworks and compilers such as TensorFlow, JAX, and XLA.
- Solid programming skills in Python and C++.
- Practical familiarity with profiling tools to uncover performance bottlenecks.
- Familiarity with kernel-authoring languages such as Pallas and Triton.
Responsibilities
- Develop new neural network architectures (e.g., sparse architectures) and decoding strategies (e.g., speculative decoding) to improve training and inference performance on modern TPU and GPU architectures.
- Improve the accelerator FLOPS utilization of ML workloads by improving compiler optimizations (e.g., in XLA), authoring low-level kernels (e.g., in Pallas or Triton), and enabling low-precision computation.
- Optimize ML systems for high performance on TPU and GPU clusters, reducing communication overhead and memory consumption while ensuring scalability and reliability across distributed environments.
- Evaluate and integrate state-of-the-art (SOTA) technologies from the open-source community and Google to enhance the performance and scalability of ML workloads.
- Promote best practices for distributed systems architecture and contribute to technical leadership within the team.
Other
- B.S. in Computer Science or Math, or 8+ years of equivalent real-world experience.
- M.S. in Computer Science or Math.