Waymo is looking to improve the efficiency of its ML models, including Perception, Planner, Foundation Models, and Simulation, by implementing state-of-the-art (SOTA) low-bit quantization techniques and optimizing computation speed and efficiency across multiple hardware accelerator platforms.
Requirements
- Experience with ML frameworks such as JAX or PyTorch
- Deep knowledge of quantization techniques and familiarity with SOTA literature
- Experience with Transformer-based models
- Experience with Triton or other kernel languages
- Familiarity with CUDA and profiling tools
- Familiarity with quantization frameworks in JAX
Responsibilities
- Implement SOTA low-bit quantization techniques
- Incorporate quantized kernels into models and evaluate performance
- Communicate with model owners to understand requirements and expectations
Other
- Enrolled in a Master's or PhD program in Computer Science, Robotics, or a similar technical field of study.
- This will be a hybrid onsite internship position.
- We will accept resumes on a rolling basis until the role is filled.
- To be considered for multiple roles, you will need to apply to each one individually. Please apply to the top 3 roles you are interested in.