Staff Machine Learning Engineer, Computer Vision/VLM

Waymo

$238,000 - $302,000
Sep 16, 2025
Mountain View, CA, USA • San Francisco, CA, USA

Waymo is looking to improve its autonomous driving technology by creating the highest-fidelity, most comprehensive offboard perception autolabels at a massive scale, serving as the foundation for training and validating the AV stack.

Requirements

  • 5+ years of hands-on experience training and shipping deep learning models for computer vision tasks (e.g., detection, segmentation, video understanding) using Python and frameworks like PyTorch, JAX, or TensorFlow.
  • 1+ years of demonstrated experience working with large language models (LLMs) or vision-language models (VLMs) in areas such as fine-tuning, prompting, or Retrieval-Augmented Generation (RAG).
  • Strong software engineering fundamentals, including designing scalable and reliable systems.
  • Experience building and managing large-scale data processing pipelines for ML training.
  • Proven ability to work autonomously and lead complex technical projects in a fast-paced R&D environment.
  • Hands-on experience with Reinforcement Learning, especially RLHF, RLAIF, or applying RL to language/agentic tasks.

Responsibilities

  • Develop and train state-of-the-art computer vision / multimodal models (e.g., Gemini) to extract the rich semantic information (e.g., object attributes, scene properties, interaction dynamics) required by the AI agent.
  • Design and implement a scalable AI agent framework that integrates large foundation models (e.g., Gemini) with the outputs of our perception models and internal knowledge bases.
  • Develop and apply fine-tuning and Reinforcement Learning (RL) techniques to create a "data flywheel" that continuously improves the system's captioning and reasoning abilities through automated feedback.
  • Develop and prototype novel prompting strategies for Vision-Language Models (VLMs) to elicit complex, causal reasoning about driving scenarios.
  • Collaborate closely with the ML Infra, Perception, Behavior, and AI Foundation teams to define data requirements and integrate the captioning system into the broader ML development lifecycle.
  • Own the full system lifecycle, from advanced model development and prototyping to production deployment and scaling for massive data generation.

Other

  • Master's degree in Computer Science or a related technical field.
  • PhD in Computer Science or a related technical field (preferred).
  • Publication record in top-tier AI conferences (e.g., NeurIPS, ICML, ICLR, CVPR) (preferred).
  • A track record of impactful cross-functional collaboration (preferred).