Technical Lead, ML Training Infrastructure

Nuro

$222,775 - $333,925
Sep 3, 2025
Mountain View, CA, US

Nuro is seeking an experienced Technical Lead for its ML training stack. The role focuses on optimizing distributed training, job scheduling, and model component libraries, and on improving performance analysis tools. The goal is for models to train faster and more efficiently, which accelerates Nuro's self-driving roadmap.

Requirements

  • 6+ years of professional or research experience in ML infrastructure, distributed training, or ML systems engineering.
  • Expertise in PyTorch and familiarity with TensorFlow; experience optimizing training performance (e.g., host offloading, quantization, reduced-precision training).
  • Hands-on experience with CUDA, Triton, XLA, and TPUs.
  • Familiarity with ML compilers, ONNX, and intermediate representations.
  • Experience with containerization (Docker), orchestration (Kubernetes), and ML pipeline tools (Airflow).
  • Practical experience with JAX.

Responsibilities

  • Help define the ML framework roadmap for the Training Infrastructure team.
  • Build and maintain a scalable, distributed training platform with an emphasis on efficiency, determinism, and reproducibility for large-scale training jobs.
  • Detect, diagnose, and resolve performance bottlenecks across training workflows, including input data pipelines and distributed training loops.
  • Optimize scheduling, training performance, and resource utilization, and ensure consistent, reproducible model training outcomes.
  • Drive improvements in software quality that measurably raise reliability, efficiency, reproducibility, and determinism.

Other

  • Experience driving complex technical initiatives with stakeholder engagement.
  • Strong collaboration and communication skills, with a passion for exploring and promoting new approaches and technology.
  • Mentor and grow a high-performing team, fostering technical excellence and collaboration.