Staff Software Engineer, ML Training and Inference Infrastructure

Rivian

$228,000 - $285,000

Dec 15, 2025

Palo Alto, CA, US

Rivian is looking to establish a state-of-the-art ML infrastructure for training and inference of large autonomous driving models, directly impacting the safety-critical self-driving features of its category-defining vehicles.

Requirements

Deep knowledge of PyTorch
Knowledge of model training frameworks (e.g. PyTorch Lightning, Ray, etc.)
In-depth knowledge of the transformer architecture and ways to accelerate the training and inference of transformer models
Experience performing large-scale distributed training of models
A track record of profiling models and doing detective work to improve model training and inference speed
Experience with CUDA or Triton language for writing custom ops
Knowledge of NVIDIA TensorRT

Responsibilities

Optimize the performance of deep learning training workloads on NVIDIA GPU systems at large scale
Optimize the latency of model inference and model pre- and post-processing on onboard systems
Design, train, and deploy large deep learning models that can leverage vast amounts of labeled and unlabeled data

Other

PhD in CS/CE/EE, or equivalent industry experience
A track record of efficiently solving complex problems collaboratively on larger teams
Travel requirements not specified
Must be eligible to work in the United States
Rivian provides robust medical/Rx, dental, and vision insurance packages for full-time employees