Staff Software Engineer - ML Training and Inference Infrastructure

Rivian

$228,000 - $285,000

Sep 11, 2025

Palo Alto, CA, USA

Rivian is looking to establish state-of-the-art ML infrastructure for training and inference of large autonomous driving models and to optimize their performance.

Requirements

Deep knowledge of PyTorch
Knowledge of model training frameworks (e.g., PyTorch Lightning, Ray)
In-depth knowledge of transformer architecture and ways to accelerate the training and inference of transformer models
Experience performing large-scale distributed training of models
A track record of profiling models and doing detective work to improve model training and inference speed
Experience with CUDA or the Triton language for writing custom ops
Knowledge of NVIDIA TensorRT

Responsibilities

Optimize the performance of deep learning training workloads on NVIDIA GPU systems at large scale
Optimize the latency of model inference and model pre- and post-processing on onboard systems
Design, train, and deploy large deep learning models that can leverage vast amounts of labeled and unlabeled data

Other

PhD in CS/CE/EE, or equivalent industry experience
A track record of efficiently solving complex problems collaboratively on larger teams
Experience with edge computing systems