Rivian is looking to establish state-of-the-art ML infrastructure for training and inference of large autonomous driving models, directly impacting the safety-critical self-driving features of its category-defining vehicles.
Requirements
- Deep knowledge of PyTorch
- Knowledge of model training frameworks (e.g., PyTorch Lightning, Ray)
- In-depth knowledge of the transformer architecture and of techniques for accelerating transformer training and inference
- Experience performing large-scale distributed training of models
- A track record of profiling models and doing detective work to improve model training and inference speed
- Experience with CUDA or the Triton language for writing custom ops
- Knowledge of NVIDIA TensorRT
Responsibilities
- Optimize the performance of large-scale deep learning training workloads on NVIDIA GPU systems
- Optimize the latency of model inference, pre-processing, and post-processing on onboard systems
- Design, train, and deploy large deep learning models that can leverage vast amounts of labeled and unlabeled data
Other
- PhD in CS/CE/EE, or equivalent industry experience
- A track record of efficiently solving complex problems collaboratively on larger teams
- Travel requirements not specified
- Must be eligible to work in the United States
- Rivian provides robust medical/Rx, dental and vision insurance packages for full-time employees