Nuro is looking to accelerate the benefits of robotics for everyday life by building a world-class ML-first self-driving solution. The company needs to build scalable machine learning infrastructure and distributed training solutions to optimize and accelerate ML development.
Requirements
- Strong coding, software design, and debugging skills in Python or C++.
- Relevant exposure to the ML development life cycle and ML models.
- Experience in building a cloud-based distributed training platform.
- Experience profiling and optimizing performance bottlenecks for deep learning models and trainers.
- Experience with serving frameworks like Triton or vLLM.
- Experience with model- and data-parallel training frameworks like PyTorch FSDP.
Responsibilities
- Build a model serving platform for efficient large-scale simulations and reinforcement learning (RL) training.
- Maintain observability and monitoring for critical services like ML training, data dumping, and deployment.
- Implement tools to track the model development lifecycle for an efficient deployment and evaluation process.
Other
- 2+ years of relevant work experience, or equivalent experience from a Master's/PhD plus 1+ years of relevant experience.
- You are highly productive, motivated, and a strong team player.
- You thrive in complex, fast-paced environments and learn quickly by doing.