Nuro is looking to improve its machine learning infrastructure and distributed training solutions to support the development of its autonomous driving technology.
Requirements
- Experience building a cloud-based distributed training platform that fully supports data and model parallelism.
- Experience investigating and optimizing training performance bottlenecks for deep learning models.
- Understanding of machine learning models and the ML development life cycle.
- Experience with TensorFlow, Keras, PyTorch, and CUDA kernel implementation.
- Experience with distributed training frameworks and strategies.
- Experience with cloud-based platforms.
- Experience with data and model parallelism.
Responsibilities
- Research and develop new distributed training frameworks and strategies to support training deep learning models of ever-growing size.
- Improve model training speed by optimizing TensorFlow, Keras, PyTorch, and CUDA kernel implementations.
- Engineer advanced tools that profile and monitor model training performance across all teams in order to detect and triage training problems.
Other
- 5+ years of relevant work experience, or equivalent experience gained during a PhD.
- The base pay range is $167,200 to $303,050.
- Annual performance bonus, equity, and a competitive benefits package.
- Nuro is an equal opportunity employer and prohibits any form of workplace discrimination.