NVIDIA is seeking a Senior Software Engineer to build the foundational infrastructure for Robotics Research, specifically focusing on Project GR00T, an initiative to create foundation models and full-stack technology for humanoid robots. The role will concentrate on compute infrastructure to support these advanced AI research efforts.
Requirements
- 12+ years of full-time industry experience in large-scale MLOps and AI infrastructure
- Experience with ML frameworks such as PyTorch, JAX, or TensorFlow
- Deep understanding of Kubernetes and experience with Ray
- Experience with data frameworks and standards such as SQL, Apache Spark, and LanceDB
- Experience with GPU acceleration and CUDA programming
- Strong programming skills in Python and a high-performance language such as C++ for efficient system development
Responsibilities
- Develop mechanisms to launch and manage large compute jobs that support multi-modal foundation models for robotics, including data jobs, training jobs, and evaluation jobs, among others.
- Optimize GPU and cluster utilization for efficient model training, fine-tuning, and evaluation on massive datasets.
- Develop robust observability tools and procedures for this compute infrastructure to ensure reliability and performance.
- Collaborate with researchers to integrate innovative compute technologies into scalable training and evaluation pipelines.
Other
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
- Demonstrated Tech Lead experience, coordinating a team of engineers and driving projects from conception to deployment
- Deep background in building and operating large-scale data infrastructure
- Strong experience with, and curiosity about, frontier AI research
- The base salary range is 224,000 USD to 356,500 USD for Level 5 and 272,000 USD to 425,500 USD for Level 6.