NVIDIA is seeking a Senior Software Engineer to join a new team building the foundational infrastructure for Robotics Research, focusing on compute infrastructure for Project GR00T, NVIDIA’s moonshot initiative to build foundation models and full-stack technology for humanoid robots.
Requirements
- Experience with ML frameworks like PyTorch, JAX, or TensorFlow.
- Deep understanding of Kubernetes and experience with Ray.
- Experience with data frameworks and standards such as SQL, Apache Spark, and LanceDB.
- Experience with GPU acceleration and CUDA programming.
- Strong programming skills in Python and a high-performance language such as C++ for efficient system development.
Responsibilities
- Develop mechanisms to launch and manage large compute jobs to support multi-modal foundation models for robotics, including data, training, and evaluation jobs.
- Optimize GPU and cluster utilization for efficient model training, fine-tuning, and evaluation on massive datasets.
- Develop robust observability tools and procedures for this compute infrastructure to ensure reliability and performance.
- Collaborate with researchers to integrate innovative compute technologies into scalable training and evaluation pipelines.
Other
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience
- 5+ years of full-time industry experience in large-scale MLOps and AI infrastructure
- Master’s degree or PhD in Computer Science, Robotics, Engineering, or a related field
- Demonstrated Tech Lead experience, coordinating a team of engineers and driving projects from conception to deployment