NVIDIA is seeking to build the foundational infrastructure for robotics research, specifically focusing on data infrastructure for Project GR00T, a moonshot initiative to build foundation models and full-stack technology for humanoid robots.
Requirements
- Experience with ML frameworks like PyTorch, JAX, or TensorFlow.
- Experience with Kubernetes and Ray.
- Deep understanding of data frameworks and standards such as SQL, Apache Spark, and LanceDB.
- Experience with GPU acceleration and CUDA programming.
- Strong programming skills in Python and a high-performance language such as C++ for efficient system development.
- Experience with large-scale MLOps and AI infrastructure.
Responsibilities
- Design and maintain large-scale distributed data ETL and data management systems to support multimodal foundation models for robotics.
- Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets.
- Implement scalable data loaders and preprocessors tailored for multimodal datasets, including videos, text, and sensor data.
- Develop robust observability tools and procedures for this data infrastructure to ensure reliability and performance.
- Collaborate with researchers to integrate cutting-edge data technologies into scalable training and evaluation pipelines.
Other
- Bachelor's degree in Computer Science, Robotics, Engineering, or a related field (or equivalent experience).
- 12+ years of full-time industry experience in large-scale MLOps and AI infrastructure.
- Master's degree or PhD in Computer Science, Robotics, Engineering, or a related field.
- Demonstrated tech lead experience, coordinating a team of engineers and driving projects from conception to deployment.