NVIDIA is building the best cloud offering for AI workloads and bringing its latest GPU technology to clients as managed services under the DGX Cloud umbrella. This requires scalable, managed, self-service APIs that give easy access to NVIDIA products.
Requirements
- Solid technical foundation in distributed computing and storage, including substantial experience with all of the following: server systems, storage, I/O, networking, and system software
- 12+ years of platform engineering experience on large-scale production systems
- Hands-on engineering expertise with Kubernetes and infrastructure as code (IaC)
- General knowledge of shared storage systems such as NFS, LustreFS, and GlusterFS
- Familiarity with system-level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O
- Experience with large-scale distributed systems, HPC, and ML training using Slurm and Kubernetes
- Deep knowledge of both software and hardware in HPC and ML infrastructure
Responsibilities
- As part of the service team, design and build platforms for DGX Cloud services
- Figure out how to take the best from HPC and Kubernetes, and help us build a unified platform
- Work within a team of software engineers and product people, as well as with engineering teams across NVIDIA, on DGX Cloud AI Compute services
- Write IaC, work on Kubernetes, and help the team design and implement release pipelines
- Collaborate to determine how to make the best use of GitOps and pipelines
Other
- BS in Computer Science, Information Systems, or Computer Engineering, or equivalent experience
- Ability to understand and communicate complex designs, distributed infrastructure, and requirements to peers, customers, and vendors
- Applications for this job will be accepted at least until September 29, 2025.
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
- As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.