NVIDIA's DGX Cloud Kubernetes API Services team is looking to build and scale GPU-accelerated Kubernetes clusters to support NVIDIA AI, robotics, and scientific computing projects, ensuring end-to-end performance and a positive customer/developer experience.
Requirements
- Experience in building foundational SaaS systems at scale, such as API design, user management, or authentication and authorization flows
- Proficiency in Go and building Go services at scale
- Experience with deploying and maintaining services atop Kubernetes
- Experience writing automation with Kubernetes (e.g., controllers, CustomResourceDefinitions)
- Background with AWS or GCP and related technologies such as S3, GCS, and RDS
- Ability to solve issues across multiple layers: infrastructure, Kubernetes, application runtime
- Experience working across multiple layers of cloud infrastructure, such as CSP APIs, Terraform, Kubernetes, and the custom controllers and automation built atop them
Responsibilities
- Help build out and scale customer-facing APIs and systems for the DGX Cloud Kubernetes Platform
- Work with the Runtime and Cluster Architecture teams to provide complete GPU-accelerated Kubernetes clusters to a wide variety of NVIDIA initiatives
- Build platform services for other NVIDIA developers to bring their services to NVIDIA Kubernetes clusters
Other
- Be the voice of our customers to ensure they have smooth access to the compute they need for their workloads
- Communicate effectively across a large organization, both within and outside the Kubernetes Platform organization
- Experience working on internal tools and services for large engineering organizations
- Background with user-facing APIs with a focus on customer and/or developer experience
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.