NVIDIA is building the foundational systems that drive its high-performance GPU infrastructure for DGX Cloud. This work requires scalable cloud services that integrate with diverse systems and automate global cloud operations.
Requirements
- Expertise in building scalable REST APIs backed by PostgreSQL-compatible data stores.
- Proficiency in programming languages such as Go, Java, or Python.
- Familiarity with modern JavaScript frameworks (e.g., React, Angular, Next.js).
- Expertise in cloud infrastructure (AWS, GCP, Azure, etc.) and container technologies like Docker and Kubernetes.
- Expertise with high-scale distributed systems, including architectural patterns for APIs and data pipelines.
- Familiarity with Linux operating systems.
- Strong debugging and problem-solving skills in distributed environments.
Responsibilities
- Act as technical lead for a team of software engineers designing cloud services backed by databases and data warehouses.
- Design and develop RESTful APIs to ingest telemetry from AI datacenters.
- Build scalable cloud services for high-volume ingestion, processing, and storage of large datasets.
- Build and manage data pipelines for online and offline data storage.
- Collaborate across teams to codify business processes into scalable, self-measuring systems.
- Optimize the reliability and efficiency of cloud services and operations.
- Lead and ship impactful technical projects, ensuring quality and scalability at every stage.
Other
- At least 12 years of industry experience.
- Outstanding communication and collaboration skills, with a focus on solving complex operational challenges.
- A passion for delivering scalable and efficient cloud services.
- A track record of leading engineers to successfully deliver and operate high-performance cloud services at Internet scale.
- Experience operating NVIDIA datacenter GPUs.