The NVIDIA DGX Cloud organization is looking to build NVIDIA’s accelerated compute infrastructure, including software to assist in the rapid bring-up, operation, configuration, and troubleshooting of compute hardware and networking equipment.
Requirements
- Demonstrated ability to write code in a mainstream systems programming language such as C, C++, Golang, or Rust.
- Demonstrated ability to design and implement maintainable APIs for consumers.
- Practical experience with asynchronous programming, type safety, threading models, state machines, and data structures.
- Background in data persistence (SQL or similar).
- Understanding of secure communication protocols (mutual-TLS, IPsec, or similar).
- Knowledge of SRE principles (observability, SLOs, logging, etc.).
Responsibilities
- Work with NVIDIA internal customers.
- Design and build scalable software systems to manage NVIDIA’s cloud infrastructure.
- Participate in responses to real-time operational events.
- Build network and systems automation software for managing a multi-tenant cloud infrastructure.
- Participate in open-source communities of software we leverage and build.
- Present to internal stakeholders and NVIDIA leadership on roadmaps, vision, and demos.
Other
- 15+ years of experience designing and building distributed software systems.
- BS/MS degree in Computer Science or a related field (or equivalent experience).
- Track record of directly supporting systems with external customers or demanding internal customers.
- Ability to work with other software engineers, product architects, and product managers as a collaborative team.
- Passion for code quality, testing, deployment efficiency and simplicity, and bringing amazing products to market.