The NVIDIA DGX Cloud organization is looking for software engineering talent to build NVIDIA’s accelerated compute infrastructure, including software for rapid bring-up, operation, configuration, and troubleshooting of compute hardware and networking equipment.
Requirements
- Demonstrated ability to write code in a mainstream systems programming language such as C, C++, Golang, or Rust.
- Demonstrated ability to design and implement maintainable APIs for consumers.
- Practical experience with asynchronous programming, type safety, threading models, state machines and data structures.
- Background in data persistence (SQL or similar).
- Understanding of secure communication protocols (mutual-TLS, IPsec, or similar).
- Knowledge of SRE principles (observability, SLOs, logging, etc.).
- Understanding of networking protocols such as IP, IPv6, BGP, HTTP, ICMP, tunneling protocols (VXLAN, Geneve, FoU, GRE), etc.
Responsibilities
- Design and build scalable software systems to manage NVIDIA’s cloud infrastructure.
- Build network and systems automation software for managing a multi-tenant cloud infrastructure.
- Write services and software that align with the broad architectural vision for the NVIDIA Cloud Platform, working with other teams to develop a robust and scalable system.
- Own your code from development through commit, test, and production, including operational support.
- Participate in responses to real-time operational events.
- Participate in open-source communities of software we leverage and build.
Other
- Work with NVIDIA internal customers.
- Present to internal stakeholders and NVIDIA leadership on roadmaps, vision, and demos.
- BS/MS degree in Computer Science or a related area (or equivalent experience).
- 8+ years of experience designing and building distributed software systems.
- Track record of directly supporting systems with external customers or demanding internal customers.