NVIDIA is looking to deliver a software platform that supports the lifecycle of Artificial Intelligence (AI) supercomputing infrastructure on Kubernetes, enabling AI services across the cloud.
Requirements
- Experience building and shipping services on Kubernetes
- Background in using and contributing to open-source projects
- Programming experience in a relevant language, e.g. Golang, Python
- Experience with a wide range of modern infrastructure tools and technologies
- Experience with Kubernetes Cluster API, Terraform, Tinkerbell, and other infrastructure tooling
- Practical experience with Azure, GCP, or AWS
- Familiarity with the CNCF and the tooling across its ecosystem, as well as upstream contributions to open-source projects
Responsibilities
- Develop software systems to support large-scale deployments of cloud infrastructure
- Design and develop APIs to support Infrastructure as Code (IaC) automation and deployment workflows
- Contribute to multiple source code projects to deliver software services that meet NVIDIA requirements
- Automate the validation of software solutions with unit and integration tests
- Participate in the ownership and health of CI/CD pipelines from dev to production environments
- Collaborate with other specialists for feedback on proposed designs and product direction
- Provide development support to SRE teams and collaborate with internal product teams on sophisticated distributed systems problems at scale
Other
- BS in Computer Science, Information Systems, Computer Engineering or equivalent experience
- 5+ years of proven experience in large-scale software development
- Ability to communicate design and quality strategy in written, visual, and oral formats
- Experience collaborating with teams to write software that supports cloud services at scale
- Ability to work in a no-blame environment and openly share successes and failures