NVIDIA is seeking a Senior Software Engineer to help us develop distributed storage services for AI/ML. In this role, you will work closely with the broader NVIDIA team to design and build a reliable, scalable, and efficient storage-as-a-service offering tailored to AI applications that can be deployed anywhere and scale without limitations. This service supports the whole of NVIDIA's critical business, from graphics drivers to autonomous vehicles to deep learning frameworks.
Requirements
- Strong background in developing distributed systems involving Golang, Kubernetes, and Cloud Service Provider integrations
- Strong track record of delivering distributed services in a variety of distributed computing environments
- Experience in implementing storage services and interfaces to ensure scalable, high-performance, and reliable solutions
- Experience deploying, managing, and debugging applications in a Kubernetes environment
- Experience architecting, building, and deploying a distributed service that runs on large-scale clusters, multi-petabyte to exabyte in size, with millions of users
- Skill in building and delivering cloud services, with a specific focus on distributed systems
Responsibilities
- Leading the overall architecture and design of our distributed storage service optimized for AI/ML
- Developing and maintaining distributed, robust, and scalable Go programs deployed to state-of-the-art open-source ecosystems, including Kubernetes
- Developing and maintaining user-space applications, containers, Go bindings, and CLI tools
- Building features for a distributed storage service to enhance availability and reliability for large-scale deployments
- Automating distributed storage service end-to-end, including deployment, management, and monitoring
Other
- Engaging and collaborating with NVIDIA Research, Computing, and Product teams, cross-functional teams, and external customers to deliver cloud services
- History of ownership of product delivery from inception to support
- Great communication and presentation skills
- Ownership of all lifecycle stages of software development and delivery
- Passion for innovating and investing in groundbreaking technologies, and interest in working with accelerated computing environments such as GPUDirect Storage, DPU, and RDMA