NVIDIA is hiring a Site Reliability Engineer (SRE) to help deliver Bare Metal as a Service for AI workloads. The role covers managing and optimizing the underlying infrastructure so that systems run reliably and efficiently.
Requirements
- Experience managing and optimizing data center environments, including server firmware, BIOS, and DPUs.
- Strong proficiency in Kubernetes, with a solid understanding of container orchestration and management.
- Expertise in networking concepts and troubleshooting, with a focus on maintaining network operations.
- Proficiency in programming languages such as Rust, Go, and Python.
Responsibilities
- Collaborating with a dynamic team to craft, develop, and maintain robust infrastructure solutions.
- Managing and optimizing server environments, including firmware, BIOS, and DPUs, to ensure peak performance.
- Implementing and maintaining Kubernetes clusters, ensuring seamless deployment of applications.
- Writing and maintaining scripts and software in Rust, Go, and Python to automate processes and improve system performance.
- Working closely with cross-functional teams to troubleshoot and resolve complex networking issues, ensuring uninterrupted service.
Other
- A bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- A minimum of 3 years of experience in a similar role.
- NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer.