NVIDIA is looking for an outstanding individual to join its platform SWQA team to develop and execute test plans for NVIDIA HGX/DGX/MGX platforms, ensuring the reliability and quality of servers, operating systems, firmware, and the CUDA software stack.
Requirements
- Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell scripting, Ansible, Jenkins, C/C++, Java, and JavaScript
- Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments
- Good knowledge of and hands-on experience with model testing, AI tools and frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking
- Experience using AI development tools for test plan creation, test case development, and test case automation
- Strong experience with firmware, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, I/O sub-devices, CPU and memory, ACPI, the UEFI specification, and Redfish is a huge plus
- Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and Stack/Kubernetes/Docker is a huge plus
- Background in parallel programming, ideally CUDA/OpenCL, is a plus
Responsibilities
- Develop and execute NVIDIA HGX/DGX/MGX platform test plans covering servers, OS, firmware, and the CUDA software stack, based on design documents.
- Install and test various operating systems, server firmware, and software stacks.
- Drive support for root cause analysis of reliability and validation test failures, identifying the root cause(s) and achieving mitigation.
- Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
- Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
- Work in an agile software development team with very high production quality standards.
- Manage the bug lifecycle and collaborate across groups to drive solutions.
Other
- An outstanding individual who thrives in a diverse work environment, has strong interpersonal skills, and possesses a strong sense of engagement and continuous process improvement.
- Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
- 5+ years of proven experience, or a master’s degree.
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
- If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.