Senior Software QA Test Development Engineer

NVIDIA

$136,000 - $264,500
Sep 24, 2025
Santa Clara, CA, US

NVIDIA is looking for an outstanding individual to join its platform SWQA team to develop and execute test plans for NVIDIA HGX/DGX/MGX platforms, covering reliability and validation of servers, OS, firmware, and the CUDA SW stack. The role involves driving root cause analysis of test failures, building automation frameworks, and collaborating across groups to reach solutions, in support of NVIDIA's position as an AI computing company.

Requirements

  • Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript
  • Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments
  • Good knowledge of and hands-on experience in model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking
  • Experience using AI development tools for test plan creation, test case development, and test case automation
  • Strong experience in FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish (huge plus)
  • Proven experience with GitHub/GitLab/Gerrit, PXE, Slurm, and Stack/Kubernetes/Docker (huge plus)
  • Background in parallel programming, ideally CUDA/OpenCL, is a plus

Responsibilities

  • Develop and execute NVIDIA HGX/DGX/MGX platform test plans for servers, OS, FW, and the CUDA SW stack, based on design docs.
  • Install and test various operating systems, server firmware, and SW stacks.
  • Drive root cause analysis of reliability and validation test failures through to mitigation.
  • Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
  • Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
  • Work in an agile software development team with very high production quality standards.
  • Manage the bug lifecycle and collaborate across groups to drive solutions.

Other

  • An outstanding individual who thrives in a diverse work environment, has excellent interpersonal skills, and possesses a strong sense of engagement and continuous process improvement.
  • Background in enterprise server integration, strong Linux experience, reliability testing with various telemetries, scale-out clusters, test plan development, a track record in developing AI tools and NLP, and DevOps and CI/CD experience.
  • Bachelor’s Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
  • 5+ years of proven experience, or a master’s degree.
  • Experience with AI-related tools, LLMs, and NLP.