AI Cluster Test Automation Engineer

AMD

Salary not specified
Dec 23, 2025
Santa Clara, CA, US

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems.

Requirements

  • Languages: Python, C, C++, Linux Shell scripting.
  • Frameworks/Libraries: TensorFlow, PyTorch, ONNXRT.
  • Tools: Prior experience with Linux, Docker, Kubernetes, SLURM, and LLVM compilers.
  • Solid experience with complex computer systems used in AI and HPC deployments, including backend network designs in RDMA clusters.
  • Experience validating complex AI infrastructure (GPUs, networking, RoCEv2, UEC) and running benchmark tests such as IBPerf and RCCL/NCCL.
  • Experience with performance profiling of CPUs and GPUs, and with debugging complex compute, network, and storage problems.

Responsibilities

  • Work with AMD's architecture specialists to validate AI solutions for distributed training and inference workloads with AMD's ROCm software.
  • Build cluster-scale automation for distributed training and inference workloads.
  • Reproduce field defects and develop appropriate tests to prevent future issues.
  • Design, develop and deploy testing tools and automation libraries necessary to perform testing.
  • Lead the adoption of tooling and industry best practices through advocacy and outreach, helping our development communities level up.
  • Other duties as assigned.

Other

  • Bachelor's Degree or higher in Computer Science or related quantitative field.
  • An advanced degree or equivalent practical work experience is a plus.
  • This role is not eligible for visa sponsorship.
  • Able to communicate effectively and work optimally with different teams across AMD.
  • Leadership skills to drive sophisticated issues to resolution.