AI Cluster Test Automation Engineer

AMD

Salary not specified

Dec 23, 2025

Santa Clara, CA, US

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems.

Requirements

Languages: Python, C, C++, Linux Shell scripting.
Frameworks/Libraries: TensorFlow, PyTorch, ONNX Runtime (ONNXRT).
Tools: Prior experience with Linux, Docker, Kubernetes, SLURM, and LLVM compilers.
Solid experience with complex computer systems used in AI and HPC deployments, including backend network designs in RDMA clusters.
Experience validating complex AI infrastructure (GPUs, networking, RoCEv2, UEC) and running benchmark tests such as IBPerf and RCCL/NCCL.
Experience with performance profiling of CPUs and GPUs, and with debugging complex compute, network, and storage problems.

Responsibilities

Work with AMD's architecture specialists to validate AI solutions for distributed training and inference workloads with AMD's ROCm software.
Build cluster-scale automation for distributed training and inference workloads.
Reproduce field defects and develop appropriate tests to prevent future issues.
Design, develop and deploy testing tools and automation libraries necessary to perform testing.
Lead the adoption of tooling and industry best practices by means of advocacy and outreach to help our development communities level up.
Other duties as assigned.

Other

Bachelor's Degree or higher in Computer Science or related quantitative field.
An advanced degree or equivalent practical work experience is a plus.
This role is not eligible for visa sponsorship.
Able to communicate effectively and work optimally with different teams across AMD.
Leadership skills to drive complex issues to resolution.