Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that match your resume.

Senior Software Development Engineer, ML Training and Performance

AMD

$192,000 - $288,000
Aug 22, 2025
San Jose, CA, US

AMD is looking for an influential senior software engineer who is passionate about improving the performance of key applications and benchmarks.

Requirements

  • Experience with distributed training pipelines.
  • Knowledge of distributed training algorithms (Data Parallel, Tensor Parallel, Pipeline Parallel, ZeRO).
  • Familiarity with training large models.
  • Experience with ML frameworks such as PyTorch, JAX, or TensorFlow.
  • Experience with distributed training frameworks, such as DeepSpeed or Megatron-LM.
  • Experience with LLMs, recommendation, or computer vision, especially large models, is a plus.
  • Excellent Python programming skills, including debugging, profiling, and performance analysis.

Responsibilities

  • Train large models to convergence on AMD GPUs.
  • Improve the end-to-end training pipeline performance.
  • Optimize the distributed training pipeline and algorithm to scale out.
  • Contribute your changes to open source.
  • Stay up-to-date with the latest training algorithms.
  • Influence the direction of the AMD AI platform.
  • Collaborate with other teams and stakeholders.

Other

  • A Bachelor's, Master's, or Ph.D. degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
  • Strong communication and problem-solving skills.