Software Engineer - AI/ML, AWS Neuron Distributed Training

Amazon Web Services (AWS)

$129,300 - $223,600
Sep 6, 2025
Cupertino, CA, US

Amazon Web Services (AWS) is looking for a Software Development Engineer II to build, deliver, and maintain complex products that delight customers and raise performance bars. The role specifically focuses on the AWS Neuron software stack for machine learning accelerators, aiming to develop, enable, and performance-tune a wide variety of ML model families, including large language models and diffusion models, on AWS Trainium and Inferentia silicon.

Requirements

  • Experience programming in at least one programming language
  • Deep learning industry experience
  • Preferred: prior software engineering experience with PyTorch, JAX, or TensorFlow, distributed libraries and frameworks, and end-to-end model training
  • Experience training large models using Python is required
  • Familiarity with FSDP, DeepSpeed, and other distributed training libraries is central to this role

Responsibilities

  • Help lead efforts to build distributed training support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks
  • Help tune models for the highest performance and maximum efficiency on customers' AWS Trainium and Inferentia silicon and on Trn1 and Inf1 servers
  • Design fault-tolerant systems that run at massive scale
  • Develop, enable, and performance-tune a wide variety of ML model families
  • Create, build, and tune distributed training solutions with Trn1
  • Extend all of the above to the Neuron-based system, a key part of the role

Other

  • 3+ years of non-internship professional software development experience
  • 3+ years of non-internship experience in design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Work safely and cooperatively with other employees, supervisors, and staff
  • Adhere to standards of excellence despite stressful conditions
  • Communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service