AWS Neuron is seeking a Software Development Engineer to lead and architect the next-generation model serving infrastructure for the AWS Inferentia and Trainium machine learning accelerators. The role focuses on large-scale generative AI applications, with the goal of delivering high-performance, low-cost inference at scale.
Requirements
- Deep expertise in ML frameworks and libraries such as JAX, PyTorch, vLLM, SGLang, Dynamo, Torch XLA, and TensorRT.
Responsibilities
- Architect and lead the design of distributed ML serving systems optimized for generative AI workloads
- Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
- Design and implement scalable solutions for both offline and online inference workloads
- Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and Triton
- Develop and optimize system components for tensor/data parallelism and disaggregated serving
- Implement and optimize custom PyTorch operators and NKI kernels
- Drive architectural decisions that impact the entire Neuron serving stack
Other
- Mentor team members and provide technical leadership across multiple work streams
- Collaborate with customers, product owners, and engineering teams to define technical strategy
- Author technical documentation, design proposals, and architectural guidelines
- Experience as a mentor, tech lead, or leader of an engineering team
- Work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies.