AWS Neuron is seeking a Software Development Engineer to lead and architect the next-generation model serving infrastructure for the AWS Inferentia and Trainium machine learning accelerators. The role focuses on large-scale generative AI applications, with the goal of delivering high-performance, low-cost inference at scale.
Requirements
- Deep expertise in ML frameworks and libraries such as JAX, PyTorch, vLLM, SGLang, Dynamo, Torch XLA, and TensorRT.
Responsibilities
- Architect and lead the design of distributed ML serving systems optimized for generative AI workloads
- Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
- Design and implement scalable solutions for both offline and online inference workloads
- Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and Triton
- Develop and optimize system components for tensor/data parallelism and disaggregated serving
- Implement and optimize custom PyTorch operators and NKI kernels
- Drive architectural decisions that impact the entire Neuron serving stack
Other
- Mentor team members and provide technical leadership across multiple work streams
- Collaborate with customers, product owners, and engineering teams to define technical strategy
- Author technical documentation, design proposals, and architectural guidelines
- Experience as a mentor, tech lead, or leader of an engineering team
- Work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies.