Annapurna Labs (AWS) is seeking a Software Development Engineer to build and optimise the AWS Neuron SDK, a core software layer for high-performance deep learning and GenAI workloads on Amazon's custom ML accelerators (Trainium and Inferentia). The role focuses on making cutting-edge large language models (LLMs) run faster, more efficiently, and at massive scale on these accelerators, with an emphasis on large-scale AI acceleration and inference optimisation.
Requirements
- 5+ years of software development experience (C++ or Python required)
- Strong fundamentals in ML systems, LLM inference, and model execution
- Experience profiling and optimising large ML models for performance
- Deep understanding of system performance, memory management, and parallel computing
- Hands-on experience with PyTorch, JAX, or similar frameworks
- Ability to design scalable ML infrastructure across software–hardware boundaries
- Experience writing CUDA kernels, Triton-style programming, or working with high-performance kernel libraries (CUTLASS, FlashInfer)
Responsibilities
- Build and optimise distributed inference support for PyTorch within the Neuron SDK
- Design high-performance kernels and hardware-specific ML optimisations
- Tune large-scale LLM inference for latency, throughput, and memory efficiency
- Analyse and optimise performance across generations of Trainium and Inferentia hardware
- Develop infrastructure to onboard and test diverse model architectures
- Implement advanced optimisation techniques (fusion, sharding, tiling, scheduling)
- Debug and profile ML workloads using deep systems instrumentation
Other
- Remote or hybrid work location
- Collaborate across compiler, runtime, hardware, and applied science teams
- Work directly with customers to enable and optimise their models on AWS ML accelerators
- Influence future Neuron architecture and contribute to open-source integration
- Master’s degree in CS, ML, or a similar field (a bonus, not required)