Enhance the shopping experience on Amazon through the conversational capabilities of large language models.
Requirements
- Experience with high-performance computing (low latency and/or high throughput) or real-time systems using ML models
- Experience in ML DevOps practices and infrastructure as code
- Experience with machine learning and deep learning toolkits such as MXNet, TensorFlow, Caffe, and PyTorch
- Experience in ML optimization with GPUs or other specialized chips (such as the TPU and Neuron hardware families)
- Experience with LLM serving frameworks and engines such as vLLM, TensorRT-LLM, and SGLang
- Knowledge of ML model optimization techniques (quantization, pruning, distillation)
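To make the last requirement concrete, here is a minimal, framework-agnostic sketch of one of the named techniques: symmetric per-tensor int8 post-training quantization. The function names and the toy weight values are illustrative assumptions, not part of any particular toolkit's API.

```python
# Illustrative sketch: symmetric per-tensor int8 quantization.
# Floats are mapped to int8 via a single scale factor, then
# dequantized back to approximate the original values.

def quantize_int8(weights):
    """Quantize a list of floats to int8 using scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Pruning and distillation follow the same spirit of trading a small amount of accuracy for lower memory traffic and faster inference.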
Responsibilities
- Architect, design, develop, and enhance high-performance, test-driven code and recipes for large language model inference that are scalable and maintainable
- Create innovative solutions at scale, exploring new technological and scientific possibilities
- Establish best practices that reduce latency and improve throughput for large language model inference
- Develop efficient inference optimization solutions at scale
- Partner with technical and business leaders in a collaborative environment to create value for our customers
- Contribute to prioritization, estimation, and sprint planning activities
Other
- 3+ years of engineering team management experience
- 5+ years of engineering experience
- Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations
- Experience partnering with product or program management teams
- Experience managing a team of high-caliber software engineers developing complex, world-class, scalable software systems that have been successfully delivered to customers