Amazon is looking to advance its Generative AI capabilities by building state-of-the-art model inference solutions and infrastructure to benefit all Amazon businesses and customers.
Requirements
- 3+ years of non-internship professional software development experience
- Experience with software performance optimization
- Knowledge of deep learning and Transformer architectures
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Experience with large language model (LLM) inference
- Experience with GPU programming (e.g., TensorRT-LLM)
- Experience with Python, PyTorch, and C++, including performance optimization
Responsibilities
- Designing, developing, testing, and deploying high-performance model inference capabilities, including but not limited to multi-modality support, state-of-the-art (SOTA) model architectures, and latency, throughput, and cost optimization.
- Collaborating closely with a team of engineers and scientists to influence overall strategy and define the team’s roadmap.
- Driving system architecture, spearheading best practices, and mentoring junior engineers.
- Consulting with scientists to draw inspiration from emerging techniques and blending them into the roadmap.
- Designing and experimenting with new algorithms from public and internal papers, and benchmarking the latency and accuracy of implementations (a minimal benchmarking sketch follows this list).
- Implementing production-grade solutions and seeing them swiftly through to deployment.
- Collaborating with other science and engineering teams to deliver solutions end to end.
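As a rough illustration of the latency and throughput benchmarking referenced above, the sketch below times greedy decoding with PyTorch and Hugging Face Transformers. The model ID, prompt, and run count are illustrative assumptions, not part of the role description; a production harness would also sweep batch sizes, sequence lengths, and backends such as TensorRT-LLM.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model choice for illustration; any causal LM works the same way.
MODEL_ID = "gpt2"

def benchmark_generation(prompt: str, max_new_tokens: int = 64, runs: int = 10):
    """Measure mean per-request latency and decode throughput for greedy decoding."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device).eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Warm-up run so one-time kernel compilation does not skew the timings.
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    latencies = []
    for _ in range(runs):
        if device == "cuda":
            torch.cuda.synchronize()  # ensure prior GPU work has finished
        start = time.perf_counter()
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for generation to complete before stopping the clock
        latencies.append(time.perf_counter() - start)

    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    mean_latency = sum(latencies) / len(latencies)
    print(f"mean latency: {mean_latency * 1e3:.1f} ms")
    print(f"throughput:   {new_tokens / mean_latency:.1f} tokens/s")

if __name__ == "__main__":
    benchmark_generation("Explain KV caching in one sentence.")
```

Timing only after a warm-up run, and around explicit torch.cuda.synchronize() calls, avoids counting kernel compilation and still-pending asynchronous GPU work in the measured latency.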
Other
- Bachelor's degree in computer science or equivalent
- Experience with Trainium and Inferentia development
- Ability to work in a highly collaborative and friendly team environment
- Willingness to uphold the highest bar for operational excellence and to support production systems
- Ability to design solutions that minimize operational load