AWS Neuron addresses the development, enablement, and performance tuning of a wide variety of ML model families, including massive-scale large language models, on the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the Trn1 and Inf1 servers.
Requirements
- Experience optimizing inference performance for both latency and throughput on large models using Python, PyTorch, or JAX
- Experience with DeepSpeed and other distributed inference libraries
- Strong software development skills in Python/C++ and solid ML knowledge
- Experience programming in at least one programming language
- Experience with design patterns, reliability and scaling of new and existing systems
- Experience with full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Experience working with compiler engineers and runtime engineers to create, build, and tune distributed inference solutions on Trn1
Responsibilities
- Development, enablement and performance tuning of a wide variety of ML model families
- Building distributed inference support into PyTorch and TensorFlow using XLA and the Neuron compiler and runtime stacks
- Tuning models for the highest performance and maximum efficiency on customer AWS Trainium and Inferentia silicon and the Trn1 and Trn2 servers
- Designing and coding solutions to help drive efficiencies in software architecture
- Creating metrics, implementing automation and other improvements, and resolving the root cause of software defects
- Building high-impact solutions to deliver to our large customer base
- Participating in design discussions, code review, and communicating with internal and external stakeholders
Other
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship experience in the design or architecture of new and existing systems
- Bachelor's degree in computer science or equivalent
- Ability to work cross-functionally to help drive business decisions with technical input
- Ability to work in a startup-like development environment