Microsoft's Azure Hardware Systems and Infrastructure (AHSI) organization is looking for a Senior ML Research Engineer to innovate on hardware designs and optimize LLM deployment in support of Microsoft's cloud growth.
Requirements
- 4+ years of combined experience, including 2+ years of industry experience in low-precision model optimization and quantization for LLM workloads.
- Proficient with deep learning frameworks and inference runtimes such as PyTorch, TensorFlow, TensorRT, and ONNX Runtime.
- In-depth understanding of Transformer and LLM architectures, including model optimization techniques such as quantization, pruning, neural architecture search (NAS), knowledge distillation, sharding/parallelism, KV cache optimization, and FlashAttention.
- Hands-on experience setting up large-scale evaluation frameworks for SOTA LLMs and fine-tuning large models.
- Programming skills in Python, C, and C++.
- Hands-on experience implementing and optimizing low-level linear algebra routines and custom BLAS kernels would be a plus.
- Deep knowledge of mixed-precision arithmetic unit microarchitecture would be a plus.
Responsibilities
- Design and develop novel quantization techniques to enable efficient deployment of LLM inference and training in Microsoft’s Azure production environments.
- Drive proof-of-concept software development and model optimization tooling efforts to streamline deployment of quantized models.
- Analyze performance bottlenecks in state-of-the-art LLM architectures and drive performance improvements.
- Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations.
- Co-design model architectures optimized for low-precision deployment in close collaboration with AI teams across the company.
- Work cross-functionally with data scientists and ML researchers/engineers to align on model accuracy and performance goals.
- Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency.
Other
- Doctorate in a relevant field OR equivalent experience.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Experience publishing academic papers as a lead author or essential contributor.
- Experience participating in a top conference in a relevant research domain.
- Excellent communication skills and a team-oriented mindset.