Microsoft's AI Frameworks team is building the end-to-end software stack for Microsoft's first-party (1P) AI accelerators, enabling state-of-the-art large language model (LLM) training and inference through deep optimization across the full hardware and software stack.
Requirements
- Experience in C++ and/or Python, with deep understanding of software design, debugging, and performance optimization.
- Hands-on experience with modern LLM serving technologies, including model partitioning, distributed execution, and inference optimization.
- Experience designing and delivering complex, high-performance systems in production environments.
- Experience with deep learning frameworks such as PyTorch, TensorFlow, or ONNX.
- Experience with AI accelerator software stacks, including custom runtimes, graph compilers, kernel libraries, or device drivers.
- Experience with GPU computing, including CUDA programming, GPU kernel optimization, and performance tuning for large-scale AI workloads.
- Deep understanding of large-scale distributed training or inference systems for LLMs.
Responsibilities
- Design, implement, and optimize core components of the AI software stack targeting Microsoft’s first-party AI accelerators, including runtime, kernel libraries, and framework integration layers.
- Collaborate with hardware, compiler, and model teams to co-design solutions that maximize performance, efficiency, and reliability across the full AI stack.
- Develop performance-critical infrastructure to support LLM inference at scale.
- Identify and address software bottlenecks, and drive end-to-end performance tuning and debugging across framework, runtime, and hardware layers.
- Work closely with partner teams across Azure, research, and product groups to align technical direction and deliver high-impact capabilities for real-world AI workloads.
- Participate in design reviews, code reviews, and architectural discussions to ensure high-quality and maintainable software.
- Stay current with advancements in AI frameworks, compiler technologies, and hardware acceleration, and bring relevant innovations into our software stack.
Other
- 3 days per week in-office.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Excellent cross-discipline collaboration skills; ability to work effectively with hardware, compiler, and ML model teams.
- Technical leadership and mentorship experience; ability to lead by influence and drive cross-team alignment.