Microsoft's AI Frameworks team enables state-of-the-art large language model (LLM) training and inference through deep optimization across the full software and hardware stack for Microsoft's first-party (1P) AI accelerators.
Requirements
- Experience with modern large language model (LLM) serving technologies or distributed inference concepts.
- Exposure to deep learning frameworks such as PyTorch, TensorFlow, or ONNX.
- Exposure to GPU programming (CUDA) or other accelerator software stacks.
- Familiarity with performance analysis, debugging, or profiling tools.
- Coursework, internship, or project experience related to AI infrastructure, compilers, or distributed systems.
- Experience coding in languages including, but not limited to, C, C++, or Python.
Responsibilities
- Contribute to the design and development of components in the AI software stack for Microsoft’s 1P accelerators.
- Implement features and optimizations under the guidance of senior engineers.
- Collaborate with hardware, compiler, and framework teams to enable efficient execution of LLM workloads.
- Debug issues, analyze performance gaps, and propose targeted improvements.
- Participate in design discussions and code reviews, and maintain high-quality software practices.
- Stay up to date on emerging AI frameworks and accelerator technologies.
Other
- Strong problem-solving ability and willingness to work in a collaborative team environment.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.