Microsoft's AI Frameworks team is looking to enable state-of-the-art large language model (LLM) training and inference through deep optimization across the full software and hardware stack for Microsoft's first-party (1P) AI accelerators.
Requirements
- Proficiency in C++ and/or Python, with solid understanding of software engineering fundamentals.
- Experience with modern large language model (LLM) serving technologies, including distributed execution and inference optimization.
- Experience with deep learning frameworks such as PyTorch, TensorFlow, or ONNX.
- Experience with GPU computing (CUDA programming, GPU kernel optimization, performance tuning).
- Familiarity with AI accelerator software stacks, graph compilers, or kernel libraries.
- Understanding of large-scale distributed training or inference systems for LLMs.
- Exposure to performance profiling and optimization tools.
Responsibilities
- Implement and optimize components of the AI software stack targeting Microsoft's 1P AI accelerators.
- Collaborate with hardware, compiler, and model teams to develop high-performance solutions.
- Contribute to framework integration work for PyTorch and ONNX with custom hardware backends.
- Analyze performance bottlenecks and propose optimizations across framework, runtime, and hardware layers.
- Write clean, maintainable, and well-tested code, and participate in design/code reviews.
- Stay informed on emerging AI framework and accelerator technologies.
Other
- Strong problem-solving skills and ability to collaborate across teams.
- Bachelor's Degree in Computer Science or a related technical field AND 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, or Python, OR equivalent experience.