Innovate the latest inference systems to propel Microsoft's cloud growth and power Microsoft's "Intelligent Cloud" mission.
Requirements
7+ years of industry experience, with at least 5 years in AI inference software stack development and architecture.
5+ years of experience in designing and optimizing software stacks for specialized AI hardware, including accelerators, GPUs, or custom ASICs.
3+ years of experience building infrastructure and identifying opportunities for end-to-end performance/TCO optimization for business-critical AI workloads.
3+ years of experience with AI inference frameworks and compiler toolchains such as TensorRT, ONNX Runtime, MLIR, or similar.
Familiarity with open-source AI inference software stacks such as vLLM, Dynamo, and sglang.
Experience contributing to open-source AI frameworks or compiler projects.
Excellent understanding of hardware-software interaction, memory hierarchies, compute kernels, and data movement optimization.
Responsibilities
Lead the software architectural design, development, and deployment of the future AI inference infrastructure optimized for Microsoft’s AI cloud.
Collaborate closely with hardware architecture, compiler, systems, and simulation/performance optimization teams to ensure seamless integration and optimized performance.
Define and execute strategies for inference, cost optimization, workload balancing, and memory optimization.
Mentor and guide the software engineering team, setting clear technical directions and providing architectural oversight.
Evaluate, select, and integrate third-party libraries and open-source frameworks (e.g., TensorRT, TVM, PyTorch, ONNX) for optimized inference performance.
Act as a technical liaison between hardware engineers and software teams to communicate requirements, constraints, and opportunities for co-design.
Identify performance bottlenecks and opportunities that should inform future hardware and system roadmap planning, influencing strategic direction.
Other
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Previous experience in leading the AI software stack for an early-stage hardware startup or novel hardware project.
Exceptional leadership, communication, and collaboration skills with a proven track record of guiding technical teams.
Proficiency in C++ and Python, and experience with low-level programming, performance optimization, and system-level integration.