Microsoft is building the next generation of inference systems to propel its cloud growth and is seeking a Lead AI Software Architect to lead the software architecture, development, and deployment of future AI inference infrastructure optimized for Microsoft's AI cloud.
Requirements
- 7+ years of industry experience, with at least 5 years in AI inference software stack development and architecture.
- 5+ years of experience in designing and optimizing software stacks for specialized AI hardware, including accelerators, GPUs, or custom ASICs.
- 3+ years of experience building infrastructure and identifying opportunities for end-to-end performance and total cost of ownership (TCO) optimization for business-critical AI workloads.
- 3+ years of experience with AI inference frameworks and compiler toolchains such as TensorRT, ONNX Runtime, MLIR, or similar.
- Familiarity with open-source AI inference software stacks such as vLLM, Dynamo, and SGLang.
- Experience contributing to open-source AI frameworks or compiler projects.
- Excellent understanding of hardware-software interaction, memory hierarchies, compute kernels, and data movement optimization.
Responsibilities
- Lead the SW architectural design, development, and deployment of the future AI inference infrastructure optimized for Microsoft’s AI cloud.
- Collaborate closely with the hardware architecture, compiler, systems, and simulation/performance optimization teams to ensure seamless integration and optimized performance.
- Define and execute strategies for inference, cost optimization, workload balancing, and memory optimization.
- Mentor and guide the software engineering team, setting clear technical directions and providing architectural oversight.
- Evaluate, select, and integrate third-party libraries and open-source frameworks (e.g., TensorRT, TVM, PyTorch, ONNX) for optimized inference performance.
- Act as a technical liaison between hardware engineers and software teams to communicate requirements, constraints, and opportunities for co-design.
- Identify performance bottlenecks and opportunities that should inform future hardware and system roadmap planning, influencing strategic direction.
Other
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position requires passing the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Previous experience leading the AI software stack for an early-stage hardware startup or novel hardware project.
- Publications, patents, or other recognized contributions in the field of AI inference software architecture or acceleration.
- Exceptional leadership, communication, and collaboration skills with a proven track record of guiding technical teams.