Build the future of supercomputing on Azure by delivering cutting-edge infrastructure for AI training, AI inferencing, and high-performance computing (HPC).
Requirements
- 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- 4+ years of experience in software design and development
- 2+ years of experience in HPC or Machine Learning
- 2+ years of experience with Deep Learning, AI Infrastructure, and accelerators
- 2+ years of experience with Distributed Systems
- 2+ years of experience with High-Performance Computing / Machine Learning middleware and communication runtimes
- 2+ years of experience in Hardware-Software Co-Design
Responsibilities
- Design and deliver next-generation infrastructure for Artificial Intelligence (AI) training, AI inferencing, and High-Performance Computing (HPC) on Azure
- Optimize performance and scalability for AI and Machine Learning (ML) workloads across diverse hardware architectures, interconnect types, and processor/accelerator technologies
- Develop and enhance communication runtimes and middleware for HPC, AI, and ML systems
- Apply expertise in distributed systems and parallel programming models to real-world HPC and AI workloads
- Utilize profiling tools to analyze, debug, and improve workload performance and scalability
- Define and implement end-to-end vertical solutions with continuous focus on performance and scalability
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Collaborate in a team committed to Microsoft values and to fostering an inclusive work environment that drives innovation and cultural impact