The Azure High Performance Computing and AI Platform (HPC/AI) group is the team behind Azure’s cloud offering that powers some of the most demanding and largest-scale AI training and inference workloads in the industry. The virtual machine (VM) series that our team owns combine cutting-edge GPUs and accelerators with a state-of-the-art scale-out network infrastructure to enable these workloads. We collaborate with many Microsoft teams and our industry partners to design and bring up the underlying platform, and we build the software that exposes this platform as an Azure service.
Requirements
- Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- 2+ years of experience in HPC or Machine Learning
- Machine Learning & AI Expertise: Familiarity with ML concepts, AI infrastructure, and accelerators; experience with HPC/ML middleware and profiling/performance analysis tools.
- Systems & Virtualization: Strong understanding of operating systems fundamentals, virtualization technologies, and distributed systems.
- Hardware-Software Co-Design: Experience in co-designing hardware and software for optimized performance.
Responsibilities
- Solve technical problems at all levels of the stack.
- Contribute to our codebases to enable new features.
- Work on architectural proposals.
- Do deep technical work focused primarily on HW/SW interactions, device virtualization, and performance analysis of GPU workloads in VMs.
- Work with the upper layers of the Azure infrastructure.
- Evaluate and make recommendations that advance Azure infrastructure for AI and other GPU-based workloads.
- Optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI).
Other
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Is passionate about quality, wants the customer to succeed, and gets things done.
- Maintains communication with key partners across the Microsoft ecosystem of engineers.
- Acts as a key contact for leadership to ensure alignment with partners' expectations.