The Azure High Performance Computing and AI Platform (HPC/AI) group is looking for an engineer to solve technical problems at all levels of the stack. The role involves designing and delivering the next generations of the platform, enabling new features on its VMs, and working on architectural proposals to expand capacity and the range of supported scenarios for AI training and inference workloads.
Requirements
- Proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- Experience in HPC or Machine Learning
- Familiarity with machine learning and AI infrastructure
- Familiarity with operating system fundamentals and virtualization technologies
- Ability to take on deep technical work that primarily focuses on hardware/software interactions, device virtualization, and performance analysis of GPU workloads in VMs
- Ability to work with the upper layers of the Azure infrastructure software
Responsibilities
- Analyzes functionality, integration, and performance issues at various levels of the HW/SW stack on current and future generations of AI training platforms.
- Designs and codes solutions that improve the functional correctness, stability, and performance of AI-training-oriented VM offerings and related services.
- Optimizes, debugs, refactors, and reuses code to improve performance, maintainability, effectiveness, and return on investment (ROI).
- Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
- Holds accountability as a Designated Responsible Individual (DRI) and collaborates with other engineers across products and solutions, working on call to monitor the system, product, or service for degradation, downtime, or interruptions.
- Contributes to the group's codebases to enable new features on its VMs.
- Works on architectural proposals.
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Microsoft is an equal opportunity employer.
- If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.