Microsoft is building the Next Gen Scheduling & Optimization Platform to manage inferencing capacity for OpenAI models and other large-scale AI workloads across Azure. The platform dynamically allocates resources, monitors usage, and rebalances capacity to drive significant efficiency gains.
Requirements
- 2+ years of technical engineering experience with coding in languages such as C#, Go, or Python
- 2+ years of experience with distributed systems or cloud infrastructure
- 2+ years of experience with telemetry, metrics pipelines, or resource scheduling systems
- Familiarity with cloud platforms (Azure, AWS, GCP) and container orchestration (Kubernetes, Service Fabric)
- Exposure to GPU-based workloads, model serving, or AI infrastructure
- Experience working with real-time systems or high-throughput APIs
Responsibilities
- Design and implement scalable services for GPU scheduling, allocation, and optimization across diverse AI workloads.
- Build reliable orchestration services that monitor GPU usage in near real time and drive automated rebalancing decisions.
- Integrate with fleet health dashboards and GPU lifecycle management systems to ensure reliability and performance.
- Collaborate with partner teams across Azure ML, AOAI, and Core AI to align architecture, APIs, and operational readiness.
- Contribute to platform evolution supporting new hardware and real-time inference APIs.
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.