The Azure CXP team's mission is to transform Microsoft Cloud customers into fans by analyzing and amplifying customer needs and driving the vision to improve Cloud quality, security, and reliability. The AzRel team is dedicated to making Azure the safest and most reliable cloud by applying a Site Reliability Engineering (SRE) approach to Azure's most critical services and products.
Requirements
- Proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- Familiarity with modern distributed software design patterns and cloud systems architecture, including microservices, containers, load balancing, queuing, and caching.
- Experience coding in Python and/or C#, following SOLID principles and leveraging unit/integration testing frameworks.
- Experience deploying cloud-native solutions using Azure or similar cloud service provider technologies.
- Experience in building, shipping, and operating reliable solutions.
- Experience with automated infrastructure provisioning and configuration using IaC tools (e.g., Bicep, Terraform).
- Experience applying prompt engineering to optimize LLM-based workflows for summarization, classification, and decision support scenarios.
Responsibilities
- Contributes to defining system reliability goals through Service Level Objectives (SLOs) and enhancing production posture with targeted improvements in observability and operability (telemetry, alerting, incident/change management, safe deployment practices).
- Builds reusable automation and processes that help multiple teams meet their reliability goals.
- Works directly on product code to achieve reliability outcomes.
- Leverages AI to proactively detect anomalies, predict incidents, and automate operational workflows, scaling reliability efforts across complex systems.
- Supports the design and development of large-scale distributed software services and solutions.
- Delivers “best-in-class” engineering by ensuring services are modular, secure, reliable, testable, diagnosable, observable, and reusable.
- Applies cutting-edge AI tools and techniques to reduce operational toil and scale reliability engineering across complex systems.
Other
- Customer-obsessed, AI-curious problem-solver who thrives in an inclusive, collaborative global team.
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Microsoft is an equal opportunity employer.
- If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.