This role advances Foundational Observability within Azure Core, elevating existing standards and introducing innovations that set a new benchmark for reliability and resilience, and enabling rapid, localized issue detection and fast, dependable recovery at global scale.
Requirements
- 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- 3 years of experience designing and building backend distributed systems or cloud-scale services.
- 2 years of experience with service reliability engineering and incident management for mission-critical systems.
- 1 year of experience with observability (telemetry, logging, metrics, detection) and operational excellence.
- Familiarity with open-source frameworks and standards related to observability.
- Track record of improving system reliability and performance at scale.
Responsibilities
- Design and build solutions that deliver step-change improvements in telemetry, detection, and recovery across core infrastructure and foundational services.
- Collaborate across Azure and partner teams to integrate with existing systems while introducing modern approaches that maximize impact and efficiency.
- Leverage and contribute to open-source frameworks and communities.
- Collaborate with appropriate stakeholders to determine user requirements for a scenario.
- Drive the identification of dependencies and the development of design documents for a product, application, service, or platform.
- Create, implement, optimize, debug, refactor, and reuse code to improve performance, maintainability, effectiveness, and return on investment (ROI).
- Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook: work on call to monitor the system, product, or service for degradation, downtime, or interruptions; alert stakeholders about status; and, when appropriate, initiate actions to restore the system, product, or service for both simple and complex problems.
Other
- Embody our culture and values.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- 3 years of experience collaborating across teams and delivering high-quality solutions.
- Proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products, while driving consistency in monitoring and operations at scale.