Microsoft is looking to hire a Site Reliability Engineer II to help design, build, and run distributed services at global scale, ensuring reliability, performance, and security for services used by millions of customers.
Requirements
- 2+ years of experience coding in languages such as C, Python, and PowerShell.
- 3+ years of experience coding in languages such as C, Python, and PowerShell.
- Experience with monitoring, logging, and distributed systems troubleshooting.
- Knowledge or hands-on experience in AI/ML systems.
- 2+ years of technical experience working with large-scale cloud or distributed systems.
- 1+ year of technical experience in software engineering, network engineering, or systems administration.
- 2+ years of technical experience in software engineering, network engineering, or systems administration.
Responsibilities
- Help design, build, and run distributed services at global scale.
- Use your software engineering skills to eliminate toil, improve system resiliency, and deliver meaningful telemetry.
- Participate in design and code reviews to ensure services are reliable, scalable, and secure.
- Operate services through on-call rotations, incident response, and post-mortems.
- Partner with product teams to drive improvements in resiliency, cost efficiency, and performance.
- Develop automation to reduce manual operations and improve recovery time.
- Build and maintain observability (metrics, logs, traces) that drives data-driven engineering decisions.
Other
- The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Contribute to a blameless culture of learning through continuous improvement and knowledge sharing.
- As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.
- Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.