Red Hat is looking to hire a Manager, Site Reliability Engineering (SRE), to lead a US-based team responsible for developing and operating Red Hat’s managed OpenShift service offering on Azure. The role ensures that service-level agreements (SLAs) are consistently met while driving automation, process improvements, and operational excellence.
Requirements
- Strong understanding of SRE practices, including automation, observability, and incident management.
- Experience with Red Hat OpenShift or Kubernetes-based platforms.
- Familiarity with managing cloud-native services across multiple public cloud providers.
- Hands-on technical expertise in Linux, containers, or distributed systems.
Responsibilities
- Develop and operate Red Hat’s managed OpenShift service offering on Azure.
- Ensure that service-level agreements (SLAs) are consistently met.
- Drive automation, process improvements, and operational excellence.
- Coach engineers on SRE principles, including automation, toil reduction, and root cause analysis.
- Collaborate with customers in both pre-sales and post-sales engagements, supporting deep-dive discussions on product capabilities and incident resolution.
- Lead your team through organizational, process, and technology changes driven by a rapidly evolving cloud services market.
- Identify and advocate for resources (e.g., training, licenses for new tools, dedicated time for exploration) to support the team's ongoing AI literacy and adoption.
Other
- 2+ years of experience managing engineering teams.
- Ability to lead distributed, remote teams working across multiple time zones.
- Ability to discuss complex technical issues with SREs, product managers, and less-technical stakeholders, including customers and senior leaders.
- Excellent communication and collaboration skills with the ability to work effectively across global teams.
- Experience engaging directly with customers to resolve incidents and support technical discussions.
- Ability to adapt and lead teams through change in a fast-paced environment.
- Willingness to participate in a periodic 24x7 management escalation on-call rotation.