Yahoo Mail is the ultimate consumer inbox with hundreds of millions of users. The Mail Service Reliability Engineering (SRE) Manager is responsible for ensuring 7x24 incident management and the reliability of mail services.
Requirements
- Minimum 7 years of proven experience in Incident Management, preferably in a large-scale, distributed mail or messaging system environment, across both on-premises and cloud environments.
- Hands-on experience with monitoring tools, dashboard setup, and alerting systems.
- Deep understanding of SRE principles: system reliability, operational runbooks, and root cause analysis.
- Demonstrable record of improving service reliability metrics (MTTD, MTTR, Availability).
Responsibilities
- Lead, organize, and oversee the team’s 7x24 incident response for all mail applications, ensuring rapid detection and resolution of incidents.
- Implement comprehensive system/service health monitoring.
- Design, deploy, and maintain dashboards for real-time visibility of critical metrics (Availability, MTTD, MTTR).
- Set up alerts and escalation processes for early issue detection and response.
- Develop and maintain detailed runbooks for SRE and Operations teams, specifying permissions, documented service impact, and clear step-by-step procedures for incident response and service changes.
- Facilitate root cause analysis and post-mortems for all major incidents, ensuring action items are tracked and implemented for continuous improvement.
- Oversee safe deployment procedures; ensure readiness for rollback operations during outages.
Other
- The manager leads a diverse, distributed team across multiple time zones and countries, partnering closely to respond to and resolve mail service incidents and implement changes in production environments.
- Strong organizational, leadership, and communication skills across diverse, global teams.
- Coordinate with team members and partners across different regions and time zones to ensure seamless handoffs and communication.
- Foster a culture of reliability, accountability, and proactive problem-solving.
- The compensation for this position ranges from $136,125.00 to $283,750.00 per year.