Waystar is looking for a Manager of Site Reliability Engineering (SRE) to lead reliability efforts for its Clearing House team, which is responsible for the systems that underpin secure, compliant, high-volume transaction processing.
Requirements
- 6+ years of experience in SRE, DevOps, or infrastructure engineering, with 2+ years in a leadership role.
- Experience managing systems in regulated environments (e.g., financial services, healthcare).
- Strong background in cloud-native architectures (AWS, GCP, or Azure), containerization (Kubernetes), and infrastructure-as-code.
- Proficiency in observability tools (e.g., Grafana, Prometheus, Splunk) and CI/CD pipelines.
- Experience with Python, PowerShell, and similar scripting languages.
- Experience with high-throughput transactional systems and distributed databases.
- Familiarity with compliance frameworks (e.g., SOC 2, PCI-DSS).
Responsibilities
- Lead and mentor a team of SREs focused on the Clearing House platform.
- Ensure the availability, performance, and scalability of Clearing House services.
- Define and implement SLIs/SLOs and manage error budgets to balance innovation and reliability.
- Drive improvements in observability, incident response, and root cause analysis.
- Lead incident management and postmortem processes for production issues.
- Develop and maintain runbooks, playbooks, and automated recovery procedures.
- Monitor system health and proactively address reliability risks.
Other
- Excellent communication and stakeholder management skills.
- Competitive total rewards (base salary + bonus, if applicable)
- Customizable benefits package (3 medical plans with Health Savings Account company match)
- Paid parental leave (including maternity + paternity leave)
- Education assistance opportunities and free LinkedIn Learning access
- Bachelor's degree or equivalent experience (implied, not explicitly stated)