Waystar is seeking a Senior Specialist, Site Reliability Engineering (SRE) to drive reliability, scalability, and performance across critical platforms.
Requirements
- Deep expertise in cloud platforms (AWS, GCP, or Azure)
- Experience with container orchestration (Kubernetes) and infrastructure-as-code (Terraform, CloudFormation)
- Strong proficiency in observability tools (e.g., Prometheus, Grafana, Splunk) and CI/CD pipelines
- Experience with Python, PowerShell, or similar scripting languages
- Familiarity with chaos engineering, performance tuning, and capacity planning
- Background in software development with strong coding skills (e.g., Python, Go, Bash)
Responsibilities
- Architect and implement solutions to improve system reliability, scalability, and performance
- Define and manage SLIs/SLOs and error budgets across services
- Lead efforts to automate operational tasks and improve system observability
- Serve as a technical lead during major incidents and drive resolution
- Conduct deep root cause analyses and implement long-term fixes
- Champion blameless postmortems and continuous improvement
- Enhance observability through metrics, logging, and tracing
Other
- 7+ years of experience in SRE, DevOps, or infrastructure engineering
- Excellent communication and collaboration skills
- Competitive total rewards (base salary + bonus, if applicable)
- Customizable benefits package (3 medical plans with Health Savings Account company match)
- Paid parental leave (including maternity + paternity leave)