At LifeStance Health, the business problem is to architect and safeguard the mission-critical infrastructure behind its national digital health platform, ensuring that it scales securely and reliably to serve millions and supports clinical care with precision, performance, and resilience.
Requirements
- 10+ years in DevOps/SRE/Platform Engineering roles; at least 4 years architecting distributed cloud-native systems at scale.
- Expert in AWS core services (EKS, VPC, RDS, Route 53, IAM, Lambda); Terraform-first mindset.
- Proven track record in establishing SLIs/SLOs, building error budgets, and aligning them with business velocity.
- Deep expertise in Kubernetes (EKS), Helm, service meshes (Istio/Linkerd), and microservices orchestration.
- Strong software engineering fundamentals in Python, Go, or similar.
- Hands-on experience with modern observability platforms and real-time monitoring solutions.
- Technical leadership in incident response, risk management, and operational resilience in regulated industries.
Responsibilities
- Architect scalable, secure infrastructure on AWS using EKS, Lambda, and edge networking strategies.
- Define and own SLOs/SLIs for key services; integrate error budgets into product and deployment planning.
- Drive incident response operations, lead postmortems, and institutionalize RCA learnings.
- Automate everything: provisioning, security controls, deployments, chaos testing, and DR drills, using Terraform, Helm, and GitHub Actions.
- Build and maintain the observability stack (Datadog, Prometheus, ELK, OpenTelemetry); deliver actionable dashboards and alerts.
- Engineer for cost-aware scale: right-size compute, optimize network paths, and containerize performance-hardened workloads.
- Implement and maintain zero-trust IAM and secrets management frameworks (Vault, AWS Secrets Manager).
Other
- Must be legally authorized to be employed in the United States.
- Demonstrates awareness, inclusivity, sensitivity, humility, and experience working with individuals from diverse ethnic backgrounds, socioeconomic statuses, sexual orientations, gender identities, and various other aspects of culture.
- Ability to translate system architecture into platform strategy and influence executive stakeholders.
- Published thought leadership or open-source contributions in reliability, observability, or infrastructure automation.
- LifeStance is an EEO/Affirmative Action Employer.