Freenome is transitioning from research to commercial operations and needs to establish and maintain reliable, regulated, cloud-based production environments. The new Site Reliability Engineering (SRE) team will define the culture and build the systems to ensure this reliability, directly contributing to the company's mission of saving lives.
Requirements
- 5+ years in software engineering or Infra/DevOps/SRE roles (we currently use Python and Go)
- Experience deploying cloud infrastructure via automation (e.g., Terraform, Pulumi, Bicep/ARM)
- Incident management experience in cloud/software engineering, plus familiarity with incident management platforms (e.g., incident.io, ServiceNow, Opsgenie, PagerDuty)
- Hands-on experience operating production workloads in cloud environments
- Familiarity with Kubernetes (AKS, GKE, or EKS)
- Strong troubleshooting and root-cause analysis skills in distributed systems
- Experience with observability platforms (e.g., Datadog, Prometheus/Grafana, OpenTelemetry)
Responsibilities
- Define and implement observability practices (metrics, traces, dashboards, logs, alerts) for production systems
- Partner with engineering teams to define SLIs/SLOs and establish error budgets
- Contribute to production deployment and change-management processes that meet FDA and compliance requirements
- Automate operational tasks to reduce manual intervention
- Contribute to production systems and designs with the goal of improving reliability
- Use Infrastructure as Code (IaC) to manage and deploy team-owned infrastructure and subsystems
- Help build out the SRE practice
Other
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience
- Work closely with engineering, product, and lab teams to understand service reliability needs
- Partner with TPMs, RA/QA, and compliance stakeholders to align operational practices with regulatory requirements
- Model Freenome’s values and principles in your work and interactions
- Promote a collaborative, reliable engineering culture across product, infra, and lab engineering teams