Chainguard is seeking a Site Reliability Engineer to design, automate, and scale secure-by-default cloud infrastructure, with the goal of ensuring high uptime and minimizing on-call incidents.
Requirements
- Comfortable working and thriving within a Linux ecosystem
- Experience supporting high availability distributed production systems
- Experience with database administration and support
- Experience treating infrastructure as code using tools like Terraform, Ansible, Chef, Puppet, and SaltStack
- Familiarity with working on a public cloud platform (GCP, AWS, Azure)
- Software development skills in at least one of the following languages: Python, Go, JavaScript, or Ruby
- Knowledge of microservices architecture and containerization (Docker/OCI, Kubernetes)
Responsibilities
- Practice continuous improvement by iterating on how services are deployed, configured, monitored, and maintained on our platform
- Lead incident response, diagnosis, and follow-up on system outages and alerts
- Help develop an operational focus and act as a thought leader for the rest of engineering
- Maintain and optimize infrastructure for performance, scalability, and cost
- Analyze system metrics and identify opportunities for improvement in reliability and efficiency
Other
- B.S. or M.S. in Computer Science or a related field, or equivalent work experience
- Strong English language skills and the ability to work independently as an effective part of a globally distributed team
- Ability to learn about the supply chain security space
- Flexible & Remote-First Culture
- ∞ Flexible Time Off