Upstart aims to enable effortless access to credit based on true risk. The Site Reliability Engineering (SRE) team is responsible for the reliability, resiliency, and observability of Upstart's production systems.
Requirements
- Minimum of 6 years of combined experience across Software Engineering, Site Reliability, and/or DevOps Engineering, including CI/CD, TDD, internal tooling, observability, and other agile development practices
- Proficiency coding in Python, Go, JavaScript/TypeScript
- Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
- Software engineering background, with experience building internal tooling from scratch and applying other agile development techniques
- Strong software design & architecture skills
- Sound fundamentals in data structures & algorithms
- Experience with on-call and incident management environments
Responsibilities
- Embody and share SRE principles at Upstart
- Exercise state-of-the-art SRE practices throughout the company
- Uphold a culture of visibility, ownership, and responsibility around service reliability
- Implement standards for monitoring microservices, web apps, mobile apps, databases, Kubernetes clusters, and machine learning platforms, in a fast-paced environment
- Improve incident response practices, both within SRE and throughout the company
- Automate away toil where automation makes sense
Other
- Ability to work with multiple teams for enterprise-wide deliverables
- Data/metrics-driven mindset
- Travel Requirements - This team has regular on-site collaboration sessions. These occur 3 days per quarter at an Upstart office.
- No specific Bachelor's, Master's, or Ph.D. degree requirement is stated, but relevant experience is required
- Competitive Compensation (base + bonus & equity) and comprehensive benefits package