Affirm is working to make credit more honest and friendly by giving consumers the flexibility to buy now and pay later without hidden fees or compounding interest. The Site Reliability Engineering team helps Engineering partners operate their applications with excellence to protect the customer experience.
Requirements
- 7+ years of experience designing, developing and launching backend systems at scale using languages like Python or Kotlin
- Extensive track record of developing highly available distributed systems using technologies like AWS, MySQL, Spark and Kubernetes
- 7+ years of experience on a Site Reliability or Production Engineering team
- Experience delivering major features or system components, or deprecating existing functionality, by defining a technical and execution plan
- Experience with infrastructure, platform, and distributed systems
- Experience with capacity management and with load and chaos testing
- Experience with automation, observability, and configuration management
Responsibilities
- Providing teams and leadership with data and visibility into application performance
- Guiding the development of SLOs
- Driving the Incident Management and Analysis process
- Steering the implementation of Change Management and Deployment practices
- Engaging in service and architectural conversations
- Recommending observability and alerting configurations
- Setting technical strategy for your team on a year-long time scale
Other
- Bachelor's degree in a related field, or equivalent practical experience
- Strong verbal and written communication skills
- Ability to work remotely in the US
- Ability to travel occasionally to an assigned Affirm office