Deliver an always-on SaaS platform to global payroll customers at Intuit
Requirements
- 5+ years of related experience, with expertise in at least one area of site reliability (automation, monitoring tools, cloud operations)
- Hands-on experience with at least one modern scripting language
- Deep understanding of AWS services, Kubernetes, and monitoring tools
- Proficiency in one or more of the following: Go, Java or Python
- Understanding of SSDLC and CI/CD pipelines
Responsibilities
- Drive operational excellence for the connected services the business offers its customers, delivering an 'always on' operation year-round at the right cost
- Adopt observability best practices, including distributed tracing, to reduce mean time to detect (MTTD) and mean time to resolve (MTTR)
- Create or enhance monitoring capabilities using AI-assisted tools to increase alert accuracy, detect issues, and resolve them automatically
- Explore the products offered to customers to gain a deep understanding of them, and influence the engineering culture toward building observable applications
- Create runbooks covering the standard operating procedures for every production change
- Develop FMEA (failure mode and effects analysis) and chaos engineering best practices backed by automation
- Invest in self-service capabilities to drive efficiency, with a focus on reducing friction and manual steps
Other
- Passionate individual with the ability to diagnose and resolve both pre-production and production issues
- Ability to deliver work incrementally, gather feedback, and iterate on solutions
- You are easy to work with: you communicate well, take feedback in a positive way, and are OK with not always doing the most glamorous tasks
- Participate in the on-call rotation: respond to incoming alerts, triage, and take the necessary steps to minimize impact
- Willingness to take initiative and unblock yourself to get things done