Intuit is looking for a software engineer to lead operational excellence and site reliability for its always-on SaaS platform serving global payroll customers.
Requirements
- 5+ years of related experience, with expertise in at least one area of site reliability (automation, monitoring tools, or cloud operations).
- Hands-on experience with at least one modern scripting language.
- Deep understanding of AWS services, Kubernetes, and monitoring tools.
- Proficiency in one or more of the following: Go, Java or Python.
- Understanding of the secure software development life cycle (SSDLC) and CI/CD pipelines.
Responsibilities
- Drive operational excellence for the connected services the business offers its customers, delivering an "always on" operation year-round at the right cost.
- Adopt observability best practices, including distributed tracing, to reduce mean time to detect (MTTD) and mean time to resolve (MTTR).
- Create or enhance monitoring capabilities using AI-assisted tools to increase alert accuracy, detect issues, and resolve them automatically.
- Explore the products offered to customers to gain deep product knowledge, and influence the engineering culture toward building observable applications.
- Create runbooks covering standard operating procedures for every production change.
- Develop failure mode and effects analysis (FMEA) and chaos engineering best practices backed by automation.
- Invest in self-service capabilities to drive efficiency, with a focus on reducing friction and manual steps.
Other
- A passionate individual with the ability to diagnose and resolve both pre-production and production issues.
- Willingness to take initiative and unblock yourself to get things done.
- Ability to deliver work incrementally, gather feedback, and iterate on solutions.
- A passion for working on systems that are highly reliable, maintainable, scalable, and secure.
- You are easy to work with: you communicate well, take feedback positively, and are OK with not always doing the most glamorous tasks.