Mueller's Smart Water Infrastructure team is looking for a Senior Site Reliability Engineer to ensure the availability, reliability, scalability, and performance of their software products.
Requirements
- Operational experience with AWS serverless technologies
- Linux and Windows system administration
- CI/CD pipelines
- Database administration
- Patch management and disaster recovery
- Advanced monitoring knowledge
- Automation scripting in a mainstream programming language
Responsibilities
- Deploy and monitor software products, ensuring their availability, reliability, scalability, and performance against operational targets.
- Design, implement, and maintain the infrastructure required to support software products.
- Collaborate with software development teams to ensure that services are designed with availability, security, scalability, reliability, and performance in mind from the outset.
- Monitor and manage live production environments, identifying and resolving issues as they arise and implementing long-term solutions to prevent their recurrence.
- Develop and maintain automation tools for system health, performance monitoring, and incident response to ensure rapid detection and resolution of issues.
- Resolve support issues where your experience is needed to diagnose the problem quickly and find an appropriate resolution.
- Lead root cause analysis of critical outages, contributing to a culture of learning and continuous improvement.
Other
- Be available ‘out of hours’ if required to complete specific tasks and support customers in emergency or disaster scenarios.
- Mentor junior engineers, fostering a culture of technical excellence and collaborative problem-solving.
- Strong collaboration skills to work effectively with cross-functional teams.
- Excellent communication skills, both verbal and written, to effectively articulate technical and product information.
- Ability to prioritize and manage multiple tasks simultaneously and work under tight deadlines.