Google Cloud is hiring a Site Reliability Engineer (SRE) to lead reliability improvements for its services, including Google Distributed Cloud (GDC) and Google Distributed Cloud air-gapped (GDCag). The role addresses the need to ensure reliability, uptime, and a fast rate of improvement for these products.
Requirements
- 8 years of experience with software development in one or more programming languages.
- 4 years of experience leading projects and providing technical leadership.
- 3 years of experience in designing, analyzing, and troubleshooting distributed systems.
- Experience with cloud compute platforms (Kubernetes, Cloud Functions).
- Experience with Non-Abstract Large Systems Design.
- Master's degree in Computer Science or Engineering (preferred).
Responsibilities
- Take on ambiguous problems and drive solutions across the SRE and DEV organizations.
- Identify reliability, scalability, and efficiency gaps, propose programs to address them, get buy-in, and drive them to success.
- Cultivate and maintain a culture of reliability throughout the GDC air-gapped organization.
- Guide technical decisions, balancing the need for a reliable system and efficient incident response with highly dynamic customer priorities.
- Ensure the long-term health, maintainability, and reliability of services.
Other
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- Must be willing to work in a blame-free environment and collaborate with people from diverse backgrounds and experiences.
- Must be able to work on meaningful projects and take risks.
- Must be able to self-direct and work with minimal supervision.
- Must be willing to learn and grow in a dynamic environment.