Google is hiring a Site Reliability Engineer (SRE) to build and run large-scale, massively distributed, fault-tolerant systems, ensuring reliability, uptime, and a fast rate of improvement for both critical internal systems and externally visible services.
Requirements
- 5 years of experience with software development in one or more programming languages.
- 3 years of experience in designing, analyzing, and troubleshooting distributed systems.
- 2 years of experience leading projects and providing technical leadership.
- 5 years of experience with data structures and algorithms.
- Experience with coding, algorithms, complexity analysis and large-scale system design.
Responsibilities
- Design, plan, and execute software engineering projects that help products operate efficiently and reliably in Google's data centers.
- Participate in an on-call rotation: respond to incidents, identify their root causes, and implement automated, self-healing solutions to prevent future incidents.
- Debug live production systems.
Other
- Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
- Master's degree in Computer Science, Engineering, or a related field.
- Ability to work in a blame-free environment and collaborate with people from a wide variety of backgrounds, experiences, and perspectives.
- Must be willing to work in Durham, NC, USA or Raleigh, NC, USA.