Google Cloud's services demand reliability, high uptime, and a fast rate of improvement. The Site Reliability Engineering (SRE) team meets these demands by building and running large-scale, massively distributed, fault-tolerant systems.
Requirements
- 2 years of experience with software development in one or more programming languages.
- Master's degree in Computer Science or Engineering.
- 2 years of experience designing, analyzing, and troubleshooting large-scale distributed systems.
Responsibilities
- Write product or system development code.
- Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues, then debug, track, and resolve them by analyzing their sources and their impact on hardware, network, or service operations and quality.
- Participate in the team's on-call rotation to support production services.
Other
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- Must be willing to participate in the team's on-call rotation to support production services.
- Must be willing to work in a blame-free environment and collaborate with others.
If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.