Google's Site Reliability Engineering (SRE) team ensures the reliability, uptime, and performance of massively distributed, fault-tolerant systems and services. The team optimizes existing systems, builds infrastructure, and eliminates work through automation to manage the complex challenges of scale unique to Google.
Requirements
- 2 years of experience with software development in one or more programming languages.
- 2 years of experience designing, analyzing, and troubleshooting large-scale distributed systems.
Responsibilities
- Write product or system development code.
- Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
- Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
- Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
- Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
- Manage project priorities, deadlines, and deliverables.
- Design, develop, test, deploy, maintain, and enhance software solutions.
Other
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- Master's degree in Computer Science or Engineering.
- San Francisco, CA, USA; Sunnyvale, CA, USA