Google's Site Reliability Engineering (SRE) team needs to build and run large-scale, massively distributed, fault-tolerant systems to ensure the reliability, uptime, and performance of Google's services. This involves optimizing existing systems, building infrastructure, and eliminating work through automation to manage the complex challenges of scale unique to Google.
Requirements
- 2 years of experience with software development in one or more programming languages.
- 2 years of experience designing, analyzing, and troubleshooting distributed systems.
Responsibilities
- Provide design consulting, develop software platforms and frameworks, plan capacity, and conduct launch reviews.
- Measure and monitor performance.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for and implementing changes that improve reliability and velocity.
- Participate regularly in the on-call rotation, including incident coordination, distributed system debugging, implementing technical mitigations and long-term fixes, and authoring blameless postmortems.
- Design, develop, test, deploy, maintain, and enhance software solutions.
- Manage project priorities, deadlines, and deliverables.
- Build the next generation of Google platforms.
Other
- Qualified applicants with arrest or conviction records will be considered for employment in accordance with the San Francisco Fair Chance Ordinance for Employers and the California Fair Chance Act.
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- Master's degree in Computer Science or Engineering.
- The US base salary range for this full-time position is $141,000-$202,000 + bonus + equity + benefits.
- Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.