Google needs software engineers to develop next-generation technologies that can handle information at massive scale and extend beyond web search, with a focus on integrated AI infrastructure systems and large-scale AI clusters.
Requirements
- 8 years of experience programming in C++.
- 5 years of experience testing and launching software products.
- 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
- 3 years of experience with software design and architecture.
- Experience building cloud or systems-level infrastructure spanning the entire hardware and software stack.
- Experience in end-to-end diagnostics, troubleshooting, and supportability, with experience leading SWAT team efforts for complex issues and developing long-term, sustainable solutions.
- Familiarity with Service Level Objectives (SLOs) and metrics measurement, and with integrating logs, telemetry, and metrics into tools that improve the operator experience.
Responsibilities
- Drive project success by setting the technical goal and roadmap.
- Set priorities and projects for a team that delivers features in a fast-moving environment for both internal customers (other engineering teams) and external customers.
- Take central responsibility for diagnosing and troubleshooting end-to-end supportability issues, uncovering and addressing complex technical problems, and building repair automation systems.
- Implement and govern the success metrics for the team, spanning Operational Plane metrics (e.g., support case metrics, GSO case handling) and RMA/Spares metrics (e.g., swap and repair rates).
- Build large AI clusters using the latest technologies for AI acceleration, cluster interconnects, and networking.
- Develop large-scale training and inference workloads and optimize their performance.
- Exposure to integrated AI infrastructure systems (GPU or TPU), from software to hardware design and workload management.
Other
- Bachelor's degree or equivalent practical experience.
- Ability to work in a changing environment and navigate ambiguity, and a track record of delivering solutions for subtle or complex technical problems.
- Must be willing to work in Kirkland, WA, USA or Sunnyvale, CA, USA.
- Google is an equal opportunity employer and welcomes applicants with diverse backgrounds and experiences.
- Accommodations for applicants with disabilities are available upon request.