Google is looking to solve the problem of managing and optimizing AI/ML infrastructure, including GPU instances and accelerator-based VM families, to empower Google Cloud's most sophisticated training and inference customers.
Requirements
- 8 years of experience in software development.
- 5 years of experience in cloud computing, emerging technologies, or related technical fields.
- 3 years of experience in a technical leadership role.
- 3 years of experience with AI Platforms.
- 2 years of experience in a people management or team leadership role.
- Master's degree or PhD in Computer Science or related technical field (preferred).
- 3 years of experience working in a complex, matrixed organization (preferred).
Responsibilities
- Manage a team of top AI/ML infrastructure engineers with an extremely strong roadmap and opportunity for further growth.
- Work on the foundational AI/ML infrastructure layer in Google Cloud by building new and supporting existing accelerator-based VM families and bare-metal GPU instances.
- Lead the team responsible for managing and defending the health of GPU instances as we bring them up in our data centers and deliver them to customers.
- Lead the team to ruthlessly automate and improve the management and handling of GPU machines in our data centers, with the goal of automating away manual tasks through tools, services, and the use of AI.
- Engage and work closely with core customers as they explore and onboard our infrastructure.
- Contribute to product strategy and help develop the team.
- Manage project goals and oversee the deployment of large-scale projects across multiple sites internationally.
Other
- Bachelor’s degree or equivalent practical experience.
- People management or team leadership experience.
- Ability to work in a complex, matrixed organization.
- Strong leadership and management skills.
- Ability to work with customers and stakeholders.