Google aims to address the increasing complexity of ML models and ML accelerator architectures by enabling cost-effective performance and power efficiency in future ML systems through comprehensive analysis and hardware/software (HW/SW) co-design. This includes fast iteration and innovation in ML system co-design, automated hardware-friendly model improvement and enablement at scale, an understanding of business-critical production ML models, and full-stack ML HW/SW co-design with significantly improved engineering velocity and results.
Requirements
- 8 years of experience with software development.
- 7 years of experience leading technical project strategy, ML design, and optimizing industry-scale ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).
- 5 years of experience with one or more of the following: speech/audio (e.g., technology that duplicates and responds to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.
- Experience in ML performance modeling and improvement.
- Experience with large language models (LLMs), ML frameworks, and ML compilers.
- Knowledge of performance analysis and experience in performance modeling of High-Performance Computing (HPC) interconnect topologies.
- Knowledge of computer architecture (e.g., Tensor Processing Units (TPUs) or other accelerators).
Responsibilities
- Explore and define future Machine Learning (ML) accelerator system and chip architectures based on objective, data-driven ground truth.
- Enable cost-effective peak performance of future ML systems through full-stack ML Hardware/Software (HW/SW) co-design.
- Establish an understanding of the latest business-critical production ML models (e.g., large language models, large embedding models) to inform improvements to model architecture, software systems, and hardware architecture.
- Develop simulator technologies that keep pace with evolving system architecture choices and new ML workloads, and that support simulation at different levels of abstraction.
- Optimize your own code and make sure engineers are able to optimize theirs.
- Manage your project goals, contribute to product strategy and help develop your team.
- Oversee the deployment of large-scale projects across multiple sites internationally.
Other
- 5 years of experience in a technical leadership role overseeing projects.
- 5 years of experience in a people management or supervision/team leadership role.
- Manage a team of 7 (eventually growing to 15) people.
- Manage engineers across multiple teams and locations and a large product budget, and oversee the deployment of large-scale projects across multiple sites internationally.
- Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status.