Staff Software Engineer, TPU Performance, CoreML

Google

$197,000 - $291,000

Nov 13, 2025

Sunnyvale, CA, US

Google's software engineers develop next-generation technologies that change how billions of users connect, explore, and interact with information. Products need to handle information at massive scale and extend well beyond web search. Google Cloud accelerates every organization’s ability to digitally transform its business and industry. The Staff Software Engineer will develop ML performance analysis and optimization technology to advance the latest TPU platform to market-leading performance.

Requirements

5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
5 years of experience with one or more of the following: speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), or specialization in another ML field.
5 years of experience with ML design and ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).
5 years of experience working with GPU or TPU optimizations.
8 years of experience with data structures/algorithms.
3 years of experience with compiler optimization, code generation, and runtime systems for GPU architectures (OpenXLA, MLIR, Triton, etc.).
3 years of experience in tailoring algorithms and ML models to exploit ML accelerator architecture strengths and minimize weaknesses.

Responsibilities

Identify and maintain ML training benchmarks that are representative of Google production, industry, and the ML community; use them to identify performance opportunities, drive TensorFlow (TF)/JAX GPU performance, and gate TF/JAX releases.
Engage with Google product teams, Cloud, and researchers to solve their performance problems.
Analyze performance and efficiency metrics to identify bottlenecks, and design and implement solutions at Google fleetwide scale.
Explore model/data efficiency techniques, for example, new ML model architectures, optimizers, or training techniques that solve an ML task more efficiently, and new techniques that reduce the labeled/unlabeled ML data needed to train a model to target accuracy.
Work with tooling and fleet metrics subteams to build tools that track performance and efficiency and extract metrics from workloads running at Google.
Develop ML performance analysis and optimization technology to advance the latest TPU platform to market-leading performance.
Work on Gemini, as well as industry-leading open-source models, to understand model architecture and optimize the performance of these ML models on TPU systems for both JAX and PyTorch platforms.

Other

8 years of experience in software development.
3 years of experience in a technical leadership role leading project teams and setting technical direction.
3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
Be versatile, display leadership qualities, and be enthusiastic about taking on new problems across the full stack.