Google needs to accelerate its strategic transition to JAX, build the foundational infrastructure that powers the next generation of AI, and drive adoption across core business verticals (e.g., Ads, Search, YouTube).
Requirements
- 8 years of experience in software development
- 8 years of experience designing, building, and operating high-leverage software systems, with 5 years dedicated to large-scale, distributed machine learning infrastructure (training or serving)
- 5 years of experience with software design and architecture, and with testing and launching software products
- 5 years of experience in machine learning, distributed computing, and AI algorithms
- Experience with the architecture of at least one modern ML framework (JAX, TensorFlow, PyTorch), along with knowledge of ML compiler toolchains (e.g., XLA)
- Experience with high-performance I/O, data pipeline optimization, or low-latency runtime systems (e.g., distributed parameter server training or high-throughput serving runtimes)
- 8 years of experience with data structures/algorithms
Responsibilities
- Own the technical goal and architectural strategy for enabling Google’s production workloads to migrate to the native JAX ecosystem, driving adoption across core business verticals (e.g., Ads, Search, YouTube)
- Identify, analyze, and eliminate systemic performance bottlenecks so that JAX runs efficiently across Google's hardware fleet
- Define comprehensive standards for production-grade JAX workflows, spanning data input, highly distributed training loops, efficient model checkpointing, and high-throughput serving, and eliminate dependencies on legacy TensorFlow components
- Act as a key technical leader interfacing with Google-wide stakeholders (XLA/Compilers, Platform Teams, and major ML consumers) to drive alignment and ensure the successful delivery of a cohesive, future-proof ML infrastructure roadmap
Other
- Bachelor’s degree or equivalent practical experience
- Master’s degree or PhD in Engineering, Computer Science, or a related technical field
- 8 years of experience leading technical project strategy, ML design, and working with ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning)
- 5 years of experience in a technical leadership role leading project teams and setting technical direction
- Ability to work in a fast-paced environment and adapt to changing priorities