Google Cloud's Chat is evolving from a messaging platform into an intelligent, agentic collaboration partner, with the aim of making AI the core of enterprise collaboration and unlocking strategic value across functions for billions of users. The Chat Backend team is responsible for architecting this transformation and for building the foundational, next-gen GenAI capabilities that power it.
Requirements
- 8 years of experience in software development, with a focus on Machine Learning, Distributed Systems, or Backend Engineering.
- 5 years of experience leading technical strategy and architecting large-scale ML infrastructure (e.g., designing serving layers, model evaluation frameworks, or data processing pipelines).
- Experience driving the full ML development lifecycle from hypothesis and data collection to fine-tuning, evaluation, and post-production monitoring.
- Experience building, optimizing, and deploying Generative AI systems in production (e.g., LLMs, RAG, agentic workflows) with a focus on latency and cost.
- Experience optimizing inference cost (TPU/GPU) and latency for real-time, user-facing applications (e.g., quantization, caching, speculative decoding).
- Experience with Retrieval Augmented Generation (RAG) architectures, including vector search, embedding optimization, and semantic retrieval strategies (a minimal retrieval sketch follows this list).
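To ground the vector-search and semantic-retrieval requirement above, here is a minimal sketch of the retrieval half of a RAG pipeline. It assumes a toy hash-based `embed()` as a stand-in for a learned embedding model and uses brute-force NumPy search in place of a dedicated vector index; the names and sample data are illustrative only and not part of any Google or Workspace API.

```python
# Minimal sketch of RAG retrieval: embed a corpus, embed the query, rank
# passages by cosine similarity, and return the top-k passages that would be
# placed into the LLM prompt. The hash-based embed() is a placeholder for a
# real embedding model (an assumption for this sketch).
import hashlib
import numpy as np

DIM = 256  # toy embedding dimensionality


def embed(text: str) -> np.ndarray:
    """Deterministic toy embedding: hash word trigrams into a unit vector."""
    vec = np.zeros(DIM)
    tokens = text.lower().split()
    for i in range(len(tokens)):
        gram = " ".join(tokens[i:i + 3])
        bucket = int(hashlib.md5(gram.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Brute-force cosine-similarity search over embedded passages."""
    doc_matrix = np.stack([embed(doc) for doc in corpus])  # shape (N, DIM)
    scores = doc_matrix @ embed(query)                     # cosine sim on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), corpus[i]) for i in top]


if __name__ == "__main__":
    passages = [
        "Quarterly planning doc for the Chat backend team.",
        "How to configure vector search for semantic retrieval.",
        "Meeting notes on embedding optimization experiments.",
    ]
    for score, passage in retrieve("semantic retrieval with vector search", passages):
        print(f"{score:.3f}  {passage}")
```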
Responsibilities
- Define the technical goal and architecture for Chat's transformation into an Agentic AI-first platform (CPA, AIO), shifting to multi-turn, reasoning-based workflows.
- Architect critical migrations to the Beyond/Agency stack, designing scalable, low-latency integration with Google Workspace Agents for seamless retrieval across Gmail, Drive, and Chat.
- Own the retrieval augmented generation (RAG) roadmap, driving engineering to maximize recall@100 and reduce hallucinations via advanced retrieval techniques (see the evaluation sketch after this list).
- Architect the Quality Loop, implementing automated, data-driven evaluation frameworks to significantly cut feature development cycles.
- Own the AI stack's performance, driving optimization of inference costs and end-to-end latency to scale for billions of users.
- Act as the primary technical liaison with DeepMind, Core ML, and other Workspace pillars to influence shared infrastructure.
- Architect Chat on Agency and build foundational, next-gen GenAI capabilities (Universal Knowledge Graph, Embeddings) underpinning features such as Personalized Assistant, RAG-based Search, AI Overviews, and Agentic Workflows.
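The recall@100 target above refers to a standard offline retrieval metric: the fraction of labeled relevant documents that appear in a retriever's top 100 results, averaged over evaluation queries. The sketch below shows one conventional way to compute it; the function name, input schema, and sample data are illustrative assumptions, not a description of Chat's actual evaluation framework.

```python
# Illustrative offline recall@k evaluation of the kind the RAG roadmap bullet
# refers to. "retrieved" maps query IDs to ranked doc IDs returned by a
# retriever; "relevant" maps query IDs to labeled ground-truth doc IDs. Both
# are hypothetical inputs for this example, not a Google-internal schema.
def recall_at_k(retrieved: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int = 100) -> float:
    """Mean over queries of |relevant ∩ top-k retrieved| / |relevant|."""
    per_query = []
    for query_id, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled relevant docs
        top_k = set(retrieved.get(query_id, [])[:k])
        per_query.append(len(top_k & gold) / len(gold))
    return sum(per_query) / len(per_query) if per_query else 0.0


if __name__ == "__main__":
    retrieved = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9"]}
    relevant = {"q1": {"d1", "d4"}, "q2": {"d2"}}
    print(f"recall@100 = {recall_at_k(retrieved, relevant):.2f}")  # (0.5 + 1.0) / 2 = 0.75
```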
Other
- Bachelor’s degree in Computer Science, Artificial Intelligence, or equivalent practical experience.
- Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
- Experience leading high-velocity teams to deliver 0 to 1 AI products in ambiguous, changing environments.
- Ability to influence technical roadmaps across organizational boundaries (e.g., partnering with Research/Core ML teams) and to translate complex research into reliable product features.
- The US base salary range for this full-time position is $197,000-$291,000 + bonus + equity + benefits.