Google DeepMind is developing advanced conversational capabilities powered by the Gemini LLM, equipping multimodal conversational agents with cutting-edge speech and audio functionality. The goal is to build intelligent agents capable of orchestrating complex, real-time, multi-speaker, multimodal conversations.
Requirements
- Experience working with LLMs.
- Demonstrated experience in data preparation, training, and evaluation of ML models.
- A strong record of publications in top-tier machine-learning-related conferences.
- Experience in dialog and agentic systems.
- Experience with multimodal models and processing (e.g., text / video / audio).
- Research background in NLP / Generative AI.
Responsibilities
- Partner with the Gemini/GDM teams to design, develop, and deploy novel multimodal conversational agents.
- Develop audio-first models capable of orchestrating and planning complex dialogs, including leveraging external tools like search when necessary.
- Leverage new sources of data (real and synthetic) to empower new real-time dialog capabilities.
- Work with infra teams to design models suitable for streaming bi-directional dialog, so the user experience is always fluid and low-latency.
- Rapidly prototype and evaluate new technologies.
Other
- PhD in Computer Science or a Machine Learning-related field.