The current paradigm for Large Language Models (LLMs) is largely "one-size-fits-all", failing to capture the diverse, implicit, and evolving preferences of individual users. The next frontier in AI is to move beyond static instruction-following and create models that dynamically learn and adapt to each user, personalizing their behavior to maximize helpfulness and satisfaction over the long term.
Requirements
- PhD in Machine Learning, Reinforcement Learning, Natural Language Processing, or a related field.
- Strong data analysis and synthetic data generation skills.
- Strong development skills in Python and experience with deep learning frameworks like JAX, PyTorch, or TensorFlow.
- Experience building and working with large-scale ML training systems.
- Deep theoretical and practical experience in Reinforcement Learning (e.g., policy gradient methods, value-based methods, model-based RL, credit assignment).
- Experience developing and training large generative models (LLMs).
- Familiarity with research on game theory, multi-agent systems, or learning from human feedback (RLHF/RLAIF).
Responsibilities
- Design and implement novel multi-turn RL algorithms to train personalized LLMs, including advanced methods for credit assignment and exploration/exploitation strategies.
- Develop and scale our training infrastructure, building on our existing framework for training against stateful user simulators.
- Formalize the problem of personalization by creating new metrics, environments, and evaluation methodologies that capture long-term user satisfaction and preference alignment.
- Collaborate closely with product teams to integrate these personalization capabilities into core Gemini products, improving tasks that require sustained interaction and user understanding.
- Conduct cutting-edge research that pushes the boundaries of how agents learn from interactive, human-in-the-loop data.
Other
- Strong track record of academic publications in top-tier conferences (e.g., NeurIPS, ICML, ICLR, AAAI).
- Experience building or using user simulators for RL training.
- Application Deadline: September 9, 2025
- We value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact.