This role advances reinforcement learning methods for large-scale AI systems, enhancing reasoning, planning, and decision-making in models that directly impact fields from biology to climate and materials science. The company is pushing the boundaries of what reinforcement learning can achieve with frontier models.
Requirements
- Deep expertise in reinforcement learning (policy optimisation, value-based, or model-based methods)
- Experience applying RL to large models (RLHF, PPO, DPO)
- Hands-on experience with model training and fine-tuning at scale
- Experience with distributed computing platforms (cloud or HPC clusters)
- Experience with multi-agent RL, hierarchical/offline RL, or domain-specific work with scientific datasets
- Contributions to top-tier conferences (NeurIPS, ICML, ICLR, AAAI)
- Experience with large language models
Responsibilities
- Applying RL techniques to enhance reasoning, planning, and decision-making in models
- Combining RL with large language models and experimenting with RLHF, PPO, and DPO
- Designing evaluation frameworks
- Fine-tuning models at scale
- Collaborating with domain experts to ensure research translates into real-world scientific progress
- Building towards a broader superintelligence platform: models that don’t just generate text or data, but drive breakthroughs across multiple domains
- Running rigorous experiments and improving models based on results
Other
- PhD in Computer Science, Machine Learning, Robotics, or related field
- Location: San Francisco Bay Area, or potentially remote with travel to the office when needed
- Package: $250k - $400k base + bonus + stock
- Domain expertise that helps research translate into real-world scientific progress