Research role at a company: building simulated worlds to test frontier models and advancing the science of post-training and scalable evaluation for reinforcement learning environments
Requirements
- Research experience in post-training, reinforcement learning, or evaluation for LLMs
- Strong understanding of transformer models and experimental design
- Publication record at leading venues (NeurIPS, ICLR, ICML, ACL, EMNLP)
- PhD or equivalent research experience in CS, ML, NLP, or RL
Responsibilities
- Create dynamic simulations that measure real intelligence, not just accuracy
- Design new post-training algorithms (RLHF, DPO, GRPO, and beyond)
- Develop richer reward models that move past exact-match scoring
- Build evaluation frameworks that define how next-generation AI is trained, aligned, and understood
- Publish papers and see your methods deployed in live systems
- Bridge academic insight and practical impact, helping AI progress beyond metrics that no longer tell the whole story
- Develop and implement reinforcement learning environments that push reasoning, planning, and long-horizon behaviour to their limits
Other
- Hybrid or on-site in New York (preferred)
- Comprehensive benefits (401k, unlimited PTO, relocation and sponsorship available)
- Degree requirements: PhD in CS, ML, NLP, or RL
- Travel requirements: Not specified