Anthropic is working to systematically understand and monitor model quality in real time in order to create reliable, interpretable, and steerable AI systems.
Requirements
- Proficiency in Python and experience building production ML systems
- Experience training, evaluating, or monitoring large language models
- Experience with reinforcement learning and language model training pipelines
- Experience designing and implementing evaluation frameworks or benchmarks
- Background in production monitoring, observability, and incident response
- Experience with statistical analysis and experimental design
- Knowledge of AI safety and alignment research
Responsibilities
- Build comprehensive training observability systems - Design and implement monitoring infrastructure to track how model behaviors evolve throughout training.
- Develop next-generation evaluation frameworks - Move beyond traditional benchmarks to create evaluations that capture real-world utility.
- Create automated quality assessment pipelines - Build custom classifiers to continuously monitor RL transcripts for complex issues.
- Bridge research and production - Partner with research teams to translate cutting-edge evaluation techniques into production-ready systems, and work with engineering teams to ensure our monitoring infrastructure scales with increasingly complex training workflows.
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
- Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate.
- Strong analytical skills for interpreting training metrics and model behavior
- A collaborative approach to problem-solving and working across diverse teams