Anthropic is looking to advance the frontier of tool use safety in AI systems, enabling the company to scale responsibly and protect millions of users.
Requirements
- Strong machine learning research/applied-research experience, or a strong quantitative background such as physics, mathematics, or quantitative finance research
- Experience with tool use/agentic safety, trust & safety, or security
- Experience with reinforcement learning techniques and environments
- Experience with language model training, fine-tuning or evaluation
- Experience building AI agents or autonomous systems
- Published influential work in relevant ML areas, especially around LLM safety & alignment
- Deep expertise in a specialized area (e.g., RL, security, or mathematical foundations), even if still developing breadth in adjacent areas
Responsibilities
- Design and implement novel, scalable reinforcement learning methodologies that advance the state of the art in tool use safety
- Define and pursue research agendas that push the boundaries of what's possible
- Build rigorous, realistic evaluations that capture the complexity of real-world tool use safety challenges
- Ship research advances that directly impact and protect millions of users
- Collaborate with other safety research teams (e.g., Safeguards, Alignment Science), capabilities research teams, and product teams to drive fundamental breakthroughs in safety, and work with them to ship these advances into production
- Design, implement, and debug code across our research and production ML stacks
- Contribute to our collaborative research culture through pair programming, technical discussions, and team problem-solving
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time
- Visa sponsorship: We do sponsor visas, but we aren't able to successfully sponsor visas for every role and every candidate
- Strong communication skills, with the ability to explain complex ideas clearly to diverse audiences
- Ability to work collaboratively as part of a team