Anthropic is looking to advance the frontier of safe tool use in its AI model, Claude. This involves addressing challenges such as prompt injection robustness, data exfiltration through tool misuse, adversarial attacks in multi-turn conversations, and the safety of autonomous agents operating with numerous tools over long horizons. The goal is to scale AI responsibly and make it more reliable, interpretable, and steerable.
Requirements
- Experience with tool use/agentic safety, trust & safety, or security
- Experience with reinforcement learning techniques and environments
- Experience with language model training, fine-tuning, or evaluation
- Experience building AI agents or autonomous systems
- Published influential work in relevant ML areas, especially around LLM safety & alignment
- Deep expertise in a specialized area (e.g., RL, security, or mathematical foundations), even if still developing breadth in adjacent areas
- Experience shipping features or working closely with product teams
Responsibilities
- Design and implement novel, scalable reinforcement learning methodologies that push the state of the art in tool-use safety
- Define and pursue research agendas that push the boundaries of what's possible
- Build rigorous, realistic evaluations that capture the complexity of real-world tool use safety challenges
- Ship research advances that directly impact and protect millions of users
- Collaborate with other safety research teams (e.g., Safeguards, Alignment Science), capabilities research, and product teams to drive fundamental breakthroughs in safety, and work with those teams to ship them into production
- Design, implement, and debug code across our research and production ML stacks
- Contribute to our collaborative research culture through pair programming, technical discussions, and team problem-solving
Other
- Are passionate about our safety mission
- Are driven by real-world impact and excited to see research ship in production
- Have strong machine learning research or applied-research experience, or a strong quantitative background in a field such as physics, mathematics, or quantitative finance
- Write clean, reliable code and have solid software engineering skills
- Communicate complex ideas clearly to diverse audiences
- Are hungry to learn and grow, regardless of years of experience
- Are enthusiastic about pair programming and collaborative research
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
- We do sponsor visas!