Anthropic is seeking to protect and enhance its AI services by developing AI-driven detection models that identify misuse and by implementing practical safety measures.
Requirements
- 5+ years of experience in trust & safety or anti-fraud/risk engineering, with a focus on applied machine learning
- Deep experience with techniques for detecting harmful content and platform misuse
- Experience working with or managing teams focused on applied machine learning
- Knowledge of common internet threats and evolving adversarial techniques
- Experience implementing AI-driven safety measures in production environments
Responsibilities
- Set team vision and roadmap to detect and prevent harmful usage of Anthropic's AI services through applied machine learning solutions
- Lead a team of ML and software engineers to translate complex AI capabilities into practical safety mechanisms
- Partner with T&S Product, Policy, and Enforcement teams to identify risk vectors and implement ML-driven detection and enforcement mechanisms
- Maintain a deep understanding of both AI safety research and trust & safety best practices
- Drive major collaborations between research and policy teams across Anthropic
- Hire, support, and develop team members through continuous feedback, career coaching, and people management practices
Other
- 5+ years of management experience in a technical, ML-focused environment
- Demonstrated ability to lead and manage high-performing technical teams
- Excellent communication skills, with the ability to translate complex technical concepts for diverse audiences
- Strong project management skills with the ability to balance multiple priorities
- Bachelor's degree in a related field or equivalent experience
- Ability to be in one of our offices at least 25% of the time