Anthropic is seeking to protect and enhance its AI services by developing AI-driven detection models that identify misuse and by implementing practical safety measures.
Requirements
- 5+ years of experience in trust & safety or anti-fraud/risk engineering, with a focus on applied machine learning
- Deep experience with techniques for detecting harmful content and platform misuse
- Experience working with or managing teams focused on applied machine learning
- Knowledge of common internet threats and evolving adversarial techniques
- Experience implementing AI-driven safety measures in production environments
Responsibilities
- Set team vision and roadmap to detect and prevent harmful usage of Anthropic's AI services through applied machine learning solutions
- Lead a team of ML and software engineers to translate complex AI capabilities into practical safety mechanisms
- Partner with T&S Product, Policy, and Enforcement teams to identify risk vectors and implement ML-driven detection and enforcement mechanisms
- Maintain a deep understanding of both AI safety research and trust & safety best practices
- Drive major collaborations between research and policy teams across Anthropic
- Hire, support, and develop team members through continuous feedback, career coaching, and people management practices
Other
- 5+ years of management experience in a technical, ML-focused environment
- Demonstrated ability to lead and manage high-performing technical teams
- Excellent communication skills, with the ability to translate complex technical concepts for diverse audiences
- Strong project management skills with the ability to balance multiple priorities
- Bachelor's degree in a related field or equivalent experience
- Ability to be in one of our offices at least 25% of the time