Machine Learning Engineer, Safeguards

Anthropic

$315,000 - $425,000

Oct 9, 2025

San Francisco, CA, US

Anthropic is looking for ML engineers to build safety and oversight mechanisms for its AI systems. The goal is to train models that detect harmful behaviors and protect user well-being, upholding principles of safety, transparency, and oversight while enforcing terms of service and acceptable use policies.

Requirements

Have proficiency in Python, LLMs, SQL and data analysis/data mining tools.
Have proficiency in building safe AI/ML systems, such as behavioral classifiers or anomaly detection.
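Have experience with machine learning frameworks like Scikit-Learn, TensorFlow, or PyTorch.
Have experience with high-performance, large-scale ML systems.
Have experience with language modeling with transformers.
Have experience with reinforcement learning.
Have experience with large-scale ETL.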

Responsibilities

Build machine learning models to detect unwanted or anomalous behaviors from users and API partners, and integrate them into our production system
Improve our automated detection and enforcement systems as needed
Analyze user reports of inappropriate accounts and build machine learning models to detect similar instances proactively
Surface abuse patterns to our research teams to harden models at the training stage

Other

Have 4+ years of experience in a research or ML engineering role, or as an applied research scientist, preferably with a focus on AI safety.
Have strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders.
Care about the societal impacts and long-term implications of your work.
Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
Visa sponsorship: We do sponsor visas!