Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1,000+ jobs and find postings that match your resume.

Machine Learning Engineer, Safeguards

Anthropic

$315,000 - $425,000
Oct 9, 2025
San Francisco, CA, US

Anthropic is looking for ML engineers to build safety and oversight mechanisms for its AI systems. The goal is to train models that detect harmful behaviors and protect user well-being, upholding principles of safety, transparency, and oversight while enforcing terms of service and acceptable use policies.

Requirements

  • Have proficiency in Python, LLMs, SQL and data analysis/data mining tools.
  • Have proficiency in building safe AI/ML systems, such as behavioral classifiers or anomaly detection.
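  • Have experience with machine learning frameworks like Scikit-Learn, TensorFlow, or PyTorch
  • Have experience with high-performance, large-scale ML systems
  • Have experience with language modeling with transformers
  • Have experience with reinforcement learning
  • Have experience with large-scale ETL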

Responsibilities

  • Build machine learning models to detect unwanted or anomalous behaviors from users and API partners, and integrate them into our production system
  • Improve our automated detection and enforcement systems as needed
  • Analyze user reports of inappropriate accounts and build machine learning models to detect similar instances proactively
  • Surface abuse patterns to our research teams to harden models at the training stage

Other

  • Have 4+ years of experience in a research engineering, ML engineering, or applied research scientist position, preferably with a focus on AI safety.
  • Have strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders.
  • Care about the societal impacts and long-term implications of your work.
  • Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
  • Visa sponsorship: We do sponsor visas!