Job Board



Software Engineer, Safeguards

Anthropic

$320,000 - $425,000
Oct 15, 2025
San Francisco, CA, US

Anthropic is looking for software engineers to build safety and oversight mechanisms for its AI systems. The role focuses on monitoring models, preventing misuse, and protecting user well-being by detecting unwanted model behaviors and blocking disallowed use.

Requirements

  • Proficiency in Python and TypeScript
  • Ability to work across the stack
  • Experience building trust and safety detection and intervention mechanisms for AI/ML systems
  • Experience with prompt engineering, jailbreak attacks, and other adversarial inputs
  • Experience working closely with operational teams to build custom internal tooling

Responsibilities

  • Develop monitoring systems to detect unwanted behaviors from our API partners and potentially take automated enforcement actions; surface these in internal dashboards to analysts for manual review
  • Build abuse detection mechanisms and infrastructure
  • Surface abuse patterns to our research teams to harden models at the training stage
  • Build robust, reliable, multi-layered defenses that improve safety mechanisms in real time and work at scale
  • Analyze user reports of inappropriate content or accounts

Other

  • 5-10+ years of experience in a software engineering position, preferably with a focus on integrity, spam, fraud, or abuse detection and mitigation
  • Strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
  • Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
  • Visa sponsorship: We do sponsor visas!
  • We encourage you to apply even if you do not believe you meet every single qualification.