Anthropic is looking for software engineers to build safety and oversight mechanisms for our AI systems: monitoring models, detecting unwanted model behaviors, preventing misuse and disallowed use, and protecting user well-being.
Requirements
- Proficiency in Python and TypeScript
- Ability to work across the stack
- Experience building trust and safety detection mechanisms and interventions for AI/ML systems
- Experience with prompt engineering, jailbreak attacks, and other adversarial inputs
- Experience working closely with operational teams to build custom internal tooling
Responsibilities
- Develop monitoring systems that detect unwanted behaviors from our API partners, potentially take automated enforcement actions, and surface cases in internal dashboards for manual analyst review
- Build abuse detection mechanisms and infrastructure
- Surface abuse patterns to our research teams to harden models at the training stage
- Build robust, reliable, multi-layered defenses that improve safety mechanisms in real time and work at scale
- Analyze user reports of inappropriate content or accounts
Other
- 5–10+ years of experience in a software engineering role, preferably with a focus on integrity, spam, fraud, or abuse detection and mitigation
- Strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
- Visa sponsorship: We do sponsor visas!
- We encourage you to apply even if you do not believe you meet every single qualification.