Anthropic is looking for software engineers to build safety and oversight mechanisms for our AI systems: monitoring models, detecting unwanted model behaviors, preventing misuse and disallowed use, and protecting user well-being.
Requirements
- Proficiency in Python and TypeScript
- Ability to work across the stack
- Experience building trust and safety detection mechanisms and interventions for AI/ML systems
- Experience with prompt engineering, jailbreak attacks, and other adversarial inputs
- Experience working closely with operational teams to build custom internal tooling
Responsibilities
- Develop monitoring systems that detect unwanted behaviors from our API partners, potentially take automated enforcement actions, and surface cases in internal dashboards for manual analyst review
- Build abuse detection mechanisms and infrastructure
- Surface abuse patterns to our research teams to harden models at the training stage
- Build robust, reliable, multi-layered defenses that improve safety mechanisms in real time and work at scale
- Analyze user reports of inappropriate content or accounts
Other
- 5–10+ years of experience in a software engineering role, preferably with a focus on integrity, spam, fraud, or abuse detection and mitigation
- Strong communication skills and the ability to explain complex technical concepts to non-technical stakeholders
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time.
- Visa sponsorship: We do sponsor visas!
- We encourage you to apply even if you do not believe you meet every single qualification.