Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and for society as a whole.
Requirements
- Significant software, ML, or research engineering experience
- Experience contributing to empirical AI research projects
- Familiarity with technical AI safety research
- Experience authoring research papers in machine learning, NLP, or AI safety
- Experience with LLMs
- Experience with reinforcement learning
- Experience with Kubernetes clusters and complex shared codebases
Responsibilities
- Test the robustness of safety techniques by training language models to subvert them
- Run multi-agent reinforcement learning experiments to test techniques such as AI Debate
- Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks
- Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts
- Contribute ideas, figures, and writing to research papers, blog posts, and talks
- Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Ability to be based in the Bay Area or to travel to the Bay Area at least 25% of the time, with a preference for Bay Area-based candidates
- Ability to work collaboratively in a team environment
- Strong communication skills
- Willingness to pick up slack and take on work outside of your job description