Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. We are seeking a Research Engineer/Scientist to join the Model Welfare program, which works to better understand, evaluate, and address concerns about the potential welfare and moral status of AI systems.
Requirements
- Significant applied software, ML, or research engineering experience
- Experience contributing to empirical AI research projects and/or technical AI safety research
- Ability to reliably turn abstract theories into creative, tractable research hypotheses and experiments
- Familiarity with machine learning, NLP, AI safety, interpretability, and/or LLM psychology and behavior
- Experience with moral philosophy, cognitive science, neuroscience, or related fields (not required but a plus)
- Strong technical research engineering skills
- Ability to move fast and iterate rather than pursue long, drawn-out projects
Responsibilities
- Investigate and improve the reliability of introspective self-reports from models
- Collaborate with Interpretability to explore potentially welfare-relevant features and circuits
- Improve and expand our welfare assessments for future frontier models
- Evaluate the presence of potentially welfare-relevant capabilities and characteristics as a function of model scale
- Develop strategies for making high-trust/verifiable commitments to models
- Explore possible interventions and deploy them into production (e.g., allowing models to end harmful or distressing interactions)
- Run technical research projects to investigate model characteristics of plausible relevance to welfare, consciousness, or related properties
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Ability to work in the San Francisco office at least 25% of the time
- Strong communication skills
- Ability to work collaboratively with other teams
- Strong project management skills (a plus)