Research Engineer / Scientist, Model Welfare

Anthropic

$315,000 - $340,000
Aug 29, 2025
San Francisco, CA, US

Anthropic aims to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. It is seeking a Research Engineer/Scientist to work on the Model Welfare program, which seeks to better understand, evaluate, and address concerns about the potential welfare and moral status of AI systems.

Requirements

  • Significant applied software, ML, or research engineering experience
  • Experience contributing to empirical AI research projects and/or technical AI safety research
  • Ability to reliably turn abstract theories into creative, tractable research hypotheses and experiments
  • Familiarity with machine learning, NLP, AI safety, interpretability, and/or LLM psychology and behavior
  • Experience with moral philosophy, cognitive science, neuroscience, or related fields (not required but a plus)
  • Strong technical research engineering skills
  • Ability to move fast and iterate rather than run long, extensive projects

Responsibilities

  • Investigate and improve the reliability of introspective self-reports from models
  • Collaborate with Interpretability to explore potentially welfare-relevant features and circuits
  • Improve and expand our welfare assessments for future frontier models
  • Evaluate the presence of potentially welfare-relevant capabilities and characteristics as a function of model scale
  • Develop strategies for making high-trust/verifiable commitments to models
  • Explore possible interventions and deploy them into production (e.g. allowing models to end harmful or distressing interactions)
  • Run technical research projects to investigate model characteristics of plausible relevance to welfare, consciousness, or related properties

Other

  • At least a Bachelor's degree in a related field or equivalent experience
  • Ability to work in the San Francisco office at least 25% of the time
  • Strong communication skills
  • Ability to work collaboratively with other teams
  • Strong project management skills (a plus)