Research Engineer / Scientist, Safeguards

Anthropic

$315,000 - $560,000
Aug 13, 2025
San Francisco, CA, US

Anthropic is looking to solve the problem of creating reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society as a whole.

Requirements

  • Significant software, ML, or research engineering experience
  • Experience contributing to empirical AI research projects
  • Familiarity with technical AI safety research
  • Experience authoring research papers in machine learning, NLP, or AI safety
  • Experience with LLMs
  • Experience with reinforcement learning
  • Experience with Kubernetes clusters and complex shared codebases

Responsibilities

  • Test the robustness of safety techniques by training language models to subvert them
  • Run multi-agent reinforcement learning experiments to test techniques like AI Debate
  • Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks
  • Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts
  • Contribute ideas, figures, and writing to research papers, blog posts, and talks
  • Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy

Other

  • At least a Bachelor's degree in a related field or equivalent experience
  • Ability to be based in the Bay Area, or at minimum to travel to the Bay Area 25% of the time
  • Ability to work collaboratively in a team environment
  • Strong communication skills
  • Willingness to pick up slack, even when the work falls outside the job description