Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and for society as a whole.
Requirements
- Significant software, ML, or research engineering experience
- Experience contributing to empirical AI research projects
- Familiarity with technical AI safety research
- Experience authoring research papers in machine learning, NLP, or AI safety
- Experience with LLMs
- Experience with reinforcement learning
- Experience with Kubernetes clusters and complex shared codebases
Responsibilities
- Test the robustness of safety techniques by training language models to subvert them
- Run multi-agent reinforcement learning experiments to test techniques such as AI Debate
- Build tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks
- Write scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts
- Contribute ideas, figures, and writing to research papers, blog posts, and talks
- Run experiments that feed into key AI safety efforts at Anthropic, like the design and implementation of our Responsible Scaling Policy
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Ability to be based in the Bay Area or to travel to the Bay Area at least 25% of the time, with a preference for Bay Area-based candidates
- Ability to work collaboratively in a team environment
- Strong communication skills
- Willingness to pick up slack and take on work outside of your job description