Anthropic aims to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.
Requirements
- Significant software, ML, or research engineering experience
- Experience contributing to empirical AI research projects
- Familiarity with technical AI safety research
- Experience authoring research papers in machine learning, NLP, or AI safety
- Experience with LLMs
- Experience with reinforcement learning
- Experience with Kubernetes clusters and complex shared codebases
Responsibilities
- Testing the robustness of safety techniques by training language models to subvert them
- Running multi-agent reinforcement learning experiments to test techniques like AI Debate
- Building tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks
- Writing scripts and prompts to efficiently produce evaluation questions to test models’ reasoning abilities in safety-relevant contexts
- Contributing ideas, figures, and writing to research papers, blog posts, and talks
- Running experiments that feed into key AI safety efforts at Anthropic
Other
- Bachelor's degree in a related field or equivalent experience
- Ability to be based in the Bay Area (or to travel to the Bay Area roughly 25% of the time)
- Ability to pick up slack and contribute to collaborative projects
- Care about the impacts of AI
- Strong communication skills