Anthropic aims to develop reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.
Requirements
- Strong programming skills, especially in Python
- Experience with ML model training and experimentation
- Experience with ML metrics and evaluation frameworks
- Experience with language model finetuning
- Background in AI alignment research
- Familiarity with techniques such as RLHF, Constitutional AI, and reward modeling
Responsibilities
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
- Train models to have better alignment properties, including honesty, character, and harmlessness
- Create and maintain evaluation frameworks to measure alignment properties in models
- Collaborate across teams to integrate alignment improvements into production models
- Develop processes to help automate and scale the team's work
Other
- Have an MS/PhD in Computer Science, ML, or a related field, or equivalent experience
- Demonstrate strong analytical skills for interpreting experimental results
- Excel at turning research ideas into working code
- Identify and resolve practical implementation challenges
- We require at least a Bachelor's degree in a related field or equivalent experience
- Currently, we expect all staff to be in one of our offices at least 25% of the time
- We do sponsor visas