Anthropic is looking to develop and implement techniques for training language models that are more aligned with human values, demonstrating better moral reasoning, improved honesty, and good character.
Requirements
- Possess strong programming skills, especially in Python
- Have experience with ML model training and experimentation
- Have a track record of implementing ML research
- Demonstrate strong analytical skills for interpreting experimental results
- Have experience with ML metrics and evaluation frameworks
- Excel at turning research ideas into working code
- Identify and resolve practical implementation challenges
Responsibilities
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines
- Use these techniques to train models with better alignment properties, including honesty, character, and harmlessness
- Create and maintain evaluation frameworks to measure alignment properties in models
- Collaborate across teams to integrate alignment improvements into production models
- Develop processes to help automate and scale the work of the team
Other
- We require at least a Bachelor's degree in a related field or equivalent experience.
- Currently, we expect all staff to be in one of our offices at least 25% of the time.
- We do sponsor visas!
- We greatly value communication skills.