Anthropic is looking to build reliable, interpretable, and steerable AI systems, and needs an ML Systems Engineer to improve the performance, robustness, and usability of the systems that train AI models.
Requirements
- High-performance, large-scale distributed systems
- Large-scale LLM training
- Python
- Implementing LLM finetuning algorithms, such as RLHF
Responsibilities
- Profiling the reinforcement learning pipeline to find opportunities for improvement
- Building a system that regularly launches training jobs in a test environment to quickly detect problems in the training pipeline
- Making changes to the finetuning systems so they work on new model architectures
- Building instrumentation to detect and eliminate Python GIL contention in the training code
- Diagnosing why training runs have started slowing down after some number of steps, and fixing it
- Implementing a stable, fast version of a new training algorithm proposed by a researcher
Other
- At least a Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: currently, we expect all staff to be in one of our offices at least 25% of the time
- Visa sponsorship: we do sponsor visas, but we aren't able to successfully sponsor visas for every role and every candidate
- Results-oriented, with a bias towards flexibility and impact
- Willing to pick up slack, even if it goes outside your job description
- Enthusiastic about pair programming
- Eager to learn more about machine learning research
- Conscientious about the societal impacts of your work
- Strong communicator