Anthropic is seeking an experienced Machine Learning Systems Engineer to develop and optimize encodings and tokenization systems for its Finetuning workflows. These systems are foundational to the company's AI research progress and to the efficiency of its AI systems.
Requirements
- Have significant software engineering experience and demonstrated machine learning expertise
- Have experience with machine learning systems, data pipelines, or ML infrastructure
- Are proficient in Python and familiar with modern ML development practices
- Have strong analytical skills and can evaluate the impact of engineering changes on research outcomes
- Have experience working with machine learning data processing pipelines
- Have experience building or optimizing data encodings for ML applications
- Have experience implementing or working with BPE, WordPiece, or other tokenization algorithms
Responsibilities
- Design, develop, and maintain tokenization systems used across Pretraining and Finetuning workflows
- Optimize encoding techniques to improve model training efficiency and performance
- Build infrastructure that enables researchers to experiment with novel tokenization approaches
- Implement systems for monitoring and debugging tokenization-related issues in the model training pipeline
- Create robust testing frameworks to validate tokenization systems across diverse languages and data types
- Identify and address bottlenecks in data processing pipelines related to tokenization
- Document systems thoroughly and communicate technical decisions clearly to stakeholders across teams
Other
- Have 8+ years of software engineering experience
- Are comfortable navigating ambiguity and developing solutions in rapidly evolving research environments
- Can work independently while maintaining strong collaboration with cross-functional teams
- Are results-oriented, with a bias towards flexibility and impact
- Have at least a Bachelor's degree in a related field or equivalent experience