Anthropic builds and maintains the critical systems that serve its LLMs to a diverse set of consumers, with a focus on scaling inference systems, ensuring reliability, optimizing compute efficiency, and developing new inference capabilities.
Requirements
- High-performance, large-scale distributed systems
- Implementing and deploying machine learning systems at scale
- LLM inference optimization, including batching and caching strategies
- Kubernetes
- Python
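The batching strategies mentioned above can be illustrated with a minimal sketch: queue incoming requests and flush them either when the batch is full or when the oldest request has waited past a latency budget. This is a hypothetical illustration, not Anthropic's implementation; the `Batcher` class and its parameters are invented for this example.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Batcher:
    """Illustrative dynamic batcher: trades a little latency for throughput
    by grouping requests before they hit the model server."""
    max_batch_size: int = 8     # flush as soon as this many requests queue up
    max_wait_s: float = 0.05    # or once the oldest request has waited this long
    _queue: list = field(default_factory=list)
    _oldest_ts: float = 0.0

    def submit(self, request) -> None:
        if not self._queue:
            self._oldest_ts = time.monotonic()
        self._queue.append(request)

    def ready(self) -> bool:
        # A batch is ready when it is full or the oldest request is stale.
        if not self._queue:
            return False
        full = len(self._queue) >= self.max_batch_size
        stale = time.monotonic() - self._oldest_ts >= self.max_wait_s
        return full or stale

    def drain(self) -> list:
        batch, self._queue = self._queue, []
        return batch
```

In practice the same trade-off (batch size versus queueing delay) also governs KV-cache reuse and scheduling, but the flush condition above captures the core idea.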
Responsibilities
- Optimizing inference request routing to maximize compute efficiency
- Autoscaling compute fleet to effectively match compute supply with inference demand
- Contributing to new inference features (e.g. structured sampling, fine-tuning)
- Supporting inference for new model architectures
- Ensuring smooth and regular deployment of inference services
- Analyzing observability data to tune performance based on production workloads
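The autoscaling responsibility above amounts to matching compute supply to inference demand. One common approach is target tracking: size the fleet so each replica carries roughly a target load. The sketch below is a hypothetical illustration of that policy, not Anthropic's autoscaler; the function name and parameters are invented for this example.

```python
import math

def desired_replicas(current_load: float,
                     target_load_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Target-tracking sizing: enough replicas that each handles roughly
    target_load_per_replica (e.g. requests/sec), clamped to fleet bounds."""
    if target_load_per_replica <= 0:
        raise ValueError("target load must be positive")
    needed = math.ceil(current_load / target_load_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

A real controller would smooth the load signal and rate-limit scale-downs to avoid flapping, but the clamped ceiling division is the core of the supply/demand match.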
Other
- Results-oriented, with a bias towards flexibility and impact
- Pick up slack, even if it goes outside your job description
- Enjoy pair programming
- Want to learn more about machine learning research
- Care about the societal impacts of your work
- At least a Bachelor's degree in a related field or equivalent experience
- Location-based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time
- Visa sponsorship: We do sponsor visas, but we aren't able to successfully sponsor visas for every role and every candidate