Cohere is pushing the limits of Large Language Model (LLM) inference efficiency, which remains a key bottleneck, across its foundation models.
Requirements
PhD in Machine Learning or a related field
Deep understanding of LLM architecture and of how to optimize LLM inference under resource constraints
Significant experience with one or more techniques that enhance model efficiency
Strong software engineering skills
Publications at top-tier conferences and venues (e.g., ICLR, ACL, NeurIPS)
Responsibilities
Develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production
Explore and ship breakthroughs across the model execution stack, including:
Model architecture and MoE routing optimization
Decoding and inference-time algorithm improvements
Software/hardware co-design for GPU acceleration
Performance optimization without compromising model quality
Other
An appetite to work in a fast-paced, high-ambiguity start-up environment
A passion for mentoring others
100% Parental Leave top-up for up to 6 months
6 weeks of vacation (30 working days!)
Full health and dental benefits, including a separate budget to take care of your mental health