Cohere is pushing the limits of Large Language Model (LLM) inference efficiency, which remains a key bottleneck, across its foundation models.
Requirements
PhD in Machine Learning or a related field
Deep understanding of LLM architecture and of how to optimize LLM inference under resource constraints
Significant experience with one or more techniques that enhance model efficiency
Strong software engineering skills
Publications at top-tier conferences and venues (e.g., ICLR, ACL, NeurIPS)
Responsibilities
Develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production
Explore and ship breakthroughs across the model execution stack, including:
Model architecture and MoE routing optimization
Decoding and inference-time algorithm improvements
Software/hardware co-design for GPU acceleration
Performance optimization without compromising model quality
Other
An appetite to work in a fast-paced, high-ambiguity start-up environment
A passion for mentoring others
100% Parental Leave top-up for up to 6 months
6 weeks of vacation (30 working days!)
Full health and dental benefits, including a separate budget to take care of your mental health