Staff Research Engineer, Model Efficiency

Cohere

Salary not specified
Nov 7, 2025
New York, NY, US • San Francisco, CA, US

Cohere is looking to address large language model (LLM) inference efficiency, which remains a bottleneck, by pushing the limits of inference efficiency across its foundation models.

Requirements

  • PhD in Machine Learning or a related field
  • Understanding of LLM architecture and of how to optimize LLM inference under resource constraints
  • Significant experience with one or more techniques that enhance model efficiency
  • Strong software engineering skills
  • Publications at top-tier venues such as ICLR, ACL, and NeurIPS

Responsibilities

  • Develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production
  • Explore and ship breakthroughs across the model execution stack, including:
      • Model architecture and MoE routing optimization
      • Decoding and inference-time algorithm improvements
      • Software/hardware co-design for GPU acceleration
      • Performance optimization without compromising model quality

Other

  • Appetite for working in a fast-paced, high-ambiguity startup environment
  • Passion for mentoring others

Benefits

  • 100% parental leave top-up for up to 6 months
  • 6 weeks of vacation (30 working days!)
  • Full health and dental benefits, including a separate budget to take care of your mental health