Mercor is looking to advance the frontier of model evaluations, driving model improvements across the industry that create real-world economic value.
Requirements
- Strong understanding of LLMs and the data they are trained on and evaluated against.
- Familiarity with data annotation workflows.
- Good understanding of statistics.
Responsibilities
- Build benchmarks that measure the real-world value of AI models.
- Publish LLM evaluation papers at top conferences with the support of the Mercor Applied AI and Operations teams.
- Push the frontier of understanding data ROI in model development, including multi-modality, code, tool use, and more.
- Design and validate novel data collection and annotation offerings for leading industry labs and big tech companies.
Other
- PhD or M.S. and 2+ years of work experience in computer science, electrical engineering, econometrics, or another STEM field that provides a solid understanding of ML and model evaluation.
- Strong publication record in AI research, ideally in LLM evaluation. Dataset and evaluation papers are preferred.
- Strong communication skills and ability to present findings clearly and concisely.
- Willingness to work 6 days a week, with Monday through Friday in person in San Francisco.