Scale is looking to advance the science of evaluating and characterizing large language models (LLMs) by tackling hard problems in scalable oversight and advanced AI capabilities.
Requirements
Track record of impactful research in machine learning, especially in generative AI, evaluation, or oversight.
Publications at major ML/AI conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR) and/or journals.
Responsibilities
Lead a team of research scientists and engineers on foundational work in evaluation and oversight.
Drive research initiatives on frameworks and benchmarks for frontier AI models, spanning reasoning, coding, multi-modal, and agentic behaviors.
Design and advance scalable oversight methods, leveraging model-assisted evaluation, rubric-guided judgments, and recursive oversight.
Collaborate with leading research labs across industry and academia.
Publish research at top-tier venues and contribute to open-source benchmarking initiatives.
Remain deeply engaged with the research community, both tracking emerging trends and setting them.
Develop AI-assisted evaluation pipelines in which models help critique, grade, and explain outputs (e.g., RLAIF, model-judging-model); a brief illustrative sketch follows this list.
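For concreteness, the sketch below shows what one minimal rubric-guided, model-judging-model evaluation step might look like. It is purely illustrative of the research area, not Scale's actual pipeline: the complete callable is a hypothetical stand-in for any chat-completion client, and the rubric wording, score scale, and JSON output format are all assumptions.

    # Illustrative sketch of a rubric-guided model-judging-model step.
    # `complete` is a hypothetical stand-in for any chat-completion client
    # (str -> str); rubric and parsing details are assumptions, not Scale's stack.
    import json
    import re

    RUBRIC = """Score the answer from 1 (poor) to 5 (excellent) on:
    - Correctness: is the answer factually and logically right?
    - Completeness: does it address every part of the question?
    Respond as JSON: {"correctness": <1-5>, "completeness": <1-5>, "rationale": "<one sentence>"}"""

    def judge(question: str, answer: str, complete) -> dict:
        """Ask a judge model to grade `answer` against the rubric."""
        prompt = (
            f"{RUBRIC}\n\n"
            f"Question:\n{question}\n\n"
            f"Candidate answer:\n{answer}\n"
        )
        raw = complete(prompt)
        # Extract the first JSON object; judge models often wrap it in extra text.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match is None:
            raise ValueError(f"Judge returned no parseable score: {raw!r}")
        return json.loads(match.group(0))

    def mean_score(grades: list[dict]) -> float:
        """Aggregate per-item judge grades into a single benchmark number."""
        per_item = [(g["correctness"] + g["completeness"]) / 2 for g in grades]
        return sum(per_item) / len(per_item)

In practice such model judgments are typically calibrated against human ratings before being trusted at scale; the mean_score helper above is only the simplest possible aggregation.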
Other
Significant experience leading ML research in academia or industry.
Strong written and verbal communication skills for cross-functional collaboration.
Experience building and mentoring teams of research scientists and engineers.
The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position, determined by work location and additional factors, including job-related skills, experience, interview performance, and relevant education or training.
Scale employees in eligible roles are also granted equity-based compensation, subject to Board of Directors approval.