Applied AI Researcher, Benchmarking

Distyl AI

Salary not specified

Oct 16, 2025

San Francisco, CA, USA • New York, NY, USA

Distyl AI is looking for creative researchers to redefine how software is used by leveraging AI, aiming to solve complex, high-stakes challenges at scale for Global Fortune 1000 companies and drive the future of AI-powered enterprise operations.

Requirements

Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance.
Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results.
Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.).
Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI.

Responsibilities

Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
They construct benchmarks that reflect real-world complexity.
Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment.
They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.
Their insights drive both Distyl’s internal research priorities and industry-wide standards.
You develop intelligent systems using models rather than training or fine-tuning them.
You need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI.

Other

Distyl is looking for creative researchers who don’t just want to drive incremental improvements on benchmarks or optimize an existing process but instead want to redefine how software is used.
Our researchers come from many academic backgrounds but have strong research track records, operate in an AI-native way, and would be bored staying on the rails of a traditional research org.
Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done.
Bias Towards Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize.
Distyl is a hybrid working environment and requires in-office collaboration 3 days a week. We have offices in SF and NYC.