Applied AI Researcher, Benchmarking

Distyl AI

Salary not specified
Oct 16, 2025
San Francisco, CA, USA • New York, NY, USA

Distyl AI is looking for creative researchers to redefine how software is used by leveraging AI, aiming to solve complex, high-stakes challenges at scale for Global Fortune 1000 companies and drive the future of AI-powered enterprise operations.

Requirements

  • Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance.
  • Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results.
  • Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.).
  • Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
  • Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to a Fortune 500 Head of AI.

Responsibilities

  • Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
  • They construct benchmarks that reflect real-world complexity.
  • Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment.
  • They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.
  • Their insights drive both Distyl’s internal research priorities and industry-wide standards.
  • You develop intelligent systems using models rather than training or fine-tuning them.
  • You build prototypes of your ideas and then run the experiments that prove their effectiveness to a Fortune 500 Head of AI.

Other

  • We are looking for creative researchers who don’t just want to drive incremental improvements on benchmarks or optimize an existing process, but instead want to creatively redefine how software is used.
  • Our researchers come from many academic backgrounds but have strong research track records, operate in an AI-native way, and would be bored staying on the rails of a traditional research org.
  • Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done.
  • Bias Towards Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize.
  • Distyl is a hybrid working environment and requires in-office collaboration 3 days a week. We have offices in SF and NYC.