Distyl AI is looking for creative researchers to redefine how software is used by leveraging AI, aiming to solve complex, high-stakes challenges at scale for Global Fortune 1000 companies and drive the future of AI-powered enterprise operations.
Requirements
- Experience Designing and Running Evaluations: You’ve built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance.
- Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results.
- Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.).
- Uses AI Every Day: Before you can revolutionize someone else’s workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow.
- Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI.
Responsibilities
- Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact.
- They construct benchmarks that reflect real-world complexity.
- Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment.
- They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability.
- Their insights drive both Distyl’s internal research priorities and industry-wide standards.
- You develop intelligent systems using models rather than training or fine-tuning them.
- You build prototypes of your ideas and then run the experiments that prove their effectiveness to an F500 Head of AI.
Other
- We are looking for creative researchers who don’t just want to drive incremental improvements on benchmarks or optimize an existing process, but instead want to redefine how software is used.
- Our researchers come from many academic backgrounds but have strong research track records, operate in an AI-native way, and would be bored staying on the rails of a traditional research org.
- Proven Track Record of Research Results: Whether you’ve published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you’ve done.
- Bias Towards Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize.
- Distyl is a hybrid working environment and requires in-office collaboration 3 days a week. We have offices in SF and NYC.