Search - Workchat - Applied Data Scientist II

Elastic

$110,900 - $210,700

Oct 23, 2025

Remote, US

Elastic, the Search AI Company, is looking to solve the problem of enabling everyone to find the answers they need in real time, using all their data, at scale, by building a conversational (agentic) platform that lets customers chat with their own data in Elasticsearch.

Requirements

3 to 5 years in applied DS or ML with production ownership, including at least 1 to 2 years focused on evaluating LLM or agent workflows in shipped systems
Proven experience designing and running stepwise evaluations for agent pipelines: retrieval coverage, reranking quality, reasoning traces, tool selection accuracy, citation grounding, and final answer helpfulness and faithfulness
Golden set hygiene: stratified dataset design, leakage controls, reviewer guidelines, inter-rater checks, and versioned labels
Fluent with offline IR metrics and guardrails: Recall@k, nDCG, MRR, groundedness or citation support, plus latency and cost tracking; can move from offline gains to online A/B tests
Practical Elasticsearch experience or a similar search system; ES|QL familiarity is a plus

Responsibilities

Own well-scoped pieces of the offline and online evaluation pipeline for agent workflows: retrieval coverage, reranking quality, reasoning traces, tool selection accuracy, citation integrity, and final answer helpfulness and faithfulness
Calibrate and validate LLM-as-judge rubrics against human labels, track agreement with statistics, and add periodic checks to prevent drift
Instrument agent runs with traces so you can localize errors to retrieval, reasoning, tool execution, or grounding, then contribute CI checks that block merges on regressions
Translate evaluation readouts into product calls such as model choice, routing policy, tool gating thresholds, prompt and chunking updates, and agent customization for Elastic use cases
Collaborate with backend engineers on contracts for ES|QL, citations, and telemetry schemas, and with PM and UX to land findings in shipped features
Share outcomes through clear docs, notebooks, and PRs, and contribute utilities that make evaluation faster and more reproducible for the team

Other

Strong written communication and async collaboration habits in a distributed team
Competitive pay based on the work you do here and not your previous salary
Health coverage for you and your family in many locations
Ability to craft your calendar with flexible locations and schedules for many roles