Search - Workchat - Applied Data Scientist II

Elastic

$110,900 - $210,700
Oct 23, 2025
Remote, US

Elastic, the Search AI Company, aims to enable everyone to find the answers they need in real time, using all their data, at scale. To do that, it is building a conversational (agentic) platform that lets customers chat with their own data in Elasticsearch.

Requirements

  • 3 to 5 years in applied DS or ML with production ownership, including at least 1 to 2 years focused on evaluating LLM or agent workflows in shipped systems
  • Proven experience designing and running stepwise evaluations for agent pipelines: retrieval coverage, reranking quality, reasoning traces, tool selection accuracy, citation grounding, and final answer helpfulness and faithfulness
  • Golden set hygiene: stratified dataset design, leakage controls, reviewer guidelines, inter-rater checks, and versioned labels
  • Fluent with offline IR metrics and guardrails: Recall@k, nDCG, MRR, groundedness or citation support, plus latency and cost tracking; can move from offline gains to online A/B tests
  • Practical Elasticsearch experience or a similar search system; ES|QL familiarity is a plus

Responsibilities

  • Own well-scoped pieces of the offline and online evaluation pipeline for agent workflows: retrieval coverage, reranking quality, reasoning traces, tool selection accuracy, citation integrity, and final answer helpfulness and faithfulness
  • Calibrate and validate LLM-as-judge rubrics against human labels, track agreement with statistics, and add periodic checks to prevent drift
  • Instrument agent runs with traces so you can localize errors to retrieval, reasoning, tool execution, or grounding, then contribute CI checks that block merges on regressions
  • Translate evaluation readouts into product calls such as model choice, routing policy, tool gating thresholds, prompt and chunking updates, and agent customization for Elastic use cases
  • Collaborate with backend engineers on contracts for ES|QL, citations, and telemetry schemas, and with PM and UX to land findings in shipped features
  • Share outcomes through clear docs, notebooks, and PRs, and contribute utilities that make evaluation faster and more reproducible for the team

Other

  • Strong written communication and async collaboration habits in a distributed team
  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles