Reliability Data Scientist

Umbrex

$100 - $120
Nov 15, 2025
Remote, US

Our client is looking to reveal real risks in production AI by designing evaluation scenarios, datasets, and metrics.

Requirements

  • Strong Python + SQL + data-wrangling skills
  • Hands-on experience with evaluation design, sampling, and calibration
  • Comfort with dashboards (Grafana, PowerBI, or similar)
  • Experience building golden datasets and structured evaluation traces
  • Exposure to LLM or AI system evaluation (preferred)
  • Experience in regulated industries (audit, finance, healthcare) is a plus

Responsibilities

  • Design evaluation scenarios and metric frameworks to assess AI quality, suitability, reliability, and context-dependent behavior
  • Build and maintain evaluation assets including datasets, golden traces, error taxonomies, and automated scoring/aggregation pipelines in partnership with engineering
  • Develop and manage weekly reliability dashboards and automated reports, translating monitoring data into clear insights
  • Analyze evaluation results to detect drift, outliers, context-driven failures, and calibration issues, and validate evaluator reliability against human judgments
  • Document test logic, metric definitions, and interpretation guidance, and support context-engineering workflows with metrics for predictability, observability, and directability

Other

  • 3–6 years of experience
  • Excellent communication — ability to turn technical data into decision-ready insights
  • This is a contract role and does not offer health benefits
  • Time Commitment: ~20 hours/week
  • Location: Remote, in the U.S.