Our client is looking to reveal real risks in production AI systems by designing evaluation scenarios, datasets, and metrics.
Requirements
- Strong Python + SQL + data-wrangling skills
- Hands-on experience with evaluation design, sampling, and calibration
- Comfort with dashboards (Grafana, Power BI, or similar)
- Experience building golden datasets and structured evaluation traces
- Exposure to LLM or AI system evaluation (preferred)
- Experience in regulated industries (audit, finance, healthcare) is a plus
Responsibilities
- Design evaluation scenarios and metric frameworks to assess AI quality, suitability, reliability, and context-dependent behavior
- Build and maintain evaluation assets including datasets, golden traces, error taxonomies, and automated scoring/aggregation pipelines in partnership with engineering
- Develop and manage weekly reliability dashboards and automated reports, translating monitoring data into clear insights
- Analyze evaluation results to detect drift, outliers, context-driven failures, and calibration issues, validating evaluator reliability against human judgments (a minimal illustration follows this list)
- Document test logic, metric definitions, and interpretation guidance, and support context-engineering workflows with metrics for predictability, observability, and directability
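For illustration only, not part of the role description: a minimal sketch of the kind of evaluator-calibration check referenced above, comparing an automated judge's labels against human judgments using chance-corrected agreement. The labels and names below are hypothetical.

```python
# Hypothetical example: chance-corrected agreement between human raters and an automated evaluator.
from collections import Counter

def cohen_kappa(human: list[str], evaluator: list[str]) -> float:
    """Cohen's kappa between paired human and automated labels."""
    assert human and len(human) == len(evaluator), "need paired, non-empty labels"
    n = len(human)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(h == e for h, e in zip(human, evaluator)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    h_freq, e_freq = Counter(human), Counter(evaluator)
    labels = set(h_freq) | set(e_freq)
    p_e = sum((h_freq[c] / n) * (e_freq[c] / n) for c in labels)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)

# Hypothetical golden-trace labels vs. an LLM judge's labels.
human_labels = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge_labels = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(f"Cohen's kappa: {cohen_kappa(human_labels, judge_labels):.2f}")
```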
Other
- 3–6 years of experience
- Excellent communication, with the ability to turn technical data into decision-ready insights
- This is a contract role and does not offer health benefits
- Time Commitment: ~20 hours/week
- Location: Remote, in the U.S.