Data Scientist

Great Value Hiring

$100 - $120
Nov 6, 2025
Remote, US

Conduct comprehensive failure analysis of AI agent performance across finance-sector tasks to identify patterns, root causes, and systemic issues in the evaluation framework.

Requirements

  • Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
  • Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
  • Experience with exploratory data analysis and creating actionable insights from complex datasets
  • Understanding of LLM evaluation methods and quality metrics
  • Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL
  • Experience with AI/ML model evaluation or quality assurance
  • Familiarity with benchmark datasets and evaluation frameworks

Responsibilities

  • Statistical Failure Analysis: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
  • Root Cause Analysis: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
  • Dimension Analysis: Analyze performance variations across finance sub-domains, file types, and task categories
  • Reporting & Visualization: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
  • Quality Framework: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
  • Stakeholder Communication: Present insights to data labeling experts and technical teams

Other

  • Background in finance or willingness to learn finance domain concepts
  • Experience with multi-dimensional failure analysis
  • 2-4 years of relevant experience