Data Scientist (Part-Time | $100–$120/hr)

Call For Referral

$100–$120/hr
Nov 16, 2025
San Francisco, CA, United States of America

Mercor is partnering with a leading AI research lab to hire experienced Data Scientists who specialize in AI task evaluation and statistical analysis. The role involves comprehensive failure analysis of AI agent performance across finance-sector tasks: identifying systemic patterns, diagnosing performance bottlenecks, and improving model evaluation frameworks.

Requirements

  • Strong foundation in statistical analysis, hypothesis testing, and pattern recognition.
  • Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis.
  • Hands-on experience with exploratory data analysis (EDA) and feature interpretation.
  • Understanding of AI/ML evaluation methodologies and LLM performance metrics.
  • Skilled in using Excel, SQL, and data visualization tools (e.g., Tableau, Looker).
  • Experience with AI/ML model evaluation or quality assurance pipelines.
  • Familiarity with benchmark datasets, failure mode analysis, and evaluation frameworks.

Responsibilities

  • Statistical Failure Analysis: Identify recurring patterns in AI agent failures across task components (prompts, rubrics, file types, tags, etc.).
  • Root Cause Analysis: Determine whether issues stem from task design, rubric clarity, file complexity, or agent limitations.
  • Dimensional Analysis: Examine performance variations across finance sub-domains, file structures, and evaluation criteria.
  • Visualization & Reporting: Build dashboards and analytical reports that highlight edge cases, performance clusters, and opportunities for improvement.
  • Framework Enhancement: Recommend refinements to rubric design, evaluation metrics, and task structures based on empirical findings.
  • Stakeholder Communication: Present key insights to data labeling teams, ML engineers, and research collaborators.

Other

  • Part-time, 20–25 hours/week
  • Fully remote and asynchronous — work on your own time
  • Duration: 1–2 months, with strong potential for extension
  • Start Date: Immediate
  • 2–4 years of relevant professional experience in data science, analytics, or applied statistics.