Mercor aims to reduce AI model failures on finance-sector tasks by identifying and analyzing patterns in how AI agents fail.
Requirements
- Strong foundation in statistical analysis, hypothesis testing, and pattern recognition.
- Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis.
- Experience with exploratory data analysis and deriving actionable insights from complex datasets.
- Familiarity with LLM evaluation methods and quality metrics.
- Comfort working with Excel, data visualization tools (Tableau/Looker), and SQL.
Responsibilities
- Conduct statistical failure analysis across finance-sector AI tasks.
- Identify patterns in AI agent performance failures across task components (e.g., prompts, rubrics, templates).
- Perform root cause analysis to determine whether failures stem from task design, rubric clarity, or agent limitations.
- Analyze performance variations across finance sub-domains, file types, and task categories.
- Create dashboards and reports highlighting failure clusters, edge cases, and areas for improvement.
- Recommend improvements to task design, rubric structure, and evaluation criteria.
- Communicate insights to data labeling experts and technical teams.
Other
- Upload resume.
- Complete a 15-minute AI interview based on your resume.
- Submit form.
- Strong relevant experience.
- Commitment: 10-40 hours/week, flexible and asynchronous.