Senior Data Scientist

$119,800 - $234,700

Oct 30, 2025

Redmond, WA, United States of America

Microsoft 365 Copilot quality improvement through customer feedback and evaluation datasets

Experience with building data pipelines, performing large-scale analysis, and implementing ML workflows using Python and SQL.
Experience in developing models or designing evaluation frameworks, including A/B testing or prompt-based assessments for LLMs.
LLM fundamentals: prompt engineering, few‑shot design, retrieval metrics, multi‑turn/agent trace evaluation.
Data quality mindset: trace hygiene, metadata design, policy/PII awareness, and principled guardrails.
Experience building graders that score persona/tone, contract/formatting (e.g., JSON validity, schema), and tool‑use correctness.
Background with structured synthetic data generation and vendor annotation programs; familiarity with judge mutation/optimization loops.
AI & Technical Fluency: You don't need to train models, but you know how they work, how to test them, and how to build great products on top of them.

Evaluation & Feedback Analysis
Convert multi‑source feedback (dogfood, VIP customers, production traces) into a prioritized dataset of 10–100 tasks per scenario, each with prompts and golden outputs; maintain a living failure taxonomy prioritized by volume × impact × fixability.
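For illustration only, the volume × impact × fixability ordering could be sketched as below. The posting does not say how each factor is measured, so the 0-to-1 scales for impact and fixability, the category names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureCategory:
    """One entry in a failure taxonomy (all fields are illustrative assumptions)."""
    name: str
    volume: int        # how many times the failure was observed
    impact: float      # assumed severity on a 0..1 scale
    fixability: float  # assumed ease of fixing on a 0..1 scale

    @property
    def priority(self) -> float:
        # Product of the three factors named in the posting.
        return self.volume * self.impact * self.fixability


def rank_failures(categories):
    """Return categories sorted from highest to lowest priority."""
    return sorted(categories, key=lambda c: c.priority, reverse=True)


# Hypothetical taxonomy entries, not taken from the posting.
taxonomy = [
    FailureCategory("tone_mismatch", volume=300, impact=0.3, fixability=0.5),
    FailureCategory("invalid_json_output", volume=120, impact=0.9, fixability=0.8),
    FailureCategory("wrong_tool_call", volume=40, impact=1.0, fixability=0.4),
]

ranked = rank_failures(taxonomy)
for category in ranked:
    print(f"{category.name}: {category.priority:.1f}")
```

In this made-up example the most frequent category does not rank first, because a plain product lets low impact pull a high-volume failure down the list.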
Rubrics & LLM‑as‑Judge
Author crisp, binary‑first rubrics across 7–30 dimensions (e.g., correctness/completeness, refusal calibration, tool‑use quality, formatting/contract, persona/tone, trace hygiene).
Build grader prompts (with few‑shots and counter‑examples) that achieve ≥80% human‑match rate, track TPR/TNR on held‑out sets, and prevent reward hacking.
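As a rough sketch of the metrics named above, the human-match rate, TPR, and TNR of a binary grader can be computed from paired pass/fail labels on a held-out set. The function name and the sample labels are hypothetical; only the 80% threshold comes from the posting.

```python
def grader_agreement(human, judge):
    """Compare binary judge verdicts against human labels on a held-out set.

    Both arguments are equal-length sequences of booleans (True = pass).
    Returns (match_rate, tpr, tnr); tpr or tnr is None when undefined.
    """
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")

    tp = sum(1 for h, j in zip(human, judge) if h and j)
    tn = sum(1 for h, j in zip(human, judge) if not h and not j)
    fp = sum(1 for h, j in zip(human, judge) if not h and j)
    fn = sum(1 for h, j in zip(human, judge) if h and not j)

    match_rate = (tp + tn) / len(human)
    tpr = tp / (tp + fn) if (tp + fn) else None  # share of human passes the judge also passes
    tnr = tn / (tn + fp) if (tn + fp) else None  # share of human fails the judge also fails
    return match_rate, tpr, tnr


# Hypothetical held-out labels, for illustration only.
human_labels = [True, True, True, False, False, True, False, True, True, False]
judge_labels = [True, True, False, False, True, True, False, True, True, False]

match_rate, tpr, tnr = grader_agreement(human_labels, judge_labels)
meets_target = match_rate >= 0.80  # threshold taken from the posting
print(match_rate, tpr, tnr, meets_target)
```

Tracking TPR and TNR separately matters because a judge can reach a high match rate on an imbalanced set while still missing most failures.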
Synthetic & Human‑Labeled Data
Design structured tuples to scale high‑signal synthetic data; orchestrate vendor/partner annotation sprints and live calibrations to align shared judgment.
Ensure datasets are reproducible with linked artifacts and robust metadata/trace hygiene.
Customer‑Grounded Scenarios

Doctorate in Data Science, Mathematics, Statistics, Econometrics, Economics, Operations Research, Computer Science, or related field AND 1+ year(s) of data-science experience.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Ability to work in a fast-paced, ambiguous environment and deliver results under tight deadlines.
2+ years of customer-facing project-delivery, professional services, and/or consulting experience.
Strong communication and stakeholder management skills.