Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that closely match your resume


Principal Applied Scientist

Microsoft

$163,000 - $331,200
Nov 22, 2025
Redmond, WA, United States of America

Microsoft's Azure Data engineering team is looking to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture. The Real-Time Intelligence (RTI) team within Microsoft Fabric is seeking a Principal Applied Scientist to lead the science of evaluating and improving LLM-powered agents operating on live operational data.

Requirements

  • 2+ years designing and running ML/LLM evaluation and experimentation (offline metrics + online A/B tests)
  • Proven experience applying machine learning, statistics, and measurement science to LLM and agent evaluation, ideally in real-time or streaming scenarios.
  • Proficiency in agentic AI concepts (e.g., multi-step agents, tool orchestration, retrieval/RAG, workflow automation) and familiarity with techniques for assessing safety, robustness, anomaly detection, and causal impact of agent behaviors.
  • Strong programming and modeling skills in languages such as Python, and experience building evaluation services or pipelines on distributed systems (e.g., running large-scale offline evals, auto-raters, or LLM-as-judge workloads).
  • Ability to design, implement, and interpret rigorous evaluations end-to-end: constructing eval sets and scenarios, combining offline metrics with human/LLM raters, running online experiments (A/B tests, holdouts), and instrumenting reliability monitoring at scale (a minimal sketch follows this list).
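
As an illustration of the end-to-end workflow named in the last requirement, here is a minimal offline-eval sketch in Python. The agent stub, the keyword-based judge, and the eval cases are hypothetical placeholders; a real pipeline would plug in the agent under test and an LLM-as-judge or human raters in place of keyword_judge.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class EvalCase:
        """One scenario in an offline eval set: a prompt plus a reference answer."""
        prompt: str
        reference: str

    def keyword_judge(prompt: str, response: str, reference: str) -> float:
        # Placeholder rater: a production system would call an LLM-as-judge
        # or route the case to human raters instead of keyword matching.
        return 1.0 if reference.lower() in response.lower() else 0.0

    def run_offline_eval(cases: list[EvalCase],
                         agent: Callable[[str], str],
                         judge: Callable[[str, str, str], float] = keyword_judge) -> float:
        """Run each case through the agent, score it with the judge, and return the mean score."""
        scores = [judge(c.prompt, agent(c.prompt), c.reference) for c in cases]
        return sum(scores) / len(scores) if scores else 0.0

    if __name__ == "__main__":
        # Hypothetical agent stub standing in for the LLM-powered agent under test.
        agent = lambda prompt: "The latency anomaly started at 02:14 UTC in region westus."
        cases = [EvalCase("When did the latency anomaly start?", "02:14 UTC"),
                 EvalCase("Which region was affected?", "westus")]
        print(f"Task success rate: {run_offline_eval(cases, agent):.2f}")

Keeping the judge behind a simple callable interface is what lets the same harness swap between cheap heuristics, LLM-as-judge raters, and human review.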

Responsibilities

  • Lead end-to-end science for evaluating LLM-powered agents on real-time and batch workloads: designing evaluation frameworks, metrics, and pipelines that capture planning quality, tool use, retrieval, safety, and end-user outcomes, and partnering with engineering for robust, low-latency deployment.
  • Advance evaluation methodologies for agents across RTI surfaces by driving test set design, auto-raters (including LLM-as-judge), human-in-the-loop feedback loops, and measurable lifts in key quality metrics such as task success rate, reliability, and safety.
  • Establish rigorous evaluation and reliability practices for LLM/agent systems: from offline benchmarks and scenario-based evals to online experiments and production monitoring, defining guardrails and policies that balance quality, cost, and latency at scale (see the sketch after this list).
  • Collaborate with PM, Engineering, and UX to translate evaluation insights into customer-visible improvements, shaping product requirements, de-risking launches, and iterating quickly based on telemetry, user feedback, and real-world failure modes.
  • Provide technical leadership and mentorship within the applied science and engineering community, fostering inclusive, responsible-AI practices in agent evaluation, and influencing roadmap, platform investments, and cross-team evaluation strategy across Fabric.
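
To ground the online-experimentation side mentioned above, here is a small sketch of a two-proportion z-test comparing task success rate between the control and treatment arms of an A/B test. The counts are made up for illustration; the test itself is standard and uses only the Python standard library.

    import math

    def two_proportion_ztest(success_a: int, n_a: int,
                             success_b: int, n_b: int) -> tuple[float, float]:
        """Two-sided z-test on the difference in success rates between arms A (control) and B (treatment)."""
        p_a, p_b = success_a / n_a, success_b / n_b
        pooled = (success_a + success_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        # Two-sided p-value from the standard normal CDF.
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        return z, p_value

    if __name__ == "__main__":
        # Hypothetical counts: 1,180/2,000 task successes in control vs 1,270/2,000 in treatment.
        z, p = two_proportion_ztest(1180, 2000, 1270, 2000)
        print(f"lift = {1270 / 2000 - 1180 / 2000:.3f}, z = {z:.2f}, p = {p:.4f}")

In practice such a check would sit behind holdouts and guardrail metrics for cost and latency, as the responsibilities above describe.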

Other

  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
  • Collaborative mindset with demonstrated success partnering across Engineering, PM, and UX to define quality bars, translate evaluation insights into roadmap decisions, and iterate quickly on customer-facing agent and LLM experiences.
  • Embody our culture and values
  • Microsoft is an equal opportunity employer.