The business problem is to improve the quality of language understanding (LU) and grounding data for M365 Copilot and Microsoft Search. This involves ensuring that user queries are accurately interpreted, leading to precise and actionable signals that power these tools, thereby improving relevance and limiting hallucinations in Copilot's responses.
Requirements
- Experience in Python, C#, or similar programming languages for model development, training, and evaluation
- Experience with large-scale machine learning systems, including training, fine-tuning, and deployment of LLMs or similar models
- Hands-on experience with prompt engineering, reinforcement learning from human feedback (RLHF), or reward modeling
- Familiarity with distributed systems and cloud platforms (e.g., Azure) for large-scale ML workloads
Responsibilities
- Improve LLM capabilities that influence grounding data quality.
- Apply the latest research to post-train LLMs.
- Build and evaluate metrics and reward models.
- Add new functionality and ensure high quality for customers using Copilot on their business data.
- Design and implement scalable evaluation pipelines for LU and grounding quality.
- Collaborate with cross-functional teams to define success metrics and drive continuous improvement in Copilot experiences.
- Build LLM/small language model (SLM)-powered LU: intent detection and slot filling with calibrated confidence, optimized for low latency and high reliability in Copilot experiences (illustrative sketch below).
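For illustration only, a minimal Python sketch of intent detection and slot filling with temperature-scaled confidence calibration. The `INTENTS` label set, the `interpret` function, and the upstream model that would supply `intent_logits` and `slots` are hypothetical assumptions, not part of any existing Microsoft system.

```python
# Minimal sketch: intent & slot prediction with temperature-scaled confidence.
# Assumes a hypothetical upstream LLM/SLM produces raw intent logits and slot
# values; everything below is illustrative only.
import math
from dataclasses import dataclass, field

INTENTS = ["find_document", "find_person", "summarize", "other"]  # hypothetical label set

@dataclass
class LUResult:
    intent: str
    confidence: float            # calibrated probability of the top intent
    slots: dict = field(default_factory=dict)

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution; temperature scaling is a
    # common post-hoc calibration step for overconfident classifiers.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def interpret(query: str, intent_logits, slots, temperature=1.5) -> LUResult:
    # Convert raw logits into calibrated probabilities and keep the top intent.
    # The query is passed through for interface completeness only.
    probs = softmax(intent_logits, temperature)
    best = max(range(len(INTENTS)), key=lambda i: probs[i])
    return LUResult(intent=INTENTS[best], confidence=probs[best], slots=slots)

if __name__ == "__main__":
    # Example: logits and slots would come from a fine-tuned LLM/SLM in practice;
    # a low calibrated confidence could gate a fallback to a safer default.
    result = interpret(
        "emails from Alice about the Q3 budget",
        intent_logits=[2.1, 0.3, -1.0, 0.1],
        slots={"person": "Alice", "topic": "Q3 budget", "source": "email"},
    )
    print(result)
```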
Other
- 3 days/week in-office
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check
- Microsoft will accept applications for the role until October 31, 2025.