Advancing Microsoft's Outlook Copilot efforts in Large Language Models (LLMs), prompt engineering, evaluation, relevance, and Responsible AI (RAI) by developing an end-to-end infrastructure and measurement framework.
Requirements
1+ year(s) of experience in data science, machine learning, experimentation, and AI, with a strong track record of delivering impactful results.
1+ year(s) of experience with LLMs, including fine-tuning, evaluation techniques, retrieval-augmented generation (RAG), and industry best practices.
Understanding of Responsible AI (RAI) principles.
Responsibilities
Develop and execute a comprehensive strategy for LLM evaluation, encompassing LLM quality, costs, model performance, model utility (user experience and prompt effectiveness), and responsible AI considerations, in alignment with company-wide efforts and informed by emerging research.
Oversee and manage large-scale, cross-functional evaluation programs, ensuring alignment with organizational objectives and timelines.
Develop and maintain a robust measurement framework to track and report on LLM performance and user impact.
Drive the engineering product roadmap to build automated evaluation pipelines integrated into the product workflow.
Use strong data science skills to design experiments, analyze data, define OKRs and metrics, and derive actionable insights to improve LLM systems.
Influence product and user experience decisions based on evaluation results.
Lead efforts to assess and improve the performance and effectiveness of language models and prompts, driving iterative enhancements, including synthetic and manufactured data creation.
Other
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Analytical and problem-solving skills.
Competent communication and presentation abilities.
Ability to work in a fast-paced and dynamic environment.