Ensure that Copilot functions correctly and accurately by developing new methods to evaluate LLMs, experimenting with data collection techniques for prompt engineering and fine-tuning, or training content classifiers to support the Copilot experience.
Requirements
- Experience prompting, evaluating, and working with large language models.
- Experience writing production-quality Python code.
Responsibilities
- Leverage subject matter expertise to uncover and mitigate model quality and model performance issues in consumer Copilot.
- Oversee data acquisition or generation efforts, ensuring that the data meets product needs.
- Generalize machine learning (ML) solutions into repeatable frameworks.
- Lead evaluation efforts of models deployed within Copilot.
- Conduct thorough reviews of data analyses and the techniques used, summarize each review, and highlight areas that were missed or need re-examining.
- Track advances in industry and academia, identify relevant state-of-the-art research, and adapt algorithms and/or techniques to drive innovation and develop new solutions.
- Independently write efficient, readable, extensible code and model pipelines.
Other
- This position is based in Mountain View, CA (U.S.). By applying, you are required to be local to the San Francisco area and in the office 3 days a week.
- Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location.
- Commit to a customer-oriented focus by acknowledging and validating customer needs and perspectives, focusing on the broader customer context, and serving as a trusted advisor.
- Demonstrated interest in Responsible AI.