Microsoft is looking to build a central AI data platform that breaks down data silos and manages the full lifecycle of first-party, third-party, synthetic, and human-labeled data to accelerate AI model development with secure, reusable, and compliant datasets.
Requirements
- Programming skills in Python and ML frameworks (e.g., PyTorch, TensorFlow, Scikit-learn).
- Experience with data analysis, dataset design, or evaluation methodologies.
- 2+ years of experience applying machine learning or data science in practical settings.
- Master’s degree or PhD in Computer Science, Machine Learning, Statistics, or related field, or equivalent experience.
- Experience with LLM training pipelines, synthetic data generation, or data-centric AI approaches.
- Knowledge of PII detection, data privacy, fairness, or compliance in AI systems.
- Familiarity with distributed data systems (e.g., Spark, Databricks, Azure Data Lake).
Responsibilities
- Advance machine learning and data science to improve data quality, automate dataset generation, and design intelligent agent-driven services that manage the end-to-end data lifecycle.
- Develop ML-based pipelines for data generation, validation, augmentation, and discovery (e.g., synthetic data, human-in-the-loop workflows).
- Design and train intelligent agents to automate key parts of the dataset lifecycle, including ingestion, validation, PII detection and handling, governance, discovery, and feedback loops.
- Build evaluation methods to measure dataset quality, coverage, and usefulness for large-scale model training.
- Leverage AI/ML techniques (e.g., classification, clustering, anomaly detection, embeddings, LLM-based evaluation) to improve data discovery, curation, and governance.
- Collaborate with engineers to integrate scientific methods and models into scalable pipelines and platform services.
- Partner with AI product and research teams (CoreAI, MAI, M365, GitHub, MSR, and more) to align datasets with model training needs and identify new opportunities.
Other
- Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 1+ year(s) related experience (e.g., statistics, predictive analytics, research) OR equivalent experience.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Strong collaboration skills with engineers, TPMs, and product partners across multiple orgs.
- Ability to publish or share insights internally and externally to shape Microsoft’s data-centric AI practices.