Microsoft is seeking to solve production-critical problems across its AI systems by applying AI/ML models and intelligent automation to detect anomalies, identify emerging patterns, and predict system behaviors before they impact customers. The goal is to increase the reliability, performance, and efficiency of the AI model-serving pipelines that power experiences such as Copilot and Azure OpenAI Foundry models.
Requirements
- Bachelor's Degree in Computer Science or related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- 2+ years of experience in Python for AI automation
- 1+ year(s) of experience building AI solutions with toolchains and frameworks in Python and C++, and on Kubernetes (K8s)
- 1+ year(s) of experience demonstrating a grasp of Large Language Model (LLM) post-training techniques, inference optimization, and application-layer use cases
- Familiarity with embeddings, semantic search, or LLM-based analysis used for system intelligence or reasoning.
- Familiarity with agentic frameworks (e.g., LangChain, AutoGen, DSPy)
- Exposure to React or frontend observability dashboards is a plus
Responsibilities
- Design, build, and scale AI models to detect anomalies and identify regressions across large-scale AI systems.
- Analyze patterns in telemetry, logs, and real-time signals to uncover root causes, predict failures, and drive proactive mitigations.
- Apply AI to identify emerging usage trends, performance hotspots, and workload irregularities that impact system health and user experience.
- Build lightweight automation that leverages anomaly detection signals and pattern analysis to improve live-site reliability and engineering velocity.
- Contribute to hotfixes, performance tuning, and reliability improvements in production AI engines, with impact on outcomes such as GPU savings, SLA reliability, and customer satisfaction.
- Build intuitive, responsive UI components for AI dashboards and telemetry tools using React and modern web technologies.
- Stay current with industry trends in applied AI, observability, and performance engineering.
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Persuasive communication and collaboration abilities; comfortable working in ambiguous, fast-moving environments.
- Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.