Member of Technical Staff - Machine Learning Operations Engineer

Microsoft Innovation Center

$119,800 - $304,200

Nov 16, 2025

Mountain View, CA, United States of America

Microsoft Copilot is looking to build the best AI-powered products in the world and needs someone to architect and build the infrastructure that makes that possible, specifically to close the gap between ML's potential and its messy reality in production.

Requirements

6 years of experience building and operating ML systems in production, with real stories about what breaks at scale and how you fixed it
5 years of experience applying software engineering fundamentals, including distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)
5 years of hands-on experience with ML orchestration tools (Airflow, Kubeflow, Metaflow), experiment tracking, model registries, and feature stores
5 years of experience optimizing model inference, including wrestling with GPU utilization and weighing the tradeoffs between latency, throughput, and cost
Familiarity with LLM deployment patterns, vector databases, prompt management, and the unique challenges of serving foundation models
Experience working with RAG, fine-tuning pipelines, or evaluation frameworks

Responsibilities

Training pipelines that scale elegantly - Design and implement robust training infrastructure that handles everything from data ingestion to model versioning, making it trivial for ML engineers to experiment and deploy with confidence
The data flywheel - Build the infrastructure and product features that capture user interactions, ground truth labels, and edge cases, then automatically route them back into training loops. Turn every production interaction into a training example
Inference systems that deliver - Dive deep into model serving architecture—optimize latency, manage costs, implement intelligent caching, and build the observability needed to maintain reliability at scale
Deployment pipelines with guardrails - Create deployment systems that balance velocity with safety: automated testing, gradual rollouts, performance monitoring, and quick rollback mechanisms
Cross-functional infrastructure - Partner closely with ML engineers, platform engineers, and data scientists to build APIs and tools that enable tight, rapid feedback loops from production back to model development

Other

Doctorate in Computer Science, Statistics, Software Engineering, or related field AND 3 years applied ML engineering experience
OR Master's Degree in Computer Science, Statistics, Software Engineering, or related field AND 4 years applied ML engineering experience
OR Bachelor's Degree in Computer Science, Data Engineering, Software Engineering, or related field AND 6 years applied ML experience
Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) or 25 miles (non-U.S., country-specific) of that location
Desire and preference to work at the intersection of teams, translating between ML researchers who want flexibility and engineers who need reliability