Machine Learning Engineering Manager - LLM Serving & Infrastructure

Spotify

$176,166 - $251,666
Oct 23, 2025
New York, NY, United States • Boston, MA, United States

Generative AI is transforming Spotify's product capabilities and technical architecture. Generative recommender systems, agent frameworks, and LLMs open up major opportunities for our products to serve more user needs and use cases and to unlock a richer understanding of our content and users. This ML Manager will focus on serving a Unified Recommender model built on open-weight LLM and transformer technology.

Requirements

  • Hands-on ML engineering expertise: deep experience building, scaling, and governing high-quality ML systems and datasets, including defining data schemas, handling data lineage, and implementing data validation pipelines (e.g., with the HuggingFace datasets library or similar internal systems; a brief sketch follows this list).
  • Deep technical background in building and operating large-scale, high-velocity Machine Learning/MLOps infrastructure, ideally for personalization, recommendation, or Large Language Models (LLMs).
  • Expertise in designing robust, loosely coupled systems with clean APIs and clear separation of concerns (e.g., distinguishing between fast dev-time tools and rigorous production-like systems).
  • Experience integrating evaluation and testing into continuous integration/continuous deployment (CI/CD) pipelines to enable rapid 'fork-evaluate-merge' developer workflows.
  • Solid understanding of experiment tracking and results visualization platforms (e.g., MLflow, custom UIs).
  • 5+ years of experience in software or machine learning engineering.
  • 2+ years of experience managing an engineering team.
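
For illustration only, here is a minimal sketch of the kind of schema definition and validation gate the data-quality requirement above describes, assuming the open-source HuggingFace `datasets` library. The dataset fields and records are invented for this sketch, not taken from the posting:

```python
# Hypothetical sketch: enforcing a declared schema at ingestion time with
# the HuggingFace `datasets` library. All field names are invented.
from datasets import Dataset, Features, Sequence, Value

# Declaring the expected schema up front makes malformed records fail
# loudly at load time instead of silently corrupting training data.
features = Features({
    "user_id": Value("string"),
    "item_ids": Sequence(Value("string")),
    "label": Value("float32"),
})

records = [
    {"user_id": "u1", "item_ids": ["t1", "t2"], "label": 1.0},
    {"user_id": "u2", "item_ids": ["t3"], "label": 0.0},
]

# `from_list` casts records to the declared features and raises if a
# record cannot be cast; a real validation pipeline would build its
# quality gates on top of checks like this.
ds = Dataset.from_list(records, features=features)
print(ds.features)
```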

Responsibilities

  • Lead a high-performing engineering team to develop, build, and deploy a high-scale, low-latency LLM Serving Infrastructure.
  • Drive the implementation of a unified serving layer that supports multiple LLM models and inference types (batch, offline-eval flows, and real-time/streaming); see the sketch after this list.
  • Own the development of the Model Registry for deploying, versioning, and running LLMs across production environments.
  • Ensure successful integration with the core Personalization and Recommendation systems to deliver LLM-powered features.
  • Define and champion standardized technical interfaces and protocols for efficient model deployment and scaling.
  • Establish and monitor the serving infrastructure's performance, cost, and reliability, including load balancing, autoscaling, and failure recovery.
  • Scale up the serving architecture to handle hundreds of millions of users and high-volume inference requests for internal domain-specific LLMs.
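
Not Spotify's actual design — just a minimal sketch, with invented class and method names, of what "one serving layer, multiple inference types" can look like: a single interface that routes each request to a mode-specific backend while callers stay agnostic to how inference runs:

```python
# Purely illustrative sketch of a unified serving interface that hides
# whether a request is answered by a batch, offline-eval, or real-time
# backend. All names here are invented, not Spotify's API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class InferenceMode(Enum):
    BATCH = "batch"
    OFFLINE_EVAL = "offline_eval"
    REALTIME = "realtime"


@dataclass
class InferenceRequest:
    model_id: str        # resolved against a model registry
    model_version: str   # pinned for reproducibility and rollback
    payload: dict
    mode: InferenceMode


class ModelBackend(ABC):
    """One backend per inference type, all behind the same interface."""

    @abstractmethod
    def infer(self, request: InferenceRequest) -> dict:
        ...


class UnifiedServingLayer:
    """Routes each request to the backend registered for its mode."""

    def __init__(self) -> None:
        self._backends: dict[InferenceMode, ModelBackend] = {}

    def register(self, mode: InferenceMode, backend: ModelBackend) -> None:
        self._backends[mode] = backend

    def infer(self, request: InferenceRequest) -> dict:
        backend = self._backends.get(request.mode)
        if backend is None:
            raise ValueError(f"no backend registered for {request.mode}")
        return backend.infer(request)
```

The point of the routing layer is the standardized interface: feature teams call one `infer` entry point, and backends can be swapped, scaled, or versioned behind it independently.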

Other

  • Proven track record of driving complex projects involving multiple partners and federated contribution models ("one source of truth, many contributors").
  • A pragmatic leader who can balance the need for speed with progressive rigor and production fidelity.
  • Collaborate closely with data science, machine learning research, and feature teams (Autoplay, Home, Search, etc.) to drive the active adoption of the serving infrastructure.
  • Drive Latency and Cost Optimization: partner with SRE and ML teams to implement techniques like quantization, pruning, and efficient batching to minimize serving latency and cloud compute costs (a toy batching sketch follows this list).
  • Develop Observability and Monitoring: build dashboards and alerting for service health, tracing, A/B test traffic, and latency trends to ensure adherence to defined SLAs.
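
To make the "efficient batching" bullet above concrete, here is a toy sketch of dynamic batching: hold incoming requests for a few milliseconds so the model can answer them in one batched forward pass, trading a small queuing delay for much better accelerator utilization. Everything here is invented for illustration; production servers implement far more sophisticated variants of this idea, and nothing below reflects Spotify's internals:

```python
# Toy illustration of dynamic batching: collect requests until either
# `max_batch` items arrive or `max_wait_s` elapses, then run one
# batched forward pass. All names are invented for this sketch.
import queue
import threading
import time


def fake_batched_forward(prompts: list[str]) -> list[str]:
    # Stand-in for a single batched model call.
    return [p.upper() for p in prompts]


class DynamicBatcher:
    def __init__(self, max_batch: int = 8, max_wait_s: float = 0.01):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self._q: queue.Queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, prompt: str) -> str:
        # Callers block until the worker thread fills in their result.
        done = threading.Event()
        slot = {"prompt": prompt, "done": done, "result": None}
        self._q.put(slot)
        done.wait()
        return slot["result"]

    def _loop(self) -> None:
        while True:
            batch = [self._q.get()]  # block for the first request
            deadline = time.monotonic() + self.max_wait_s
            # Top up the batch until it is full or the deadline passes.
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self._q.get(timeout=timeout))
                except queue.Empty:
                    break
            results = fake_batched_forward([s["prompt"] for s in batch])
            for slot, result in zip(batch, results):
                slot["result"] = result
                slot["done"].set()


if __name__ == "__main__":
    batcher = DynamicBatcher()
    print(batcher.submit("hello"))  # prints "HELLO"
```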