
Machine Learning Engineering Manager, LLM Serving & Infrastructure

Spotify

$176,166 - $251,666
Oct 30, 2025
Boston, NY, US

Generative AI is transforming Spotify’s product capabilities and technical architecture. Generative recommender systems, agent frameworks, and LLMs present huge opportunities for our products to serve more user needs and use cases, and to unlock a richer understanding of our content and users. This ML Manager will focus on serving a Unified Recommender model built on open-weight LLM and transformer technology.

Requirements

  • Hands-on with ML Engineering: you have deep expertise in building, scaling, and governing high-quality ML systems and datasets, including defining data schemas, handling data lineage, and implementing data validation pipelines (e.g., HuggingFace datasets library or similar internal systems).
  • Deep technical background in building and operating large-scale, high-velocity Machine Learning/MLOps infrastructure, ideally for personalization, recommendation, or Large Language Models (LLMs).
  • Proven track record of driving complex projects involving multiple partners and federated contribution models ("one source of truth, many contributors").
  • Expertise in designing robust, loosely coupled systems with clean APIs and clear separation of concerns (e.g., distinguishing between fast dev-time tools and rigorous production-like systems).
  • Experience integrating evaluation and testing into continuous integration/continuous deployment (CI/CD) pipelines to enable rapid 'fork-evaluate-merge' developer workflows.
  • Solid understanding of experiment tracking and results visualization platforms (e.g., MLFlow, custom UIs).
  • 5+ years of experience in software or machine learning engineering, with 2+ years of experience managing an engineering team.
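
To make the data-quality requirement above concrete, here is a minimal, library-free sketch of the kind of schema and validation check such a pipeline might enforce (the schema, field names, and error format are illustrative assumptions, not Spotify's actual system):

```python
# Hypothetical record schema: each dataset row must carry these
# fields with these Python types.
SCHEMA = {"text": str, "label": int}

def validate(record):
    """Return a list of validation errors for one record (empty = valid)."""
    errors = []
    for field, ftype in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"bad type for {field}")
    return errors
```

In a real pipeline this check would typically be expressed through a dataset library's typed features (e.g., the Hugging Face `datasets` `Features` schema mentioned in the bullet) rather than hand-rolled, but the idea is the same: reject records that break the schema before they reach training or serving.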

Responsibilities

  • Lead a high-performing engineering team to develop, build, and deploy a high-scale, low-latency LLM Serving Infrastructure.
  • Drive the implementation of a unified serving layer to support multiple LLM models and inference types (batch, offline eval flows and real-time/streaming).
  • Lead all aspects of the development of the Model Registry for deploying, versioning, and running LLMs across production environments.
  • Ensure successful integration with the core Personalization and Recommendation systems to deliver LLM-powered features.
  • Define and champion standardized technical interfaces and protocols for efficient model deployment and scaling.
  • Establish and monitor the serving infrastructure's performance, cost, and reliability, including load balancing, autoscaling, and failure recovery.
  • Scale up the serving architecture to handle hundreds of millions of users and high-volume inference requests for internal domain-specific LLMs.
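
The "unified serving layer" and "Model Registry" responsibilities above can be sketched in a few lines: one registry of named models, with a single entry point that dispatches by inference type. This is a toy illustration under assumed names (`ServingLayer`, `InferenceMode`), not Spotify's architecture:

```python
from enum import Enum

class InferenceMode(Enum):
    REALTIME = "realtime"   # one input, low-latency path
    BATCH = "batch"         # many inputs, offline/eval path

class ServingLayer:
    """Toy unified serving layer: one registry of models,
    one infer() entry point dispatched by inference mode."""

    def __init__(self):
        self._models = {}  # model name -> callable

    def register(self, name, model_fn):
        """Register a model version under a name (a minimal 'registry')."""
        self._models[name] = model_fn

    def infer(self, name, inputs, mode=InferenceMode.REALTIME):
        model = self._models[name]
        if mode is InferenceMode.BATCH:
            return [model(x) for x in inputs]
        return model(inputs)
```

The point of the pattern is the bullet's "standardized technical interfaces": callers depend only on `infer()`, so models can be swapped, versioned, and scaled behind it without changing client code.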

Other

  • We seek to understand the world of music and podcasts better than anyone else so that we can make great recommendations to every individual and keep the world listening.
  • You will collaborate with a diverse team to establish and implement the machine learning plan for the product domain, developing innovative recommendations and agent interactions.
  • You will work as a technology leader, managing a team and influencing peers.
  • You will collaborate with internal customers and platform teams, offering the opportunity to profoundly shape the direction of the entire Spotify experience.
  • You are a pragmatic leader who can balance the need for speed against progressive rigor and production fidelity.