Software Engineer 5 - Offline Inference - Machine Learning Platform

Netflix

$100,000 - $720,000
Sep 22, 2025
Remote, US

Netflix is building a scalable, reliable machine learning platform to accelerate every ML practitioner at the company. This role focuses on the batch-prediction layer and large-scale batch inference workloads.

Requirements

  • Hands-on experience with ML engineering or production systems involving training or inference of deep-learning models.
  • Proven track record of operating scalable infrastructure for ML workloads (batch or online).
  • Proficiency in one or more modern backend languages (e.g., Python, Java, Scala).
  • Production experience with containerization & orchestration (Docker, Kubernetes, ECS, etc.) and at least one major cloud provider (AWS preferred).
  • Deep understanding of real-world ML development workflows, and experience working in close partnership with ML researchers or modeling engineers.
  • Familiarity with cloud-based AI/ML services (e.g., SageMaker, Bedrock, Databricks, OpenAI, Vertex) or open-source stacks (Ray, Kubeflow, MLflow).
  • Experience optimizing inference for large language models, computer-vision pipelines, or other foundation models (e.g., FSDP, tensor/pipeline parallelism, quantization, distillation).

Responsibilities

  • Build developer-friendly APIs, SDKs, and CLIs that let researchers and engineers, experts and non-experts alike, submit and manage batch inference jobs with minimal effort, particularly in the domain of content and media.
  • Design, implement, and operate distributed services that package, schedule, execute, and monitor batch inference workflows at massive scale.
  • Instrument the platform for reliability, debuggability, observability, and cost control; define SLOs and share an equitable on-call rotation.
  • Foster a culture of engineering excellence through design reviews, mentorship, and candid, constructive feedback.

Other

  • Excellent written and verbal communication skills.
  • Comfortable collaborating with peers and partners on teams distributed across US geographies and time zones.
  • Commitment to operational best practices—observability, logging, incident response, and on-call excellence.
  • Bachelor's, Master's, or Ph.D. degree in Computer Science or a related field (implied; not explicitly stated in the posting).
  • Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick leave.