Staff Machine Learning Engineer, ML Platform

Reddit

$230,000 - $322,000
Nov 11, 2025
Remote, US

Reddit is looking for an engineer to lead development of a platform for large-scale ML models that boosts development velocity for ML engineers and enables greater model scalability and iteration.

Requirements

  • 7+ years of experience in ML infrastructure, including model training and model deployments
  • Hands-on experience with ML optimization, including memory and GPU profiling
  • Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
  • Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases)
  • Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, and TensorFlow
  • Deep experience working with distributed training frameworks and orchestration tools, including Ray and Kubernetes
  • Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
  • Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus

Responsibilities

  • Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
  • Build from scratch (zero to one) and support a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
  • Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
  • Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
  • Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges

Other

  • Strong focus on scalability, reliability, performance, and ease of use
  • You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
  • Strong organizational and communication skills