Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that perfectly match your resume


Staff Software Engineer, ML Serving Platform

DoorDash

Salary not specified
Aug 25, 2025
San Francisco, CA, US

DoorDash is working to build a reliable on-demand logistics engine, and this role drives the next generation of its inference platform, powering real-time predictions across millions of requests per second.

Requirements

  • 8+ years of engineering experience, including building or operating large-scale, high-QPS ML serving systems.
  • Deep familiarity with ML inference and serving ecosystems.
  • Knowledge of how to leverage and extend open-source frameworks and evaluate vendor solutions pragmatically.
  • GPU serving expertise - Experience with frameworks like NVIDIA Triton, TensorRT-LLM, ONNX Runtime, or vLLM, including hands-on use of KV caching, batching, and memory-efficient inference.
  • Familiarity with deep learning frameworks (PyTorch, TensorFlow) and large language models (LLMs) such as GPT-OSS or BERT.
  • Hands-on experience with Kubernetes/EKS, microservice architectures, and large-scale orchestration for inference workloads.
  • Cloud experience (AWS, GCP, Azure) with a focus on scaling strategies, observability, and cost optimization.

Responsibilities

  • Scale richer models at low latency by designing serving systems that handle large, complex models while balancing cost, throughput, and strict latency SLOs.
  • Bring modern inference optimizations into production by operationalizing advances from the ML serving ecosystem to deliver better user experience, latency, and cost efficiency across the fleet.
  • Enable platform-wide impact by building abstractions and primitives that let serving improvements apply broadly across many workloads, rather than point solutions for individual models.
  • Leverage and contribute to OSS by applying the best of the open-source serving ecosystem and vendor solutions, and contributing improvements back where it helps the community.
  • Drive cost & reliability by designing autoscaling and scheduling across heterogeneous hardware (GPU/TPU/CPU), with strong isolation, observability, and tail-latency control.
  • Collaborate broadly by partnering with ML engineers, infra teams, external vendors, and open-source communities to ensure the serving stack evolves with the needs of the business.
  • Raise the engineering bar by establishing metrics & processes that improve developer velocity, system reliability, and long-term maintainability.

Other

  • Lead by example - collaborating effectively, mentoring peers, and setting a high bar for craftsmanship.
  • Care deeply about reliability, performance, observability, and security in production systems.
  • Balance hands-on execution with long-term platform thinking, making sound trade-offs.
  • Notice to Applicants for Jobs Located in NYC or Remote Jobs Associated With Office in NYC Only