
Principal Engineer - Inference Service

DigitalOcean

$206,000 - $250,000
Sep 30, 2025
San Francisco, CA, USA

DigitalOcean is looking to simplify LLM hosting, serving, and optimization for millions of users by building a new product that brings its famed DigitalOcean Simplicity to the world of LLM inference services.

Requirements

  • 10+ years of experience in software engineering, including 2+ years building AI/ML technologies (ideally related to LLM hosting and inference).
  • Enduring interest in distributed systems design, AI/ML, and implementation at scale in the cloud.
  • Deep expertise in cloud computing platforms and modern AI/ML technologies.
  • Experience with modern LLMs, ideally related to hosting, serving, and optimizing such models.
  • Experience with one or more inference engines (e.g., vLLM, SGLang, Modular Max) is a bonus; see the sketch after this list.
  • Experience researching, evaluating, and building with open source technologies.
  • Proficiency in programming languages commonly used in cloud development, such as Python and Go.
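
As an illustration of the serving interfaces these engines expose, here is a minimal sketch that queries a locally running vLLM server through its OpenAI-compatible completions API; the URL, model name, and prompt are placeholders and assume a server already started with, for example, vllm serve meta-llama/Llama-3.1-8B-Instruct.

    # Minimal sketch: send a completion request to a locally running vLLM
    # server via its OpenAI-compatible API. The URL and model name are
    # placeholders and must match whatever model the server is serving.
    import requests

    VLLM_URL = "http://localhost:8000/v1/completions"  # vLLM's default port

    payload = {
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        "prompt": "Explain what an inference engine does in one sentence.",
        "max_tokens": 64,
        "temperature": 0.2,
    }

    response = requests.post(VLLM_URL, json=payload, timeout=60)
    response.raise_for_status()
    print(response.json()["choices"][0]["text"].strip())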

Responsibilities

  • Design and implement an inference platform for serving large language models optimized for the various GPU platforms they will be run on.
  • Develop and shepherd complex AI and cloud engineering projects through the entire product development lifecycle (PDLC) - ideation, product definition, experimentation, prototyping, development, testing, release, and operations.
  • Optimize runtime and infrastructure layers of the inference stack for best model performance.
  • Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures.
  • Contribute to open source inference engines to make them perform better on the DigitalOcean cloud.
  • Build tooling and observability to monitor system health, and build auto-tuning capabilities.
  • Build benchmarking frameworks that test model serving performance and guide system and infrastructure tuning efforts (a minimal sketch follows this list).
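
A minimal benchmarking sketch in the same spirit, assuming an OpenAI-compatible serving endpoint; the URL, model name, and request mix are placeholders, and a real framework would also track time-to-first-token, tokens per second, and GPU utilization.

    # Minimal sketch: measure request latency and throughput against an
    # assumed OpenAI-compatible model-serving endpoint.
    import statistics
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    ENDPOINT = "http://localhost:8000/v1/completions"  # assumed endpoint
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"         # placeholder model
    NUM_REQUESTS = 32
    CONCURRENCY = 8

    def one_request(_: int) -> float:
        # Send one completion request and return its latency in seconds.
        payload = {"model": MODEL, "prompt": "Hello, world.", "max_tokens": 32}
        start = time.perf_counter()
        requests.post(ENDPOINT, json=payload, timeout=120).raise_for_status()
        return time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(one_request, range(NUM_REQUESTS)))
    elapsed = time.perf_counter() - start

    print(f"p50 latency: {statistics.median(latencies):.3f}s")
    print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18]:.3f}s")
    print(f"throughput:  {NUM_REQUESTS / elapsed:.2f} req/s")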

Other

  • A strong sense of ownership and a drive to figure out and resolve any issues preventing you and your team from delivering value to your customers.
  • An appreciation for process and for developing cross-disciplinary collaboration between engineering, operations, support, and product groups.
  • Familiarity with end-to-end quality best practices and their implementation.
  • Experience coordinating with partner teams across time zones and geographies.
  • Experience with infrastructure as code (IaC) tools like Terraform or Ansible.