Machine Learning Engineer, Distributed vLLM Inference

Red Hat

$133,650 - $220,680
Sep 12, 2025
Boston, MA, US

The Red Hat Inference team accelerates AI for the enterprise and simplifies GenAI deployments by bringing the power of open-source LLMs and vLLM to every enterprise.

Requirements

  • Strong proficiency in Python and at least one systems programming language (Go, Rust, or C++), with Go highly preferred.
  • Experience with cloud-native Kubernetes service mesh technologies/stacks such as Istio, Cilium, Envoy (WASM filters), and CNI.
  • A solid understanding of Layer 7 networking, HTTP/2, gRPC, and the fundamentals of API gateways and reverse proxies.
  • Working knowledge of high-performance networking protocols and technologies including UCX, RoCE, InfiniBand, and RDMA is a plus.
  • Experience with the Kubernetes ecosystem, including core concepts, custom APIs, operators, and the Gateway API Inference Extension for GenAI workloads.
  • Experience with GPU performance benchmarking and profiling tools like NVIDIA Nsight or distributed tracing libraries/techniques like OpenTelemetry.

Responsibilities

  • Develop and maintain distributed inference infrastructure leveraging Kubernetes APIs, operators, and the Gateway API Inference Extension for scalable LLM deployments.
  • Create system components in Go and/or Rust to integrate with the vLLM project and manage distributed inference workloads.
  • Design and implement KV cache-aware routing and scoring algorithms to optimize memory utilization and request distribution in large-scale inference deployments.
  • Enhance the resource utilization, fault tolerance, and stability of the inference stack.
  • Contribute to the design, development, and testing of various inference optimization algorithms.
  • Actively participate in technical design discussions and propose innovative solutions to complex challenges.
  • Provide timely and constructive code reviews.

Other

  • Excellent communication skills, capable of interacting effectively with both technical and non-technical team members.
  • A Bachelor's or Master's degree in computer science, computer engineering, or a related field.
  • A Ph.D. in an ML-related domain is a significant advantage.