
Machine Learning Engineer, Distributed vLLM Inference

Red Hat

$133,650 - $220,680
Sep 12, 2025
Boston, MA, US

The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments by providing a stable platform on which enterprises can build, optimize, and scale LLM deployments.

Requirements

  • Strong proficiency in Python and at least one systems programming language (Go, Rust, or C++), with Go strongly preferred.
  • Experience with cloud-native Kubernetes networking and service mesh technologies such as Istio, Cilium, Envoy (including WASM filters), and CNI plugins.
  • A solid understanding of Layer 7 networking, HTTP/2, gRPC, and the fundamentals of API gateways and reverse proxies.
  • Working knowledge of high-performance networking protocols and technologies including UCX, RoCE, InfiniBand, and RDMA is a plus.
  • Experience with the Kubernetes ecosystem, including core concepts, custom APIs, operators, and the Gateway API Inference Extension for GenAI workloads.
  • Experience with GPU performance benchmarking and profiling tools such as NVIDIA Nsight, or with distributed tracing frameworks such as OpenTelemetry.

Responsibilities

  • Develop and maintain distributed inference infrastructure leveraging Kubernetes APIs, operators, and the Gateway API Inference Extension for scalable LLM deployments.
  • Create system components in Go and/or Rust to integrate with the vLLM project and manage distributed inference workloads.
  • Design and implement KV cache-aware routing and scoring algorithms to optimize memory utilization and request distribution in large-scale inference deployments.
  • Enhance the resource utilization, fault tolerance, and stability of the inference stack.
  • Contribute to the design, development, and testing of various inference optimization algorithms.
  • Actively participate in technical design discussions and propose innovative solutions to complex challenges.
  • Provide timely and constructive code reviews.

Other

  • Excellent communication skills, capable of interacting effectively with both technical and non-technical team members.
  • A Bachelor's or Master's degree in computer science, computer engineering, or a related field.
  • A Ph.D. in an ML-related field is a significant advantage.
  • The salary range for this position is $133,650.00 - $220,680.00.
  • This position may also be eligible for bonus, commission, and/or equity.