Red Hat aims to bring the power of open-source LLMs and vLLM to every enterprise, accelerating enterprise AI with a stable platform on which to build, optimize, and scale LLM deployments.
Requirements
- Strong experience in Python and Pydantic
- Strong understanding of LLM inference core concepts, such as logits processing (i.e., the logit generation -> sampling -> decoding loop)
- Deep familiarity with the OpenAI Chat Completions API specification
- Deep familiarity with libraries like Outlines, XGrammar, Guidance, or Llama.cpp grammars
- Proficiency with efficient parsing techniques (e.g., incremental parsing) is a strong plus
- Proficiency with Jinja2 chat templates
- Familiarity with beam search and greedy decoding in the context of constrained generation
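To make the constrained-decoding requirement concrete, here is a minimal, library-free sketch of one greedy decoding step with a grammar mask applied to the logits. The vocabulary, scores, and function names are illustrative, not vLLM internals:

```python
import math

# Toy vocabulary and raw logits from a hypothetical model step.
VOCAB = ["{", "}", "\"name\"", "hello", ":"]

def constrained_greedy_step(logits, allowed):
    """One decoding step: mask out logits for tokens the grammar
    disallows, then greedily pick the highest-scoring survivor."""
    masked = [
        score if tok in allowed else -math.inf
        for tok, score in zip(VOCAB, logits)
    ]
    best = max(range(len(VOCAB)), key=lambda i: masked[i])
    return VOCAB[best]

# Suppose a JSON grammar says only "{" may start the output, even
# though the model scores "hello" highest.
logits = [1.2, 0.1, 0.5, 3.7, 0.3]
print(constrained_greedy_step(logits, allowed={"{"}))  # -> {
```

Libraries like Outlines and XGrammar do essentially this at scale, recomputing the allowed token set incrementally from a grammar or JSON schema on every step.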
Responsibilities
- Write robust Python and Pydantic code across vLLM systems, high-performance machine learning primitives, performance analysis and modeling, and numerical methods
- Contribute to the design, development, and testing of the function calling, tool-call parsing, and structured output subsystems in vLLM
- Participate in technical design discussions and provide innovative solutions to complex problems
- Give thoughtful and prompt code reviews
- Mentor and guide other engineers and foster a culture of continuous learning and innovation
- Build and maintain subsystems that allow vLLM to speak the language of tools
- Bridge the gap between probabilistic token generation and deterministic schema compliance
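As a small illustration of that bridge, a Pydantic model can serve as the deterministic contract that a model's free-form tool-call JSON must satisfy. The tool name and fields below are hypothetical, and this is a validation sketch, not vLLM's actual parser:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical argument schema for a weather tool call.
class GetWeatherArgs(BaseModel):
    city: str
    unit: str = "celsius"

def parse_tool_call(raw: str):
    """Validate the model's generated JSON against the tool schema.
    Returns the typed arguments, or None when the generation does
    not comply with the schema."""
    try:
        return GetWeatherArgs.model_validate_json(raw)
    except ValidationError:
        return None

print(parse_tool_call('{"city": "Boston"}'))  # valid; default unit filled in
print(parse_tool_call('{"city": 42, "unit": []}'))  # -> None
```

Structured output goes one step further: instead of validating after the fact, the schema constrains decoding itself so non-compliant generations cannot be produced.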
Other
- BS or MS in computer science, computer engineering, mathematics, or a related field; PhD in an ML-related domain is considered a plus
- Strong communication skills with both technical and non-technical team members
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- Paid time off and holidays