Red Hat is looking to advance open-source large language models (LLMs) and the vLLM inference engine, driving the development of scalable, efficient, and high-performance AI deployment solutions for enterprise customers.
Requirements
- Extensive experience in developing high-performance code for GPUs
- Deep knowledge of GPU hardware architecture and performance optimization techniques
- Strong understanding of computer architecture, parallel processing, and distributed computing
- Proficiency with tensor math libraries such as PyTorch
- Experience in optimizing kernels for deep neural networks
- Knowledge of high-performance networking protocols and technologies, including UCX, RoCE, InfiniBand, and RDMA
Responsibilities
- Develop and maintain robust Python and C++ codebases for vLLM systems, focusing on high-performance machine learning primitives
- Design, implement, and test inference optimization algorithms to enhance model efficiency and scalability
- Conduct performance analysis and modeling to identify bottlenecks and improve system throughput
- Participate in technical discussions to propose innovative solutions for complex problems in model deployment
- Review code thoroughly and provide constructive feedback to team members
- Mentor junior engineers, promoting a culture of continuous learning and technical excellence
- Collaborate with cross-functional teams to integrate new features and improve existing infrastructure
Other
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field; PhD highly preferred
- Excellent communication skills for collaboration with technical and non-technical teams