Red Hat is working to accelerate enterprise AI and bring operational simplicity to GenAI deployments by developing a stable platform on which enterprises can build, optimize, and scale LLM workloads, leveraging open-source LLMs and vLLM.
Requirements
- Extensive experience writing high-performance GPU code and deep knowledge of GPU hardware
- Strong understanding of computer architecture, parallel processing, and distributed computing concepts
- Experience with tensor math libraries such as PyTorch
- Deep understanding of and hands-on experience with GPU performance optimization
- Experience optimizing kernels for deep neural networks
- Experience with NVIDIA Nsight is a plus
Responsibilities
- Write robust Python and C++, working on vLLM systems, high-performance machine learning primitives, performance analysis and modeling, and numerical methods
- Contribute to the design, development, and testing of various inference optimization algorithms
- Participate in technical design discussions and provide innovative solutions to complex problems
- Give thoughtful and prompt code reviews
- Mentor and guide other engineers and foster a culture of continuous learning and innovation
Other
- Strong communication skills with both technical and non-technical team members
- BS or MS in computer science, computer engineering, or a related field; a PhD in an ML-related domain is considered a plus