Red Hat aims to bring the power of open-source LLMs and vLLM to every enterprise, accelerating enterprise AI by providing a stable platform on which to build, optimize, and scale LLM deployments.
Requirements
- Extensive experience writing high-performance code for GPUs and deep knowledge of GPU hardware
- Strong understanding of computer architecture, parallel processing, and distributed computing concepts
- Experience with tensor math libraries such as PyTorch
- Experience with modern C++, CUDA, Triton, and CUTLASS
- Experience with mathematical software, especially linear algebra or signal processing
- Experience optimizing kernels for deep neural networks
- Experience with NVIDIA Nsight is a plus
Responsibilities
- Write robust Python and C++ for vLLM systems, high-performance machine learning primitives, performance analysis and modeling, and numerical methods
- Contribute to the design, development, and testing of various inference optimization algorithms
- Participate in technical design discussions and provide innovative solutions to complex problems
- Give thoughtful and prompt code reviews, proactively using AI-assisted development tools for code generation, auto-completion, and intelligent suggestions to accelerate development cycles and improve code quality
- Mentor and guide other engineers and foster a culture of continuous learning and innovation
Other
- Strong communication skills with both technical and non-technical team members
- BS or MS in computer science, computer engineering, or a related field. A PhD in an ML-related domain is considered a plus
- Comprehensive medical, dental, and vision coverage
- Paid time off and holidays
- Paid parental leave plans for all new parents