Red Hat is seeking to advance its open-source AI platform by developing high-performance inference systems and optimizing large language model deployments.
Requirements
- Experience with high-performance GPU programming and hardware architecture
- Proficiency in C++, CUDA, and Triton (the Python-based GPU kernel language)
- Strong knowledge of tensor math libraries such as PyTorch
- Experience with kernel optimization for deep neural networks
- Familiarity with GPU profiling tools such as NVIDIA Nsight Systems or Nsight Compute
- Understanding of computer architecture, parallel processing, and distributed computing
- Background in linear algebra, signal processing, or related mathematical software
Responsibilities
- Develop and maintain high-performance Python and C++ code for vLLM and its underlying ML primitives
- Design, implement, and test inference optimization algorithms to enhance model efficiency and performance
- Conduct performance analysis and modeling to identify bottlenecks and optimize computational workflows
- Participate in technical design discussions, providing innovative solutions for complex problems
- Perform code reviews, ensuring code quality, robustness, and adherence to best practices
- Utilize AI-assisted development tools for code generation, auto-completion, and intelligent suggestions to improve productivity
- Collaborate with cross-functional teams to integrate new features and improve existing systems
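The performance-analysis responsibility above typically starts with a roofline-style estimate: compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic) against the hardware's compute-to-bandwidth ratio to decide whether it is compute-bound or memory-bound. A minimal sketch of that reasoning for a GEMM, where the peak throughput and bandwidth figures are illustrative assumptions rather than any specific device's spec sheet:

```python
def arithmetic_intensity_gemm(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for an (M x K) @ (K x N) GEMM with fp16 operands."""
    flops = 2 * m * n * k  # one multiply + one add per multiply-accumulate
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A and B, write C
    return flops / bytes_moved

def bound_kind(intensity, peak_flops, peak_bandwidth):
    """Roofline model: below the ridge point, a kernel is memory-bound."""
    ridge = peak_flops / peak_bandwidth  # FLOPs/byte where the two roofs intersect
    return "memory-bound" if intensity < ridge else "compute-bound"

# Illustrative device numbers (assumptions for the sketch):
PEAK_FLOPS = 300e12      # 300 TFLOP/s fp16
PEAK_BANDWIDTH = 2e12    # 2 TB/s HBM bandwidth; ridge point = 150 FLOPs/byte

# Large square GEMM (prefill-like): high data reuse, compute-bound.
big = arithmetic_intensity_gemm(4096, 4096, 4096)
# Batch-1 decode step is GEMV-like: almost no reuse, memory-bound.
small = arithmetic_intensity_gemm(1, 4096, 4096)

print(bound_kind(big, PEAK_FLOPS, PEAK_BANDWIDTH))    # compute-bound
print(bound_kind(small, PEAK_FLOPS, PEAK_BANDWIDTH))  # memory-bound
```

This kind of back-of-the-envelope model explains, for example, why single-request LLM decoding is dominated by weight-loading bandwidth rather than raw FLOPs, and why batching requests raises arithmetic intensity.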
Other
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or a related field
- Excellent communication skills for collaboration with technical and non-technical teams
- A PhD in a machine-learning-related field is a plus