At Red Hat, the goal is to bring the power of open-source LLMs and vLLM to every enterprise by optimizing AI models for efficient inference.
Requirements
- Strong programming skills in Python, with experience in deep learning frameworks such as PyTorch or TensorFlow.
- Familiarity with AI model optimization techniques such as quantization (e.g., INT4, FP8), pruning, and knowledge distillation.
- Background in efficient inference techniques for large-scale language models or computer vision models.
- Prior experience contributing to open-source ML frameworks, or a record of research publications.
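To make the quantization requirement above concrete, here is a minimal sketch of symmetric per-tensor INT4 quantization in plain Python. This is an illustration of the general idea only, not Red Hat's or vLLM's implementation; the function names and the per-tensor scaling scheme are assumptions for the example.

```python
def quantize_int4(weights):
    """Symmetric per-tensor INT4 quantization (illustrative sketch).

    Maps floats to integers in the signed 4-bit range [-8, 7] using a
    single scale factor derived from the largest-magnitude weight.
    """
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive INT4 value
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from INT4 values and the scale."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.91, -0.07]
q, scale = quantize_int4(weights)
approx = dequantize(q, scale)
```

Each dequantized weight differs from the original by at most half the scale, which is the rounding error this scheme trades for a 4-bit representation.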
Responsibilities
- Research and implement techniques for model compression, quantization, and optimization.
- Conduct experiments to evaluate the impact of optimization methods on model accuracy, latency, and throughput.
- Collaborate with researchers and engineers to integrate optimizations into real-world machine learning workflows.
- Document findings and contribute to technical reports, blog posts, or research publications.
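The evaluation responsibility above can be sketched as a simple timing harness. This is a generic illustration of measuring mean latency and throughput for a repeated workload; `run_model` is a hypothetical placeholder standing in for a model's forward pass, not a real API.

```python
import time


def benchmark(fn, n_iters=100):
    """Time `fn` over `n_iters` calls; return (mean latency in s, calls/s)."""
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    elapsed = time.perf_counter() - start
    latency = elapsed / n_iters
    return latency, 1.0 / latency


def run_model():
    # Placeholder workload standing in for a model forward pass.
    sum(i * i for i in range(1000))


latency, throughput = benchmark(run_model)
```

In practice one would also discard warm-up iterations and report percentiles rather than a single mean, since tail latency often matters more than the average for serving workloads.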
Other
- Currently pursuing a Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, or a related field.
- Excellent communication skills and ability to work in a team-oriented research environment.
- Strong analytical and problem-solving skills.