The Red Hat Inference team accelerates AI for the enterprise and brings operational simplicity to GenAI deployments. As leading developers and maintainers of the vLLM and llm-d projects, and as inventors of state-of-the-art techniques for model quantization and sparsification, our team provides a stable platform for enterprises to build, optimize, and scale LLM deployments.
Requirements
- Extensive experience writing high-performance, modern C++ code.
- Strong experience with hardware acceleration libraries and backends such as CUDA, Metal, Vulkan, or SYCL.
- Strong fundamentals in machine learning and deep learning, with a thorough understanding of transformer architectures and LLM inference.
- Experience with performance profiling, benchmarking, and optimization techniques (a minimal benchmarking sketch follows this list).
- Proficiency in Python.
- Prior experience contributing to a major open-source project.
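As a concrete, hedged illustration of the profiling and benchmarking work mentioned above, the sketch below times a hypothetical matrix-vector kernel with std::chrono and reports per-iteration latency and throughput. The kernel, problem sizes, and iteration count are illustrative assumptions only and are not taken from llama.cpp.

```cpp
// Illustrative sketch only: a micro-benchmark harness of the kind used to
// compare kernel variants. The kernel and problem sizes are hypothetical.
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical kernel under test: a naive matrix-vector product.
void matvec(const float* A, const float* x, float* y, int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < cols; ++c) acc += A[r * cols + c] * x[c];
        y[r] = acc;
    }
}

int main() {
    const int rows = 4096, cols = 4096, iters = 50;
    std::vector<float> A(static_cast<size_t>(rows) * cols, 1.0f);
    std::vector<float> x(cols, 1.0f), y(rows);

    matvec(A.data(), x.data(), y.data(), rows, cols); // warm-up run

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        matvec(A.data(), x.data(), y.data(), rows, cols);
    const auto t1 = std::chrono::steady_clock::now();

    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
    // 2 FLOPs (multiply + add) per matrix element per iteration.
    const double gflops = 2.0 * rows * cols / (ms * 1e6);
    std::printf("matvec: %.3f ms/iter, %.2f GFLOP/s (y[0]=%.1f)\n", ms, gflops, y[0]);
    return 0;
}
```

Printing y[0] keeps the compiler from optimizing the kernel away, and the warm-up run is taken before timing so cold caches and page faults do not skew the first measurement.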
Responsibilities
- Design and implement new features and optimizations for the llama.cpp core, including model architecture support, quantization techniques (see the quantization sketch after this list), and inference algorithms.
- Optimize the codebase for various hardware backends, including CPU instruction sets, Apple Silicon (Metal), and other GPU technologies (CUDA, Vulkan, SYCL).
- Conduct performance analysis and benchmarking to identify bottlenecks and propose solutions for improving latency and throughput.
- Contribute to the design and evolution of core project components, such as the GGUF file format and the GGML tensor library.
- Collaborate with the open-source community by reviewing pull requests, participating in technical discussions on GitHub, and providing guidance on best practices.
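To make "quantization techniques" concrete, here is a minimal sketch of block-wise symmetric int8 quantization in the spirit of GGML-style block formats. The block size, struct layout, and function names are illustrative assumptions, not the actual llama.cpp implementation.

```cpp
// Illustrative sketch only: block-wise symmetric int8 quantization in the
// spirit of GGML-style block formats. Names and sizes are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

constexpr int kBlockSize = 32; // hypothetical block size

struct BlockQ8 {
    float scale;          // per-block scale factor
    int8_t q[kBlockSize]; // quantized values
};

// Quantize one block: find the max magnitude, map it to the int8 range,
// then round each element to the nearest quantized level.
BlockQ8 quantize_block(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));
    BlockQ8 out{};
    out.scale = amax / 127.0f;
    const float inv = out.scale != 0.0f ? 1.0f / out.scale : 0.0f;
    for (int i = 0; i < kBlockSize; ++i)
        out.q[i] = static_cast<int8_t>(std::lround(x[i] * inv));
    return out;
}

// Dequantize back to float: multiply each int8 value by the block scale.
void dequantize_block(const BlockQ8& b, float* y) {
    for (int i = 0; i < kBlockSize; ++i) y[i] = b.scale * b.q[i];
}

int main() {
    float x[kBlockSize], y[kBlockSize];
    for (int i = 0; i < kBlockSize; ++i) x[i] = 0.1f * (i - 16);
    const BlockQ8 b = quantize_block(x);
    dequantize_block(b, y);
    std::printf("x[0]=%f  dequantized=%f  scale=%f\n", x[0], y[0], b.scale);
    return 0;
}
```

A production block format would also need vectorized paths per instruction set and a choice of bit width and block size tuned for accuracy versus memory; this sketch shows only the scalar round trip.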
Other
- If you want to help solve challenging technical problems at the forefront of deep learning, the open source way, this is the role for you.
- The salary range for this position is $133,650.00 - $281,770.00.
- The actual offer will be based on your qualifications.
- This position may also be eligible for bonus, commission, and/or equity.
- Red Hatters are encouraged to bring their best ideas, no matter their title or tenure.