Red Hat aims to bring the power of open-source LLMs and vLLM to every enterprise, accelerating enterprise AI adoption and bringing operational simplicity to GenAI deployments.
Requirements
- Extensive experience writing high-performance modern C++ code.
- Strong experience with hardware acceleration libraries and backends such as CUDA, Metal, Vulkan, or SYCL.
- Strong fundamentals in machine learning and deep learning, with a deep understanding of transformer architectures and LLM inference.
- Experience with performance profiling, benchmarking, and optimization techniques.
- Proficient in Python.
- Prior experience contributing to a major open-source project.
Responsibilities
- Design and implement new features and optimizations for the llama.cpp core, including model architecture support, quantization techniques, and inference algorithms.
- Optimize the codebase for various hardware backends, including CPU instruction sets, Apple Silicon (Metal), and other GPU technologies (CUDA, Vulkan, SYCL).
- Conduct performance analysis and benchmarking to identify bottlenecks and propose solutions for improving latency and throughput.
- Contribute to the design and evolution of core project components, such as the GGUF file format and the GGML tensor library.
- Collaborate with the open-source community by reviewing pull requests, participating in technical discussions on GitHub, and providing guidance on best practices.
Other
- Bachelor's, Master's, or Ph.D. degree in Computer Science or a related field
- Comprehensive medical, dental, and vision coverage
- Flexible Spending Account - healthcare and dependent care
- 401(k) retirement plan with employer match
- Paid time off and holidays