Red Hat is on a mission to bring the power of open-source LLMs and vLLM to every enterprise, accelerating AI adoption and bringing operational simplicity to GenAI deployments. The Machine Learning Engineer will help shape the future of AI deployment by contributing to a stable platform on which enterprises can build, optimize, and scale LLM deployments.
Requirements
- Extensive experience writing high-performance modern C++ code.
- Strong experience with hardware acceleration libraries and backends such as CUDA, Metal, Vulkan, or SYCL.
- Strong fundamentals in machine learning and deep learning, with a deep understanding of transformer architectures and LLM inference.
- Experience with performance profiling, benchmarking, and optimization techniques.
- Proficiency in Python.
- Prior experience contributing to a major open-source project.
Responsibilities
- Design and implement new features and optimizations for the llama.cpp core, including model architecture support, quantization techniques, and inference algorithms.
- Optimize the codebase for various hardware backends, including CPU instruction sets, Apple Silicon (Metal), and other GPU technologies (CUDA, Vulkan, SYCL).
- Conduct performance analysis and benchmarking to identify bottlenecks and propose solutions for improving latency and throughput.
- Contribute to the design and evolution of core project components, such as the GGUF file format and the GGML tensor library.
- Collaborate with the open-source community by reviewing pull requests, participating in technical discussions on GitHub, and providing guidance on best practices.
Other
- If you want to help solve challenging technical problems at the forefront of deep learning, the open source way, this is the role for you.