Optimize AI workloads running on AMD GPUs and define the future of AI computing solutions.
Requirements
- Hands-on experience with state-of-the-art LLM inference and training at both the framework and kernel levels is highly preferred.
- Deep AI infrastructure experience with open-source frameworks (e.g., SGLang, vLLM, JAX, XLA, PyTorch, Triton).
- Strong kernel optimization skills using DSLs and HIP (or CUDA), including low-level work at the assembly level (PTX/SASS or their AMD equivalents).
- Hands-on knowledge of modern GPU architecture.
- Demonstrated open-source contributions on GitHub.
Responsibilities
- Improve AMD GPU performance for LLM workloads in open-source repositories such as vLLM and SGLang.
- Co-optimize AI workloads on current AMD GPUs by identifying bottlenecks and mitigating them at the kernel level.
- Integrate AMD software stacks (ROCm, ATen) into open-source frameworks such as PyTorch, JAX, and Triton.
- Build strong technical relationships with peers and partners, and report learnings and gaps to GPU software and hardware engineers.
Other
- Ideal candidates should possess a "Just-Do-It" mindset and strong motivation.
- They should be driven to question the status quo and explore better solutions through first-principles thinking.
- Motivational leadership and excellent interpersonal skills.
- Bachelor's, Master's, or Ph.D. in Computer Engineering, Computer Science, Electrical Engineering, or a related technical field.