AMD is tackling one of the most exciting challenges in the industry: training and running AI that makes AI itself run more efficiently on GPUs, on the fly — a capability that could dramatically alter the trajectory of AI progress.
Requirements
- Expert-level, high-performance C++ software engineering and low-level GPU programming, with a robust understanding of Large Language Models (LLMs) and AI systems.
- Ability to bridge kernel engineering with AI post-training (RL) experience.
- Demonstrated ability to design complex, scalable systems in modern C++.
- A solid grasp of GPU architectures (HIP/CUDA), memory hierarchies, and kernel optimization techniques to maximize hardware performance.
- Significant hands-on experience with large-scale C++/HIP/CUDA projects, such as contributions to the ROCm ecosystem (e.g., rocBLAS, hipDNN, Composable Kernel, AITemplate), CUDA libraries (e.g., cuBLAS, cuDNN, CUTLASS, Thrust, CUB, NCCL), or the C++/HIP/CUDA core of ML frameworks such as PyTorch, TensorFlow, or JAX.
- Deep understanding of LLMs, including transformer architectures, attention mechanisms, and the full model lifecycle, with hands-on experience in advanced model alignment and post-training techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (e.g., RLHF, GRPO).
- Familiarity with cutting-edge trends such as Mixture-of-Experts (MoE) architectures, inference optimizations (e.g., quantization, speculative decoding), and modern application patterns such as agentic AI systems (e.g., AlphaEvolve for code/kernel generation).
Responsibilities
- Architect and Drive the AI Software Stack: You will establish best practices and optimize performance from the lowest-level GPU kernels to large-scale distributed systems, shaping the foundational software for AMD hardware.
- Accelerate ROCm with AI: By leveraging cutting-edge Large Language Models (LLMs) and agent-based technologies, you will accelerate the development and performance enhancement of the AMD ROCm ecosystem, ensuring it remains at the forefront of AI innovation.
- Accelerate Foundation Models: Your work will directly speed up cutting-edge workloads such as foundation models (LLMs) and autonomous AI agents, ensuring AMD is the platform of choice for the most demanding workloads.
- Innovate Across Hardware and Software: You will contribute to the entire co-design lifecycle, from influencing future GPU architectures to developing groundbreaking software for new accelerators and collaborating with the broader AI community.
- Mentor and Communicate: As a senior engineer, you will mentor others and communicate your ideas effectively to shape the future of AI at AMD.
Other
- A deep passion for software engineering and strong technical ownership, communication, and problem-solving skills, with a track record of delivering complex technical solutions, seeing hard problems through to resolution, and influencing technical direction across teams.
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.
- Master's degree preferred; PhD is a plus.
- Relevant publications in AI/ML, GPU computing, or system optimization are highly valued.