AMD is seeking an engineer to develop and optimize deep learning frameworks for AMD GPUs, improving GPU kernels, deep learning models, and training/inference performance across multi-GPU and multi-node systems.
Requirements
- GPU Kernel Development & Optimization: Experienced in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM).
- Deep Learning Integration: Experienced in integrating GPU performance optimizations into machine learning frameworks (e.g., TensorFlow, PyTorch) to accelerate model training and inference.
- Software Engineering: Skilled in Python and C++, with experience in debugging, performance tuning, and test design to ensure high-quality, maintainable software solutions.
- High-Performance Computing: Solid experience running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability.
- Compiler Optimization: Foundational understanding of compiler theory and tools like LLVM and ROCm for kernel and system performance optimization.
- Knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations.
- Experience with tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
Responsibilities
- Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
- Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
- Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
- Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
- Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
- Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
- Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance.
Other
- Bachelor’s and/or Master’s Degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- Strong problem-solving skills, a proactive approach, and a keen understanding of software engineering best practices are essential.