AMD is looking to enable and optimize the software ecosystem for the next generation of GPU computational accelerators to drive the data center, artificial intelligence, PCs, gaming, and embedded markets.
Requirements
- Strong programming skills in C++ and Python
- Strong development experience in at least one major DL framework, such as PyTorch or TensorFlow, for inference, fine-tuning, and/or training
- Experience developing software and system-level performance optimizations, with a solid understanding of GPU architecture, is a plus
- Experience with open-source software development, including collaborating with community maintainers and submitting contributions, is a plus
- Expertise with profiling tools across the AI software stack (PyTorch Profiler, ROCm profiler, VTune, Nsight)
- Experience in implementing and optimizing parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
- Performance analysis skills for both CPU and GPU
Responsibilities
- Enable DL models, libraries, and applications for Instinct GPUs in both on-prem and cloud environments
- Analyze and optimize the performance of AI software
- Understand hardware bottlenecks and tune performance to approach roofline limits
- Develop software and system-level performance optimizations, drawing on a solid understanding of GPU architecture
- Contribute to open-source software, including collaborating with community maintainers and submitting contributions
- Root-cause and address performance issues
- Implement and optimize parallel methods on GPU accelerators (NCCL/RCCL, OpenMP, MPI)
Other
- Minimum 7 years of experience required.
- Must be self-motivated and able to work both independently and as part of a team.
- Willingness to learn skills, tools, and methods to advance the quality, consistency, and timeliness of AMD software products.
- Experience providing clear and timely communication to the leadership team about project status and other key aspects of the project.