Sr. Staff Software Engineer - GPU Network Software, RCCL

AMD

Salary not specified
Sep 26, 2025
Santa Clara, CA, US

AMD is looking to develop multi-node GPU communication libraries to enable high-performance computing and machine learning workloads at Exascale.

Requirements

  • Strong background developing applications and libraries in C, C++, and Python
  • Experience working with RoCE (RDMA over Converged Ethernet), Libfabric, and InfiniBand
  • Experience working with the Linux kernel, device drivers, and network drivers
  • Experience designing and building GPU networks for large-scale clusters
  • Experience with collective communication libraries (MPI, RCCL, SHMEM) and with optimizing collective communication to scale across distributed systems
  • In-depth knowledge of best-practices in software development, including testing, profiling, debugging, documentation, version control, issue tracking, and planning
  • GPU software development using HIP, CUDA, or OpenCL

Responsibilities

  • Support AMD’s RCCL, an open source, GPU-accelerated collective communication middleware, and related technologies
  • Design, implement, and test networking features for multi-GPU and multi-node communication libraries.
  • Benchmark, profile and optimize code to maximize throughput on single-GPU, multi-GPU and clustered systems
  • Deliver high-quality code and documentation following best practices for open source software development
  • Work with key technical experts across AMD and with our partners and customers to improve ROCm applications, libraries, and tools
  • Deploy the libraries on large clusters and debug complex system-level issues that can span different layers of the software stack (GPU kernel drivers, NIC drivers, etc.)

Other

  • Accustomed to working in a dynamic, geographically distributed agile team, where partnership and collaboration are paramount.
  • Excellent written and verbal communication skills, strong attention to detail, and the ability to present your work in a clear, cohesive fashion.
  • Results-oriented and accustomed to tight deadlines and changing priorities.
  • Constantly thinking of ways to improve performance of software and hardware.
  • B.Sc. or B.Eng. degree in Computer Science, Software Engineering, Electrical Engineering, or equivalent