Persimmons is building the infrastructure that will power the next decade of AI, enabling smarter devices, more sustainable data centers, and entirely new applications.
Requirements
- 7+ years of experience developing high-performance software for HPC systems, distributed systems, or hardware accelerators
- Deep knowledge of collective communication algorithms and libraries (e.g., NCCL, RCCL, MPI)
- Knowledge of hardware architectures and their optimization implications, including memory hierarchies, high-speed interconnects, DMA engines, and multi-core parallel processing
- Strong C/C++ skills
Responsibilities
- Design the communication protocols for device discovery, routing and efficient dataflow for AI workloads running across distributed hardware.
- Develop scalable communication software that coordinates efficiently across thousands of compute nodes in large-scale AI clusters.
- Define and implement low-level communication primitives for inter-device data transfer using advanced high-speed interconnect protocols.
- Implement high-speed data transfers using DMA and efficient memory management.
- Collaborate with cross-functional teams to design, test, and optimize our hardware and software solutions.
- Analyze and improve the efficiency, scalability, and performance of our systems.
- Stay abreast of industry trends and advancements to ensure our solutions remain competitive and innovative.
Other
- Provide technical leadership across the software team, mentor engineers, and help scale the team as the company grows
- BS/MS/PhD degree in Computer Science, Computer Engineering, or related field (or equivalent experience)
- Strong interpersonal, verbal, and written communication skills
- Ability to meet objectives under tight deadlines
- Experience executing tasks while managing competing priorities