Penguin Solutions is looking to enhance its ICE Software products, which are used to deploy, provision, manage, and monitor large computational systems. The company needs a Network Architect to bridge the gap between complex network infrastructure requirements and software development initiatives for Linux-based AI and High-Performance Computing (HPC) clusters.
Requirements
- Proven experience with high-performance networking protocols including InfiniBand, high-speed Ethernet, and RDMA technologies
- Strong background in the Linux networking stack, including kernel networking, routing, and network interface management
- Experience with cluster networking and distributed computing environments
- Deep understanding of TCP/IP, BGP, OSPF, EVPN/VXLAN, and other advanced networking protocols
- Knowledge of HPC interconnect technologies (InfiniBand, Omni-Path, high-speed Ethernet) and their performance characteristics
- Experience with network automation and Infrastructure as Code tools and protocols (Ansible, Terraform, NETCONF)
- Understanding of software-defined networking (SDN) concepts and implementation
Responsibilities
- Design and implement high-performance network architectures for AI and HPC clusters, including InfiniBand, high-speed Ethernet (100/200/400GbE), and RDMA-based solutions
- Architect scalable network topologies optimized for low-latency, high-bandwidth cluster computing environments
- Develop network segmentation strategies and implement VLANs, VRFs, and ACLs for multitenant cluster environments
- Collaborate with software engineering teams to integrate networking protocols and services into cluster management tools
- Optimize network performance for distributed computing workloads, including MPI, NCCL, and collective communication operations
- Build scripts and tools for network configuration management, telemetry, and compliance using Python and Ansible
- Develop network monitoring solutions and performance metrics for cluster health assessment
Other
- Minimum of 5-7 years of experience in network engineering, with a focus on HPC or data center environments
- Strong technical communication skills, with the ability to explain complex networking concepts to software engineers
- Collaborative mindset with experience working in cross-functional engineering teams
- Problem-solving abilities and a systematic approach to troubleshooting complex technical issues
- Self-motivated, with the ability to work independently while maintaining team alignment