The company is looking for a network engineer to design, build, and optimize high-performance networking infrastructure for its AI/ML operations in Toronto.
Requirements
- Strong understanding of L2/L3 networking (VLANs, BGP, OSPF) and the TCP/IP stack
- Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
- Hands-on experience with network security (firewalls, ACLs, network segmentation)
- Knowledge of HPC network topologies
- Experience with InfiniBand fabrics and RDMA technologies, including RoCE and IPoIB
- Experience with open-source firewall solutions (OPNsense, pfSense, or similar)
- Experience with network automation tools
Responsibilities
- Configure and maintain InfiniBand and high-speed Ethernet fabrics
- Optimize network performance for RDMA and GPU-to-GPU communication
- Manage network switches (Mellanox, NVIDIA, Micas Networks)
- Troubleshoot network bottlenecks and latency issues
- Plan and execute network upgrades and expansions
- Implement network security (firewalls, VLANs, ACLs)
- Collaborate on storage network optimization
Other
- 4+ years of network engineering experience in production environments
- Strong troubleshooting and problem-solving skills
- Passion for continuous learning