Network Engineer, AI/ML Infrastructure

Boson AI

Salary not specified

Nov 5, 2025

Santa Clara, CA, United States of America

Design, build, and optimize the high-performance networking infrastructure powering AI/ML operations in Toronto, managing InfiniBand and ultra-high-speed Ethernet fabrics that connect NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, and hundreds of servers.

Requirements

Hands-on experience with high-speed networking (100Gb+ Ethernet and InfiniBand)
Hands-on experience with network security (firewalls, ACLs, network segmentation)
Experience with InfiniBand fabrics and related technologies (RDMA, RoCE, IPoIB)
Strong understanding of L2/L3 networking protocols (TCP/IP, BGP, OSPF, VLANs)
Experience optimizing networks for GPU-to-GPU communication
Experience with network automation tools
Familiarity with network monitoring and observability tools (Prometheus, Grafana)

Responsibilities

Configure and maintain InfiniBand and high-speed Ethernet fabrics
Optimize network performance for RDMA and GPU-to-GPU communication
Manage network switches (Mellanox, NVIDIA, Micas Networks)
Troubleshoot network bottlenecks and latency issues
Plan and execute network upgrades and expansions
Implement network security (firewalls, VLANs, ACLs)
Monitor infrastructure

Other

4+ years of network engineering experience in production environments
Knowledge of HPC network topologies
Strong troubleshooting and problem-solving skills
Experience in data center environments or AI/ML infrastructure
If you're a natural problem-solver with a passion for continuous learning, we'd love to hear from you.