The client develops, deploys, and operates global data center networks that support a rapidly evolving infrastructure. The DC Networking team needs Software Engineers to build and scale these networks and to address challenges across the entire network lifecycle.
Requirements
- 7+ years of experience in C/C++ and Python
- 7+ years of experience in systems programming, TCP/IP, HTTP/HTTPS, SPDY, DNS, and load balancers
- Experience with network devices (routers, switches, load balancers) and an understanding of network routing protocols
- Experience with the Linux kernel, especially drivers and the network stack
- Working knowledge of the transport stack, particularly RDMA (RoCEv2)
- Experience with QEMU and FPGA emulation environments is a plus
- Experience with platform services (programming, controlling, and monitoring optics, PHYs, FPGAs, sensors, fan control, power, etc.), BSP (Board Support Package), operating systems, kernel, bootloader, power management, RTOS, and Linux
Responsibilities
- Design and implement drivers and/or firmware for Ethernet network adapter functions, the RDMA transport stack, and control functions with the host/accelerators.
- Design and implement platform services such as programming, monitoring, and controlling system components (optics, PHYs, FPGAs, sensors, fan control, power, etc.).
- Develop and enhance HPC collective communication and parallel computing libraries such as NCCL, RCCL, oneCCL, and MPI.
- Debug complex, system-level, multi-component issues that typically span multiple layers, from the kernel to user-mode applications.
Other
- 100% remote; PST hours
- 6 months; potential to extend
- Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment