The company aims to transform how enterprises harness AI by building a tight-knit group of AI and infrastructure experts who rethink systems from the ground up and deliver breakthrough solutions that redefine what's possible: faster, leaner, and smarter.
Requirements
- 10+ years of experience in systems programming, ideally with 5+ years focused on CUDA/GPU driver and runtime internals.
- 5+ years of experience with kernel-space development, ideally in Linux kernel modules, device drivers, or GPU runtime libraries (e.g., CUDA, ROCm, or OpenCL runtimes).
- Experience working with NVIDIA GPU architecture, CUDA toolchains, and performance tools (Nsight, CUPTI, etc.).
- Experience optimizing for NVLink, PCIe, Unified Memory (UM), and NUMA architectures.
- Strong grasp of RDMA, InfiniBand, and GPUDirect technologies and their use in frameworks such as UCX.
- 8+ years of experience programming in C/C++ with low-level systems proficiency (memory management, synchronization, cache coherence).
- Deep understanding of HPC workloads, performance bottlenecks, and compute/memory tradeoffs.
Responsibilities
- Design, develop, and maintain device drivers and runtime components for the systems' GPU and network subsystems.
- Work with kernel and platform components to build efficient memory-management paths using pinned memory, peer-to-peer transfers, and unified memory.
- Optimize data movement using high-speed interconnects such as RDMA, InfiniBand, NVLink, and PCIe, with a focus on reducing latency and increasing bandwidth.
- Implement and fine-tune GPU memory copy paths with awareness of NUMA topologies and hardware coherency.
- Develop instrumentation and telemetry collection mechanisms to monitor GPU and memory performance without impacting runtime workloads.
- Contribute to internal tools and libraries for GPU system introspection, profiling, and debugging.
- Provide technical mentorship and peer reviews, and guide junior engineers on best practices for low-level GPU development.
Other
- This position requires a hybrid work schedule based in the San Jose or Milpitas office.
- Bachelor's degree in a STEM-related field.
- PhD is a plus, especially with research in GPU systems, compilers, or HPC.