Job Board

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that perfectly match your resume.


Principal Software Engineer – Large-Scale LLM Memory and Storage Systems

NVIDIA

$272,000 - $425,500
Dec 22, 2025
Santa Clara, CA, US

NVIDIA is seeking a Principal Software Engineer to define the vision and roadmap for large-scale LLM memory and storage systems, enabling efficient, resilient deployment of cutting-edge LLM workloads.

Requirements

  • 15+ years of experience building large-scale distributed systems, high-performance storage, or ML systems infrastructure in C/C++ and Python
  • Deep understanding of memory hierarchies (GPU HBM, host DRAM, SSD, and remote/object storage)
  • Experience with distributed caching or key-value systems, especially designs optimized for low latency and high concurrency (a minimal cache sketch follows this list)
  • Hands-on experience with networked I/O and RDMA/NVMe-oF/NVLink-style technologies
  • Familiarity with concepts like disaggregated and aggregated deployments for AI clusters
  • Strong skills in profiling and optimizing systems across CPU, GPU, memory, and network
  • Experience with Rust and Python
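To make the caching requirement concrete, here is a minimal sketch of a thread-safe LRU key-value cache in Python (one of the languages the posting names). It is purely illustrative, not part of NVIDIA's stack: the class and its capacity-in-entries policy are invented for this example, and a design genuinely optimized for low latency and high concurrency would shard the key space rather than serialize every access through one lock.

    # Illustrative only: a minimal thread-safe LRU key-value cache.
    from collections import OrderedDict
    from threading import Lock

    class LRUCache:
        def __init__(self, capacity: int):
            self.capacity = capacity
            self._data: OrderedDict[str, bytes] = OrderedDict()
            self._lock = Lock()  # one global lock; real designs shard to cut contention

        def get(self, key: str) -> bytes | None:
            with self._lock:
                if key not in self._data:
                    return None
                self._data.move_to_end(key)  # mark as most recently used
                return self._data[key]

        def put(self, key: str, value: bytes) -> None:
            with self._lock:
                self._data[key] = value
                self._data.move_to_end(key)
                if len(self._data) > self.capacity:
                    self._data.popitem(last=False)  # evict the least recently used entry

A production KV store at this scale would also track capacity in bytes rather than entry count, since cached blocks vary widely in size.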

Responsibilities

  • Design and evolve a unified memory layer that spans GPU memory, pinned host memory, RDMA-accessible memory, SSD tiers, and remote file/object/cloud storage to support large-scale LLM inference
  • Architect and implement deep integrations with leading LLM serving engines (such as vLLM, SGLang, TensorRT-LLM), with a focus on KV-cache offload, reuse, and remote sharing across heterogeneous and disaggregated clusters
  • Co-design interfaces and protocols that enable disaggregated prefill, peer-to-peer KV-cache sharing, and multi-tier KV-cache storage (GPU, CPU, local disk, and remote memory) for high-throughput, low-latency inference (see the sketch following this list)
  • Partner closely with GPU architecture, networking, and platform teams to exploit GPUDirect, RDMA, NVLink, and similar technologies for low-latency KV-cache access and sharing across heterogeneous accelerators and memory pools
  • Mentor senior and junior engineers, set technical direction for memory and storage subsystems, and represent the team in internal reviews and external forums (open source, conferences, and customer-facing technical deep dives)
  • Design systems that span multiple tiers for performance and cost efficiency
  • Profile and optimize systems across CPU, GPU, memory, and network, using metrics to drive architectural decisions and validate improvements in time to first token (TTFT) and throughput
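To picture the multi-tier KV-cache lookup these responsibilities describe, the following Python sketch searches tiers fastest-first (GPU HBM, host DRAM, local SSD, remote object storage) and promotes hits toward the faster tiers. The Tier interface, the tier names, and the promotion policy are assumptions invented for illustration; they are not the API of NVIDIA, vLLM, SGLang, or TensorRT-LLM.

    # Hypothetical sketch: multi-tier KV-cache lookup with promotion on hit.
    from typing import Protocol

    class Tier(Protocol):
        name: str
        def get(self, key: str) -> bytes | None: ...
        def put(self, key: str, value: bytes) -> None: ...

    class DictTier:
        # Stand-in tier backed by a dict; real tiers would wrap HBM, DRAM,
        # SSD, or remote object storage.
        def __init__(self, name: str):
            self.name = name
            self._store: dict[str, bytes] = {}
        def get(self, key: str) -> bytes | None:
            return self._store.get(key)
        def put(self, key: str, value: bytes) -> None:
            self._store[key] = value

    def lookup(tiers: list[Tier], key: str) -> bytes | None:
        # Search fastest-first; on a hit, copy the block into every faster tier.
        for i, tier in enumerate(tiers):
            value = tier.get(key)
            if value is not None:
                for faster in tiers[:i]:  # promote toward GPU memory
                    faster.put(key, value)
                return value
        return None  # full miss: the serving engine must recompute the KV block

    # Tiers ordered fastest to slowest, mirroring the hierarchy in the posting.
    tiers = [DictTier("gpu_hbm"), DictTier("host_dram"),
             DictTier("local_ssd"), DictTier("remote_object")]
    tiers[2].put("prompt-prefix", b"kv-block-bytes")
    assert lookup(tiers, "prompt-prefix") == b"kv-block-bytes"
    assert tiers[0].get("prompt-prefix") is not None  # promoted into the GPU tier

In a real system the fetch and promotion steps are where GPUDirect, RDMA, and NVLink come in: moving KV blocks between tiers quickly is what keeps time to first token low.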

Other

  • Master's or PhD degree, or equivalent experience
  • Excellent communication skills and prior experience leading cross-functional efforts with research, product, and customer teams
  • Ability to work in a diverse and inclusive environment