Transform how enterprises harness AI by rethinking systems from the ground up and delivering breakthrough solutions that redefine what's possible — faster, leaner, and smarter.
Requirements
Experience building scalable, production-grade infrastructure components or control planes using Go, Python, and C++.
Experience with Kubernetes, Docker, or KubeVirt for virtualization, containerization, and orchestration.
Experience designing or implementing logical resource abstractions for compute, storage, or networking, with a focus on multi-tenant environments.
Experience integrating with AI/ML platforms or pipelines (e.g., PyTorch, TensorFlow, Triton Inference Server, MLflow).
Experience with GPU sharing, scheduling, or isolation techniques (e.g., MPS, MIG, time-slicing, device plugin frameworks, or vGPU technologies).
Solid grasp of resource management concepts including quotas, fairness, prioritization, and elasticity.
Responsibilities
Design and implement infrastructure abstractions that cleanly separate logical compute units (vGPUs, GPU pods, AI queues) from physical hardware (nodes, devices, interconnects).
Develop runtime services, APIs, and control planes to expose GPU and accelerator resources to users and frameworks with multi-tenant isolation and QoS guarantees.
Architect systems for secure GPU sharing, including time-slicing, memory partitioning, and namespace isolation across tenants or jobs.
Collaborate with platform, orchestration, and scheduling teams to map logical resources to physical devices based on utilization, priority, and topology.
Define and enforce resource usage policies, including fair sharing, quota management, and oversubscription strategies.
Integrate with model training and serving frameworks (e.g., PyTorch, TensorFlow, Triton) to ensure smooth and predictable resource consumption.
Build observability and telemetry pipelines to trace logical-to-physical mappings, usage patterns, and performance anomalies.
Other
This position requires a hybrid working schedule in the San Jose or Milpitas office.
Bachelor's degree + 15 years of related experience, or Master's degree + 12 years of related experience, or PhD + 8 years of related experience.