Staff Software Engineer, Managed Orchestration (Kubernetes)

CRUSOE

$204,000 - $247,000

Aug 21, 2025

San Francisco, CA, USA

Crusoe is building the world’s favorite AI-first cloud infrastructure company and is developing its next-generation orchestration platform to power GPU-accelerated and high-performance computing at scale.

Requirements

8+ years of software engineering experience in distributed systems, cloud, or HPC.
Proven track record of technical leadership and driving architecture in production systems.
Deep expertise in Kubernetes internals (control plane, operators, API machinery, scheduling).
Strong proficiency in Go (preferred) or another systems language (Rust, C++, Python for HPC tooling).
Extensive experience with GPU integration in Kubernetes (device plugins, GPU operators, resource allocation).
Strong knowledge of container networking (Cilium, Calico, Multus, service meshes) and Linux networking fundamentals.
Familiarity with high-performance networking technologies (InfiniBand, RoCE) and accelerator-aware scheduling.

Responsibilities

Lead architecture and design for core features of Crusoe’s Managed Kubernetes platform (multi-tenancy, control plane scalability, cluster lifecycle, and high availability).
Drive integration of GPU acceleration in Kubernetes, including device plugin architecture, GPU operators, scheduling, autoscaling, and monitoring.
Guide development of advanced container networking capabilities, including CNI plugins, network operators, service meshes, and high-performance fabrics (InfiniBand, RoCE).
Define and enforce best practices for security, multi-cluster deployments, and workload isolation across compute, GPU, and networking layers.
Partner with product and engineering leadership to set long-term technical strategy and roadmap for Crusoe Managed Kubernetes (CMK).
Mentor engineers across the organization, providing technical guidance and elevating standards for design, code quality, and operational excellence.
Troubleshoot and resolve complex distributed systems challenges spanning compute, networking, and GPU acceleration.

Other

Ability to influence cross-functional teams to deliver reliable, scalable, and secure orchestration for mission-critical workloads.
Contribute to and represent Crusoe in open-source communities (Kubernetes SIGs, CNCF projects, GPU and networking ecosystem).
Familiarity with both NVIDIA and AMD GPU stacks (CUDA, ROCm, NCCL).
Experience with Slurm, MPI, Ray, or distributed ML frameworks (TensorFlow, PyTorch, JAX).
Contributions to open-source projects in the Kubernetes, GPU, or networking ecosystems.