Staff Software Engineer, Managed Orchestration (Kubernetes)

CRUSOE

$204,000 - $247,000
Aug 21, 2025
San Francisco, CA, USA

Crusoe is building the world's favorite AI-first cloud infrastructure company and is developing its next-generation orchestration platform to power GPU-accelerated and high-performance computing at scale.

Requirements

  • 8+ years of software engineering experience in distributed systems, cloud, or HPC.
  • Proven track record of technical leadership and driving architecture in production systems.
  • Deep expertise in Kubernetes internals (control plane, operators, API machinery, scheduling).
  • Strong proficiency in Go (preferred) or another systems language (Rust, C++, Python for HPC tooling).
  • Extensive experience with GPU integration in Kubernetes (device plugins, GPU operators, resource allocation).
  • Strong knowledge of container networking (Cilium, Calico, Multus, service meshes) and Linux networking fundamentals.
  • Familiarity with high-performance networking technologies (InfiniBand, RoCE) and accelerator-aware scheduling.

Responsibilities

  • Lead architecture and design for core features of Crusoe’s Managed Kubernetes platform (multi-tenancy, control plane scalability, cluster lifecycle, and high availability).
  • Drive integration of GPU acceleration in Kubernetes, including device plugin architecture, GPU operators, scheduling, autoscaling, and monitoring.
  • Guide development of advanced container networking capabilities, including CNI plugins, network operators, service meshes, and high-performance fabrics (InfiniBand, RoCE).
  • Define and enforce best practices for security, multi-cluster deployments, and workload isolation across compute, GPU, and networking layers.
  • Partner with product and engineering leadership to set the long-term technical strategy and roadmap for Crusoe Managed Kubernetes (CMK).
  • Mentor engineers across the organization, providing technical guidance and elevating standards for design, code quality, and operational excellence.
  • Troubleshoot and resolve complex distributed systems challenges spanning compute, networking, and GPU acceleration.

Other

  • Ability to influence cross-functional teams to deliver reliable, scalable, and secure orchestration for mission-critical workloads.
  • Contribute to and represent Crusoe in open-source communities (Kubernetes SIGs, CNCF projects, GPU and networking ecosystem).
  • Familiarity with both NVIDIA and AMD GPU stacks (CUDA, ROCm, NCCL).
  • Experience with Slurm, MPI, Ray, or distributed ML frameworks (TensorFlow, PyTorch, JAX).
  • Contributions to open-source projects in the Kubernetes, GPU, or networking ecosystems.