Job Board
LogoLogo

Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and finds postings that perfectly matches your resume

CRUSOE Logo

Senior Software Engineer, Managed Orchestration (Kubernetes)

CRUSOE

$166,000 - $204,000
Aug 21, 2025
San Francisco, CA, USA
Apply Now

Crusoe is building the World’s Favorite AI-first Cloud infrastructure company, pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. Crusoe is redefining AI cloud infrastructure, with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold standard" for reliability and performance. Our data centers are optimized for AI workloads and are powered by clean, renewable energy.

Requirements

  • 5+ years of software engineering experience in distributed systems, cloud, or infrastructure.
  • Deep understanding of Kubernetes internals (control plane, scheduling, operators, controllers, API machinery).
  • Strong proficiency in Go (preferred) or similar languages (Rust, C++, Python for systems work).
  • Experience with container networking (CNI plugins, service mesh, load balancing) and Linux networking fundamentals.
  • Exposure to GPU workloads in Kubernetes (device plugins, GPU operators, scheduling, autoscaling).
  • Familiarity with cloud platforms (AWS, GCP, or Azure) and infrastructure automation (Terraform, Helm, GitOps).
  • Strong debugging and performance optimization skills for distributed systems.

Responsibilities

  • Architect, build, and operate features for Crusoe’s Managed Kubernetes platform (control plane, autoscaling, cluster lifecycle, upgrades, multi-tenancy).
  • Integrate and optimize GPU workloads within Kubernetes clusters, including device plugins, GPU operators, scheduling, and monitoring.
  • Enhance container networking through advanced CNI integration (Cilium, Calico, Multus) and support for high-performance networking (InfiniBand, RoCE).
  • Improve reliability and resilience of Kubernetes clusters, including HA control planes, node lifecycle management, and self-healing mechanisms.
  • Contribute to open-source and internal tooling that enhances observability, automation, and cluster security.
  • Participate in design reviews, provide mentorship to engineers, and help set long-term technical direction.
  • Troubleshoot complex distributed systems problems spanning containers, GPUs, and networking.

Other

  • Passion for building reliable, developer-friendly platforms that abstract complexity for customers.
  • Background in security for containers, GPUs, and Kubernetes (PodSecurity, RBAC, runtime scanning).
  • Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/ orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.