Software Engineer - Infrastructure/Supercomputing

xAI

$180,000 - $370,000
Sep 21, 2025
Palo Alto, CA, US

xAI is looking to build and operate some of the world's largest GPU supercomputing clusters for AI training and for serving production models. The work requires robust, secure service delivery across production environments.

Requirements

  • Kubernetes
  • Pulumi
  • Rust and Go
  • Flux / ArgoCD
  • Experience writing scalable and highly available containerized applications in Rust.
  • Experience managing compute fleets with Pulumi, Terraform, Ansible, or other stateful automation libraries.

Responsibilities

  • Operating some of the world’s largest GPU supercomputing clusters for both AI training and serving production models.
  • Implementing IaC best practices, enhancing deployment pipelines, and ensuring robust, secure service delivery across our production environments.
  • Working with both on-premise clusters and cloud providers.
  • Helping with security best practices for internal researchers and live external traffic.
  • Writing scalable and highly available containerized applications in Rust.
  • Managing compute fleets with Pulumi, Terraform, Ansible, or other stateful automation libraries.

Other

  • Candidates are expected to be located near the Bay Area or open to relocation.
  • All engineers are expected to have strong communication skills and to share knowledge concisely and accurately with their teammates.
  • Work ethic and strong prioritization skills are important.