Staff Software Engineer, Compute

Fal

$180,000 - $250,000

Sep 3, 2025

San Francisco, CA, US

The company is looking for an experienced software engineer to build and maintain large-scale computation platforms, focusing on backend systems that efficiently orchestrate workloads, route requests, and manage resources while ensuring reliability and scalability with minimal operational load.

Requirements

Deep experience building distributed compute platforms, preferably with Python
Strong foundation in managing both cloud and bare metal infrastructure
Solid understanding of Kubernetes (K8s) and of running CI/CD on it
Deep expertise in backend systems that orchestrate workloads and route requests efficiently, while taking care of capacity and resource constraints
Strong understanding of foundational cloud infrastructure and Linux provisioning and management tools
Knowledge of how to achieve reliability and scale with minimal operational load

Responsibilities

Develop and maintain our core Python platform, which handles request routing, orchestration of AI workloads, GPU server capacity management, observability, authentication, rate limiting, and more
Develop and maintain our infrastructure layer where we use Terraform, Ansible, and provider APIs to manage our fleet of GPU workers
Own K8s, FluxCD, Nomad, Prometheus, Thanos, Grafana, Loki, distributed networking storage, and other technologies that underpin our platform
Create the vision and lay the foundation for where our infrastructure should go over the next 1, 2, and 5 years

Other

Excellent communication skills
Self-starter who executes quickly, takes ownership, and constantly seeks improvement
We offer visa sponsorship and will help you relocate to San Francisco.