Senior Backend Engineer - Together Cloud

Together AI

$160,000 - $230,000
Aug 12, 2025
San Francisco, CA, US

Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle. The company is hiring a Senior AI Infrastructure Engineer to play a key role in building its next-generation AI cloud platform.

Requirements

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • 5+ years of experience writing high-performance, well-tested, production-quality code
  • Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures on one or more cloud providers (AWS, Azure, GCP)
  • Deep experience with Kubernetes internals is a big plus, such as implementing non-trivial Kubernetes operators, device/storage/network plugins, or custom schedulers, or contributing patches to those components or to Kubernetes itself
  • Deep experience with VMs/hypervisors is a big plus, such as QEMU/KVM, cloud-hypervisor, VFIO, virtio, PCIe passthrough, KubeVirt, and SR-IOV
  • Experience with infrastructure automation tools (Terraform, Ansible), monitoring/observability stacks (Prometheus, Grafana), and CI/CD pipelines (GitHub Actions, ArgoCD)
  • Knowledge of GPU programming, NCCL, or CUDA is a plus

Responsibilities

  • Design, build, and maintain performant, secure, and highly available backend services and operators that run in our data centers and automate hardware management
  • Design and build out the IaaS software layer for a new GB200 data center with thousands of GPUs
  • Work on a global, multi-exabyte, high-performance object store that serves massive datasets for pretraining
  • Build advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining
  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation

Other

  • Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
  • Strong fundamental software development skills
  • Strong systems knowledge and troubleshooting abilities
  • Flexibility in terms of remote work
  • Health insurance and other benefits