Distributed ML Systems Engineer - Inference

Together AI

$160,000 - $230,000
Aug 12, 2025
San Francisco, CA, US

Together AI is seeking an engineer to design and build scalable machine learning systems that power its accelerated AI initiatives.

Requirements

  • Strong programming skills in one or more of Python, Go, Rust, or C/C++.
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
  • Experience with Kubernetes (Preferred)
  • Experience with PyTorch (Preferred)
  • 3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.

Responsibilities

  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.

Other

  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • The US base salary range for this full-time position is $160,000 - $230,000, plus equity and benefits.
  • Startup equity, health insurance, and other competitive benefits.
  • Equal Opportunity Employer