Distributed ML Systems Engineer - Inference

$160,000 - $230,000

Dec 20, 2025

San Francisco, CA, US

Together AI is seeking an engineer to design and build scalable machine learning systems that power its accelerated AI initiatives.

Strong programming skills in one or more of Python, Go, Rust, or C/C++.
Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
Experience with Kubernetes (Preferred)
Experience with PyTorch (Preferred)
3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.

Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
Develop and optimize distributed processing frameworks and storage systems.
Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.
Conduct architecture and design reviews to ensure best practices in system design.
Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.

Strong problem-solving skills and ability to work in a fast-paced environment.
The US base salary range for this full-time position is $160,000 - $230,000, plus equity and benefits.
Startup equity, health insurance, and other competitive benefits.
Equal Opportunity Employer