Developing the systems and APIs that enable customers to run inference on and fine-tune LLMs at scale.
Requirements
- Familiar with the LLM inference ecosystem, including frameworks and engines (e.g. vLLM, SGLang, TRT, ...)
- Expert-level programmer in one or more of Python, Go, Rust, or C/C++
- Experience implementing runtime inference services at scale, or similar systems
- Demonstrated experience building large-scale, fault-tolerant distributed systems such as storage, search, and computation
- 5+ years of experience writing high-performance, well-tested, production-quality code
Responsibilities
- Design and build the production systems that power the Together Cloud inference and fine-tuning APIs, enabling reliability and performance at scale
- Analyze and improve efficiency, scalability, and stability of various system resources
- Conduct design and code reviews
- Create services, tools, and developer documentation
- Create testing frameworks for robustness and fault tolerance
- Participate in an on-call rotation to respond to critical incidents as needed
Other
- Partner with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world
- Bachelor’s degree in computer science or equivalent industry experience
- The US base salary range for this full-time position is $160,000–$220,000 + equity + benefits