Together AI is seeking an engineer to optimize and enhance the performance of its AI inference systems.
Requirements
- Proficiency with Python and PyTorch.
- Demonstrated experience building high-performance libraries and tooling.
- Excellent understanding of low-level operating-system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
- Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
- Knowledge of AI inference techniques such as speculative decoding.
- Knowledge of CUDA/Triton programming.
Responsibilities
- Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
- Develop and optimize runtime inference services for large-scale AI applications.
- Implement robust and fault-tolerant systems for data ingestion and processing.
- Create services, tools, and developer documentation to support the inference engine.
- Conduct design and code reviews to ensure high standards of quality.
Other
- 3+ years of experience writing high-performance, well-tested, production-quality code.
- Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
- The US base salary range for this full-time position is $160,000 - $230,000, plus equity and benefits.