ByteDance aims to deliver infrastructure for large-scale LLM inference that is highly performant, massively scalable, cost-efficient, and easy to use, supporting its rapidly growing businesses and global fleet of machines.
Requirements
- Strong understanding of large model inference, distributed and parallel systems, and/or high-performance networking systems.
- Hands-on experience building cloud or ML infrastructure in areas such as resource management, scheduling, request routing, monitoring, or orchestration.
- Solid knowledge of container and orchestration technologies (Docker, Kubernetes).
- Proficiency in at least one major programming language (Go, Rust, Python, or C++).
- Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray).
- Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM).
- Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms (SageMaker, Azure ML, Vertex AI).
Responsibilities
- Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience.
- Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms.
- Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines.
- Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems.
- Write high-quality, production-ready code that is maintainable, testable, and scalable.
Other
- B.S./M.S. in Computer Science, Computer Engineering, or related fields with 2+ years of relevant experience (Ph.D. with strong systems/ML publications also considered).
- Excellent communication skills and ability to collaborate across global, cross-functional teams.
- Passion for system efficiency, performance optimization, and open-source innovation.
- Ability to work in a hyper-scale environment and collaborate with world-class engineers.