ByteDance's Inference Infrastructure team builds and operates next-generation cloud-native, GPU-optimized orchestration systems for large-scale LLM inference, enabling both internal and external developers to bring AI workloads from research to production at scale.
Requirements
- Strong understanding of large model inference, distributed and parallel systems, and/or high-performance networking systems.
- Hands-on experience building cloud or ML infrastructure in areas such as resource management, scheduling, request routing, monitoring, or orchestration.
- Solid knowledge of container and orchestration technologies (Docker, Kubernetes).
- Proficiency in at least one major programming language (Go, Rust, Python, or C++).
- Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray).
- Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments.
- Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM).
- Excellent communication skills and the ability to collaborate across global, cross-functional teams.
Responsibilities
- Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience.
- Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms.
- Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines.
- Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems.
- Write high-quality, production-ready code that is maintainable, testable, and scalable.
Other
- PhD internships at ByteDance give students the opportunity to contribute directly to our products and research, as well as to the organization's future plans and emerging technologies.
- Our dynamic internship experience blends hands-on learning, enriching community-building and development events, and collaboration with industry experts.
- Applications are reviewed on a rolling basis; we encourage you to apply early.
- Please state your availability (start and end dates) clearly in your resume.