ByteDance's Core Compute Infrastructure organization designs and operates the platforms that power microservices, big data, distributed storage, machine learning training and inference, and edge computing across multi-cloud and global datacenters. With rapidly growing businesses and a global fleet running hundreds of millions of containers daily, the organization is building the next generation of cloud-native, GPU-optimized orchestration systems. The Inference Infrastructure team is expanding its focus on LLM inference infrastructure to support new AI workloads and is looking for engineers passionate about cloud-native systems, scheduling, and GPU acceleration.
Requirements
- Solid knowledge of container and orchestration technologies (Docker, Kubernetes).
- Proficiency in at least one major programming language (Go, Rust, Python, or C++).
- Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray).
- Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments.
- Hands-on experience with GPU programming (CUDA) or inference engines (vLLM, SGLang, TensorRT-LLM).
- Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms (SageMaker, Azure ML, Vertex AI).
- Strong knowledge of ML systems (Ray, DeepSpeed, PyTorch) and distributed training/inference platforms.
Responsibilities
- Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience.
- Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms.
- Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines.
- Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems.
- Write high-quality, production-ready code that is maintainable, testable, and scalable.
Other
- Able to commit to working for 12 weeks during Summer 2026.
- Excellent communication skills and ability to collaborate across global, cross-functional teams.
- Passion for system efficiency, performance optimization, and open-source innovation.