The Compute Infrastructure team at ByteDance works on the challenges of building and managing large-scale, reliable, and efficient compute infrastructure that powers hundreds of large clusters globally, running millions of online containers and offline jobs daily, including AI and LLM workloads. The team aims to build industry-leading infrastructure that empowers AI innovation, ensuring high performance, scalability, and reliability for demanding AI/LLM workloads.
Requirements
- Experience with coding in Python, Java, Golang, C, or C++
- Currently pursuing a PhD degree in Computer Science or a related field
- Demonstrated software engineering experience from previous internship, work experience, coding competitions, or publications
Responsibilities
- Ultra-large-scale Kubernetes cluster management platform
- Next-gen, AI-native Godel K8s scheduler with built-in intelligence
- Intelligent node-level management and scheduling system for heterogeneous resources (CPU/GPU, memory bandwidth, network bandwidth, power, etc.)
- Performance optimization for container runtimes and container image distribution
- K8s control-plane and data-plane stability and reliability, supported by automated, intelligent observability tools
Other
- Able to commit to working for 12 weeks during Summer 2026
- Must obtain work authorization in the country of employment at the time of hire and maintain it throughout employment
- Intent to return to the degree program after completing the internship
- High levels of creativity and quick problem-solving capabilities