The Compute Infrastructure team at ByteDance is looking to build and improve a large-scale, reliable, and efficient compute infrastructure that powers hundreds of large-scale clusters globally, supporting millions of online containers and offline jobs daily, including AI and LLM workloads. They aim to build cutting-edge, industry-leading infrastructure to empower AI innovation.
Requirements
- Experience with coding in Python, Java, Golang, C, or C++
- Currently pursuing a PhD degree in Computer Science or a related field
- Demonstrated software engineering experience from previous internship, work experience, coding competitions, or publications
- High levels of creativity and quick problem-solving capabilities
Responsibilities
- Work on an ultra-large-scale Kubernetes cluster management platform
- Develop the next-gen, AI-native Godel K8s scheduler with built-in intelligence
- Build an intelligent node-level management and scheduling system for heterogeneous resources (CPU/GPU, memory bandwidth, network bandwidth, power, etc.)
- Optimize performance for container runtimes and container image distribution
- Enhance K8s control plane and data plane stability and reliability with automatic, intelligent observability tools
Other
- Able to commit to working for 12 weeks during Summer 2026
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
- Intent to return to the degree program after completion of the internship