TikTok is looking to research and optimize its planet-scale business architecture to support billions of users, globally distributed ultra-large-scale data centers, and microservice clusters with millions of services. This involves addressing key challenges in large-scale infrastructure and business architecture, including capacity forecasting and planning, service and data placement optimization, traffic scheduling optimization, and risk prediction, while embracing AI and focusing on resource management, cost optimization, user experience, reliability, engineering quality, and development efficiency.
Requirements
- Solid research background in distributed systems, large-scale system optimization, or AI for Systems.
- Strong problem-solving skills with the ability to model planet-scale, complex business and infrastructure systems, abstract key variables and constraints, and clearly articulate solutions through rigorous logical reasoning and effective communication.
- Familiarity with optimization algorithms such as integer programming, dynamic programming, graph optimization, heuristic search, or reinforcement learning, with the ability to apply them to large-scale optimization problems.
- Proficiency in at least one mainstream programming language (e.g., Go, Java, C++, Python), with the capability to build research prototypes.
- Ability to quickly learn and deeply understand complex architectural patterns, while leveraging AI tools and automation methods to improve research and development efficiency.
- Experience in data modeling and analysis for large-scale systems or business architectures.
- Research publications or submissions to top-tier conferences/journals such as OSDI, SOSP, NSDI, SoCC, SIGCOMM, KDD, NeurIPS.
Responsibilities
- Contribute to research and optimization of TikTok’s planet-scale business architecture, supporting billions of users, globally distributed ultra-large-scale data centers, and microservice clusters with millions of services.
- Assist in data modeling and algorithmic optimization to address key challenges in large-scale infrastructure and business architecture, including but not limited to capacity forecasting and planning, service and data placement optimization, traffic scheduling optimization, and risk prediction.
- Under the guidance of mentors, transform research insights into prototypes, tools, or frameworks, and explore their applicability in selected business scenarios.
- Participate in analyzing research data, summarizing experimental results, and contributing to academic publications in top-tier conferences/journals.
- Collaborate with architecture, platform, and business teams to understand real-world production challenges and explore potential optimization opportunities.
Other
- Currently pursuing a PhD in Computer Science, Software Engineering, Systems Engineering, Operations Research, Applied Mathematics, or related fields.
- Able to commit to working for 12 weeks during Summer 2026.
- Hands-on project experience in distributed systems, resource scheduling, or architecture optimization (including academic projects or previous internships).
- Familiarity with performance and optimization challenges in cross-data-center
- Please state your availability clearly in your resume (Start date, End date).