TikTok is looking to optimize the system performance of giant models on GPU/AI application platforms, including training and inference systems for giant models such as LLMs, to meet requirements for high performance, low cost, and ease of operation.
Requirements
- Deep understanding of computer system architecture, especially GPU/AI SoC or platform architecture, interconnect fabrics, and memory subsystems.
- Experience in GPU/AI system application performance optimization or software-hardware co-design.
- Strong proficiency in software development in C/C++ and in scripting languages such as Python.
- Understanding of LLM architecture and familiarity with the training and inference requirements placed on accelerators, memory, and networks.
- Understanding of the implementation of GPU/AI virtualization technology, deep learning architectures, and distributed systems.
Responsibilities
- Develop application benchmarks, tools, and performance optimization methods for GPU/AI systems, including giant model training and inference systems such as LLMs.
- Identify system bottlenecks and opportunities through in-depth, data-driven, system-level studies, explore innovative options through SW-HW co-design, and lead them to implementation to improve training and inference system efficiency.
- Develop a GPU/AI system TCO model based on application benchmarks and performance optimization.
- Work with industry consortiums and open standard committees to investigate emerging standards and technologies and to contribute our research results to the industry.
- Work with our technology partners and suppliers to set up POCs or prototypes to evaluate and test new technologies or architectural designs.
Other
- Final-year PhD student or recent PhD graduate with a background in Electrical Engineering, Computer Engineering, Computer Science, or a related major.
- Thesis topic in GPU/AI platform architecture, application performance optimization, or software-hardware co-design.
- Commit to an onboarding date by the end of 2026.
- State your availability and graduation date clearly in your resume.
- 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire, with increasing accruals by tenure).