ByteDance is looking to solve the problem of developing and maintaining massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.
Requirements
- Excellent coding skills, strong understanding of data structures, and fundamental knowledge of algorithms
- Proficiency in one or more programming languages such as C/C++, Java, Go, or Python
- Rich experience in online architecture, with the ability to troubleshoot independently
- Understanding of GPU hardware architecture, familiarity with GPU software stack (CUDA, cuDNN), and experience in GPU performance analysis
- Knowledge of LLMs; experience in LLM acceleration and optimization is preferred
Responsibilities
- Participating in online architecture design and optimization centered around LLM inference tasks, achieving high concurrency and throughput in large-scale online systems
- Participating in the establishment of a comprehensive system covering stability, disaster recovery, R&D efficiency, and cost, enhancing overall system stability
- Participating in the design and implementation of end-to-end online pipeline systems with multiple models, plugins, and storage-computation components, enabling agile, flexible, and observable continuous delivery
- Collaborating closely with MLEs to optimize algorithms and systems
- Being proactive, optimistic, highly responsible, and demonstrating meticulous work ethic, as well as possessing strong team communication and collaboration skills
Other
- Currently pursuing an undergraduate or master's degree in Computer Science or a related technical discipline
- Must be able to commit to a 12-week full-time work period during Summer or Fall 2026
- Strong sense of responsibility, good learning ability, communication skills, and self-motivation