ByteDance's AML-MLsys team develops and maintains massively distributed ML training and inference systems worldwide, providing high-performance, highly reliable, and scalable systems for LLM/AIGC/AGI workloads. The team works across the GPU/NPU/RDMA/storage stack to ensure stability and reliability.
Requirements
- Currently pursuing a PhD in computer science, automation, electronics engineering, or a related technical discipline.
- Proficient in algorithms and data structures; familiar with Python.
- Understanding of the basic principles of deep learning algorithms, familiarity with common neural network architectures, and experience with deep learning training frameworks such as PyTorch.
- Proficient in GPU high-performance computing and CUDA optimization, with an in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computation, etc.
- Familiar with FSDP, DeepSpeed, JAX SPMD, Megatron-LM, verl, TensorRT-LLM, ORCA, vLLM, SGLang, etc.
- Knowledge of LLMs; experience in optimizing and accelerating LLMs is preferred.
Responsibilities
- Responsible for developing and optimizing LLM training, inference, and reinforcement learning (RL) frameworks.
- Work closely with model researchers to scale LLM training and RL to the next level.
- Responsible for GPU and CUDA performance optimization to build an industry-leading, high-performance LLM training, inference, and RL engine.
Other
- Internships at ByteDance aim to offer students industry exposure and hands-on experience
- Please state your availability clearly in your resume (Start date, End date)
- Candidates who pass resume screening will be invited to participate in ByteDance's technical online assessment
- Interns have day one access to health insurance, life insurance, wellbeing benefits and more