ByteDance is looking to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, and scalable infrastructure for LLM/AIGC/AGI workloads.
Requirements
- Proficient in algorithms and data structures; familiar with Python.
- Understand the basic principles of deep learning algorithms, be familiar with common neural network architectures, and understand deep learning training frameworks such as PyTorch.
- Proficient in GPU high-performance computing and optimization with CUDA, with an in-depth understanding of computer architecture; familiar with parallel computing optimization, memory access optimization, low-bit computation, etc.
- Familiar with FSDP, DeepSpeed, JAX SPMD, Megatron-LM, verl, TensorRT-LLM, ORCA, vLLM, SGLang, etc.
- Knowledge of LLMs; experience in LLM optimization and acceleration is preferred.
Responsibilities
- Responsible for developing and optimizing LLM training, inference, and RL frameworks.
- Working closely with model researchers to scale LLM training and RL to the next level.
- Responsible for GPU and CUDA performance optimization to build an industry-leading, high-performance LLM training, inference, and RL engine.
Other
- Bachelor's degree or above, with a major in computer science, electronics, automation, software engineering, or a related field.
- Able to commit to an onboarding date by the end of 2026.
- Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws
- Reasonable Accommodation: ByteDance is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws.
- Diversity & Inclusion: ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives.