ByteDance is looking to develop and optimize machine learning systems for AI foundation models, spanning heterogeneous computing architectures, management, scheduling, and monitoring.
Requirements
- Currently enrolled in a PhD program focused on distributed and parallel computing, with knowledge of recent advances in computing, storage, networking, and hardware technologies.
- Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
- Have a basic understanding of how GPUs and/or ASICs work.
- Expert in one or more programming languages in a Linux environment: C/C++, CUDA, Python.
- Experience with GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs).
- Experience with distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, and GSPMD.
- Experience with AI compiler stacks such as torch.fx, XLA, and MLIR.
Responsibilities
- Research and develop our machine learning systems, including heterogeneous computing architecture, management, scheduling, and monitoring.
- Manage cross-layer optimization across systems, AI algorithms, and hardware (GPU, ASIC) for machine learning.
- Implement both general-purpose training framework features and model-specific optimizations (e.g., LLMs, diffusion models).
- Improve efficiency and stability of extremely large-scale distributed training jobs.
Other
- Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment.
- Currently enrolled in a PhD program.
- 10 paid holidays per year and paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half).
- Day one access to health insurance, life insurance, wellbeing benefits and more
- Housing allowance for interns who are not working 100% remote