Research Engineer Intern - Doubao - Seed - Machine Learning System - 2025 Summer - MS

ByteDance

Salary not specified
May 9, 2025
San Jose, CA, USA

The business aims to improve the efficiency and stability of large-scale distributed training jobs in machine learning systems.

Requirements

  • Familiarity with machine learning algorithms and platforms
  • Familiarity with C/C++ and Python development in Linux environments
  • Familiarity with at least one deep learning framework (TensorFlow, PyTorch, MXNet, or others)
  • GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs)
  • Distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, and GSPMD
  • Familiarity with AI compiler stacks such as torch.fx, XLA, and MLIR

Responsibilities

  • Research and develop efficient machine learning systems
  • Develop a state-of-the-art asynchronous training framework
  • Implement general-purpose training framework features and model-specific optimizations
  • Improve the efficiency and stability of extremely large-scale distributed training jobs

Other

  • Currently pursuing an MS in Software Development, Computer Science, Computer Engineering, or a related technical discipline
  • Ability to work independently and complete projects from beginning to end in a timely manner
  • Good communication and teamwork skills, with the ability to explain technical concepts clearly to teammates
  • Must obtain work authorization in the country of employment at the time of hire and maintain it throughout employment