Research Engineer Intern - Doubao - Seed - Machine Learning System - 2025 Summer - MS

Salary not specified

May 9, 2025

San Jose, CA, USA

The business is looking to improve the efficiency and stability of large-scale distributed training jobs in machine learning systems.

Familiarity with machine learning algorithms and platforms
Familiarity with C/C++ and Python development in Linux environments
Familiarity with at least one deep learning framework (TensorFlow, PyTorch, MXNet, or other)
GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs)
Optimization of distributed training frameworks such as DeepSpeed, FSDP, Megatron, and GSPMD
Familiarity with AI compiler stacks such as torch.fx, XLA, and MLIR

Research and develop efficient machine learning systems
Develop a state-of-the-art asynchronous training framework
Implement general-purpose training framework features and model-specific optimizations
Improve efficiency and stability for extremely large-scale distributed training jobs

Currently pursuing an MS in Software Development, Computer Science, Computer Engineering, or a related technical discipline
Ability to work independently and complete projects from beginning to end in a timely manner
Good communication and teamwork skills, including the ability to explain technical concepts clearly to teammates
Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment