Research Engineer Graduate (Seed-Infra-Inference-US) - 2026 Start (PhD)

ByteDance

$129,960 - $246,240
Dec 14, 2025
Seattle, WA, US

ByteDance develops and maintains massively distributed ML training and inference systems and services, with the goal of providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.

Requirements

  • Excellent coding ability, solid foundation in data structures and basic algorithms, proficient in C/C++ or Python.
  • Familiar with at least one mainstream machine learning framework (TensorFlow/PyTorch/JAX).
  • Strong grasp of distributed systems principles, with experience in the design, development, and maintenance of large-scale distributed systems.
  • Prior experience with large-scale projects, or influential papers, in the field of large models.
  • Familiar with NLP and CV algorithms and technologies, and experienced in large model training and RL algorithms.
  • Experience in one of the following fields: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing (CUTLASS, NCCL), ML Hardware Architecture (GPU, Accelerators, Networking), ML for Systems, and Distributed Storage.
  • Demonstrated relevant technical experience from previous internships, work experience, coding competitions, or publications.

Responsibilities

  • Responsible for the architecture design and development of large-scale machine learning systems, solving technical challenges such as high concurrency, high reliability, and high scalability.
  • Covering various sub-areas of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
  • Responsible for researching and adopting advanced technologies in machine learning systems, such as the latest hardware architectures, heterogeneous computing systems, and compiler-based optimization technologies.
  • Working closely with the algorithm teams to jointly optimize algorithms and systems.
  • Responsible for the machine learning system development of the company's large-scale models, researching new applications and solutions of related technologies in areas such as search, recommendation, advertising, content creation, conversation, and customer service.
  • Meeting users' growing demand for intelligent interaction, and broadly improving how users live and communicate.
  • Building a large-scale heterogeneous system that integrates GPU/NPU/RDMA/Storage, and keeping it running stably and reliably.

Other

  • Final-year PhD student or recent PhD graduate with a background in Computer Science or a related technical field, or equivalent industrial research experience.
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
  • Strong sense of responsibility, good learning ability, communication ability, and self-motivation.
  • Good communication and collaboration skills, able to explore new technologies with the team and promote technological progress.
  • Must commit to an onboarding date by the end of 2026.