Student Researcher - Seed Vision - Multimodal Interaction & World Model Pretraining - PhD

ByteDance

Salary not specified
Aug 22, 2025
San Jose, CA, USA

The Seed Multimodal Interaction and World Model team is dedicated to developing models that boast human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

Requirements

  • Currently pursuing a PhD in Computer Vision, Machine Learning, or a related technical field.
  • Familiarity with multimodal modeling, world models, or foundation model pretraining.
  • Strong coding skills and hands-on experience with PyTorch or JAX.
  • Experience with large-scale distributed training frameworks and GPU/TPU compute stacks.
  • Demonstrated research ability, with publications in top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, ICML, or ICLR.
  • Experience working with transformer-based architectures, including dense and Mixture-of-Experts (MoE) models.
  • Understanding of scaling behavior in foundation models and how to analyze it.

Responsibilities

  • Contribute to research and engineering to advance world models and multimodal understanding, enhancing models' reasoning and generation capabilities.
  • Design and prototype novel architectures that balance modeling performance, generalization, and efficiency.
  • Help establish scaling laws and conduct systematic ablations to derive transferable insights across model families and tasks.

Other

  • Currently pursuing a PhD in Computer Science, Machine Learning, or a related technical field.
  • Applications will be reviewed on a rolling basis – we encourage you to apply early.
  • Please state your availability clearly in your resume (Start date, End date).