The ByteDance Doubao (Seed) Team is looking to solve problems in content understanding and creation using CV- and NLP-related technologies, focusing on multimodal understanding, vision and language, foundation models, and audio/music understanding and generation, with an emphasis on content creation.
Requirements
- Strong understanding of Transformer architectures, including Dense and Mixture-of-Experts (MoE), and familiarity with scaling models on GPUs or TPUs.
- Hands-on experience with PyTorch or JAX, along with distributed training frameworks.
- Familiarity with state-of-the-art techniques for preparing large-scale multimodal training datasets.
- Publications in top-tier venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, EMNLP, ACL, and COLING.
- Highly competent in algorithms and programming, with strong coding skills in Python and popular deep learning frameworks.
- Self-motivated with a strong interest in multimodal learning, reasoning, and model scalability.
Responsibilities
- Drive research and development of models that enhance multimodal understanding and improve reasoning capabilities.
- Design and implement novel model architectures that balance performance and computational efficiency.
- Investigate scaling strategies and conduct systematic ablation studies to derive transferable insights.
- Collaborate closely with senior researchers on cutting-edge projects, with the opportunity to publish findings at top-tier conferences.
Other
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
- Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment.
- Work and collaborate well with team members.
- Ability to work independently and strong communication skills.