The Doubao Vision team aims to solve the visual intelligence problem for AI by conducting cutting-edge research in areas such as vision and language, large vision models, and generative foundation models, and by applying these technologies to its rich application scenarios.
Requirements
- Research experience in multi-modal understanding and vision and language, such as video captioning, VQA, text-to-video retrieval, audio/music understanding and generation, and other related topics.
- Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, EMNLP, ACL, COLING, etc.
- Highly competent in algorithms and programming; strong coding skills in Python and popular deep learning frameworks.
Responsibilities
- Conduct cutting-edge research and development in foundation models and multimodal machine learning, especially in generative AI (e.g., image and video generation).
- Advance cutting-edge video generation technology as the primary research objective.
- Develop foundation models that strengthen the strategic advantages of ByteDance products.
- Explore new downstream products with artificial intelligence technology at its core.
Other
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
- Work and collaborate well with team members.
- Ability to work independently; strong communication skills.