The company is looking to solve the visual intelligence problem for AI through cutting-edge research and development in foundation models and multimodal machine learning.
Requirements
- Research experience in multimodal understanding and vision-and-language tasks, such as video captioning, visual question answering (VQA), text-to-video retrieval, audio/music understanding and generation, and related topics
- Highly competent in algorithms and programming
- Strong coding skills in Python and popular deep learning frameworks
Responsibilities
- Conduct cutting-edge research and development in foundation models and multimodal machine learning, especially in generative AI (e.g., image and video generation)
- Develop the foundation model to strengthen the strategic advantages of ByteDance products
- Explore new downstream products with artificial intelligence technology at its core
Other
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
- Ability to work and collaborate well with team members
- Ability to work independently
- Strong communication skills