ByteDance's GAI-Vision team is working to solve the visual intelligence problem for AI by developing multi-modal foundation models for visual understanding and visual generation.
Requirements
Research and engineering experience in one or more areas of computer vision and natural language processing, including but not limited to:
Experience in multi-modal understanding and vision-and-language tasks, such as video captioning, VQA, text-to-video retrieval, and other related topics.
Experience working with and building very large-scale datasets to scale up foundation models.
Experience with language models and applying them to various downstream tasks.
Publications in top-tier venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, EMNLP, ACL, COLING, etc.
Highly competent in algorithms and programming; strong coding skills in Python and popular deep learning frameworks.
Responsibilities
Creating AI engineering infrastructure from end to end for large model development.
Training, inference acceleration, and deployment.
Other
Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
Work and collaborate well with team members.
Ability to work independently; strong communication skills.
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
The hourly rate range for this position in the selected city is $60-$60.