The company is looking to empower content interaction and creation using speech- and audio-related technologies.
Requirements
- Experience in one or more areas of machine learning and deep learning, including but not limited to: Automatic Speech Recognition, Automatic Speech Translation, Speech/audio self-supervised learning and foundation models, Speaker recognition and verification, Speech emotion recognition, Multimodal foundation models, Large Language Model pre-training and fine-tuning
- Publications in top-tier ML/DL venues such as NeurIPS, ICLR, ICML, AAAI and speech venues such as ICASSP, ASRU, Interspeech
- Deep understanding of Large Language Models
- Familiarity with distributed computing and large-scale model training
- Familiarity with deep learning frameworks such as TensorFlow and PyTorch
- Familiarity with engineering principles and best practices
- Strong coding skills in C/C++ and Python
Responsibilities
- Conduct cutting-edge research and development in speech/audio foundation models
- Collaborate with cross-functional teams to identify key research areas and contribute to the development of innovative speech/audio models
- Work with product development teams to integrate research findings into practical applications for ByteDance and other platforms
- Collaborate on team-driven projects to address complex challenges and enhance the overall effectiveness of the research team
Other
- Master's or PhD in computer science, mathematics, engineering, or a related field
- Ability to work collaboratively in a fast-paced, multi-functional environment