The Facebook Video Intelligence team is developing advanced foundation models for video generation and understanding, both to enable innovative AI-driven video creation experiences and to improve video content comprehension.
Requirements
- PhD in Computer Science, Machine Learning, or a relevant technical field
- Experience training multimodal, computer vision, LLM or related AI/ML models
- Programming experience in Python and hands-on experience with frameworks such as PyTorch
- First-authored publications at peer-reviewed conferences (e.g. ICLR, NeurIPS, ICML, KDD, CVPR, ICCV, ACL)
- Experience building text-to-video generative models, image-to-video generative models, video understanding models, and/or unified native video generative models
Responsibilities
- Build a variety of multimodal foundation models, such as text-to-video generative models, image-to-video generative models, video understanding models, and unified native video generative models
- Design core foundation model architectures and carry out progressive pre-training
- Post-train foundation models using techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Low-Rank Adaptation (LoRA)
- Conduct research to develop state-of-the-art (SOTA) GenAI models for the Facebook family of apps
Other
- Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
- Experience collaborating in cross-functional teams, including product, engineering, and research