Apple is looking to push the boundaries of human understanding by leveraging multimodal foundation models and large language models for applications in computer vision and machine learning.
Requirements
- Experience in developing and training/tuning multimodal LLMs.
- Programming skills in Python.
- Expertise in one or more of: computer vision, NLP, multimodal fusion, generative AI.
- Experience with at least one deep learning framework such as JAX, PyTorch, or similar.
- Publication record in relevant venues.
Responsibilities
- Work on groundbreaking research projects to advance our AI and computer vision capabilities
- Contribute to both foundational research and practical applications of multimodal large language models
- Design, implement, and evaluate algorithms and models for human understanding
- Develop and explore multimodal large language models that integrate diverse data modalities such as text, image, video, and audio
- Collaborate with multi-functional teams, including researchers, data scientists, software engineers, human interface designers and application domain experts
- Stay up to date on the latest advancements in AI, machine learning, and computer vision, and apply this knowledge to drive innovation within the company
Other
- Bachelor's degree and a minimum of 3 years of relevant industry experience.
- PhD in Computer Science, Electrical Engineering, or a related field with a focus on AI, machine learning, or computer vision.