Apple is building the next generation of Apple Intelligence, enabling highly intuitive and personalized intelligence experiences across the Apple ecosystem through multimodal perception and reasoning.
Requirements
- Proficiency in Python and deep learning frameworks such as PyTorch, or equivalent
- Practical experience with training and evaluating neural networks
- Familiarity with multimodal learning and hands-on experience with vision-language models or large language models
- Background in multimodal reasoning, vision-language model (VLM), and multimodal large language model (MLLM) research, with impactful software projects
- Solid understanding of natural language processing (NLP) and computer vision fundamentals
Responsibilities
- Design, develop, and adapt AI/ML models and algorithms for multimodal perception and reasoning, leveraging Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs)
- Develop robust algorithms that integrate visual and language data for comprehensive understanding
- Fine-tune, adapt, and distill multimodal LLMs
- Conduct hands-on experimentation, model training, and performance analysis
- Stay current with emerging methods in VLMs, MLLMs, and related areas
Other
- Master’s or Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, or a related field, or equivalent industry experience
- Strong problem-solving skills and ability to work in a collaborative, product-focused environment
- Ability to communicate technical results clearly and concisely
- Collaborate closely with cross-functional teams to translate product requirements into effective ML solutions
- Communicate research outcomes effectively to technical and non-technical stakeholders, providing actionable insights