Apple is looking to develop an innovative, AI-driven evaluation ecosystem to accelerate and empower AI development at Apple, specifically focusing on generative AI and large language models.
Requirements
- Strong foundation in machine learning fundamentals with the ability to tackle sophisticated ML challenges.
- Experience with, or proven curiosity about, designing and implementing AI-driven approaches to evaluation (e.g., LLM-as-a-judge, automated evaluation).
- Demonstrated ability to develop high-impact language model systems for real-world applications.
- Expertise in GenAI, LLM, and/or NLP/NLU evaluation.
- Proficiency in standard software engineering methodologies (e.g., modular software design, testing).
- Strong proficiency in Python.
- Strong proficiency in PyTorch, TensorFlow, or JAX.
Responsibilities
- Develop an innovative, AI-driven evaluation ecosystem to accelerate and empower AI development at Apple.
- Working at the intersection of applied research, ML & GenAI engineering, and tool development, you will champion principles of iterative experimentation, innovation, and enablement.
- Your work will span the full development lifecycle, from prototyping new ideas to designing and deploying reliable, production-grade systems.
- Solve fundamental problems in AI evaluation, such as developing innovative LLM judges, automating error analysis, developing methods for validation data, and optimizing human-AI collaboration, all while pushing the boundaries of core AI capabilities.
- Develop and own high-impact, developer-facing systems and tools.
- Evaluate sophisticated agentic systems using LLMs.
- Adapt and align LLMs through various training strategies, e.g., continued pre-training, supervised fine-tuning, and reinforcement learning.
Other
- A customer-experience-focused attitude.
- Provide engineering and research leadership at the ground floor of a critical effort with deep organizational impact.
- Excellent communication skills with a proven ability to engage diverse collaborators.
- 5+ years of experience with a Master's degree, 3+ years with a PhD, or equivalent practical experience.
- Track record of contributions to open-source ML projects or publications in top-tier ML conferences (e.g., NeurIPS, ICML, ACL).