Apple is looking to build the next generation of AI evaluation systems, making AI more measurable, testable, and trustworthy in real-world scenarios.
Requirements
- Strong programming skills in Python or another modern language (e.g., Java, Swift, Go)
- Basic understanding of machine learning principles
- Interest in LLMs, generative AI, or agent-based systems
- Familiarity with training or evaluating models (even via coursework or personal projects)
- Exposure to tools like PyTorch, TensorFlow, or Hugging Face
- Interest in AI observability, behavior simulation, or synthetic data
Responsibilities
- Contribute to systems that simulate interactive behaviors (including LLM-driven agents)
- Help build tools to support dataset generation and evaluation workflows
- Assist in developing pipelines that extract structured insights from model behavior
- Collaborate with teammates to debug and improve evaluation systems
- Write clean, testable code to support scalable and reliable infrastructure
- Learn how to define metrics that connect model behavior to real-world outcomes
Other
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, or related field
- Strong collaboration and communication skills
- Curiosity about how to evaluate and improve real-world AI performance
- Passion for working cross-functionally in fast-moving, exploratory teams
- Comprehensive medical and dental coverage, retirement benefits, and a range of discounted products and free services