Apple is working to ensure the quality and reliability of conversational AI assistants and AI agents across its ecosystem by developing cutting-edge evaluation technologies and methodologies. The team builds evaluation platforms and tools that gate the quality of AI/ML products before they reach millions of users globally.
Requirements
- 5+ years of professional software development experience with demonstrated expertise in designing, implementing, and optimizing large-scale, data- and compute-intensive frameworks, APIs, and tools
- Strong software engineering capabilities including system design, backend development, testing, debugging, release management, and production maintenance
- Expert-level proficiency in Python (required) and at least one additional object-oriented programming language (e.g., Swift, Java, Go)
- Solid experience with service-oriented architecture and distributed systems design patterns
- Backend development expertise with experience building scalable APIs, microservices, and platform infrastructure
- ML lifecycle familiarity, including exposure to data preprocessing, model training, evaluation methodologies, deployment strategies, monitoring approaches, and AI agent development workflows
- Statistical evaluation methodology knowledge, including experience with ML training pipelines, model accuracy assessment, performance optimization techniques, and AI agent evaluation frameworks (a minimal sketch of such an assessment follows this list)
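To make the statistical-evaluation requirement concrete, here is a minimal sketch of one common technique for model accuracy assessment: a paired bootstrap confidence interval for the accuracy difference between two model variants scored on the same evaluation set. All names, data, and numbers are illustrative assumptions, not a description of Apple's internal tooling.

```python
import random
from statistics import mean

def bootstrap_diff_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05):
    """Paired bootstrap CI for the accuracy difference between two model
    variants evaluated on the same examples.

    scores_a, scores_b: per-example correctness (1.0 = pass, 0.0 = fail),
    aligned by index. Returns (low, high) bounds of the (1 - alpha) CI.
    """
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs aligned.
        idx = [random.randrange(n) for _ in range(n)]
        diffs.append(mean(scores_a[i] for i in idx) - mean(scores_b[i] for i in idx))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical example: candidate model vs. baseline on 500 shared prompts.
candidate = [float(random.random() < 0.82) for _ in range(500)]
baseline = [float(random.random() < 0.78) for _ in range(500)]
low, high = bootstrap_diff_ci(candidate, baseline)
print(f"95% CI for accuracy delta: [{low:+.3f}, {high:+.3f}]")
# If the interval excludes zero, the observed difference is unlikely to be noise.
```

A paired resample (same indices for both variants) is the key design choice here: it controls for per-example difficulty, which tightens the interval compared with resampling each variant independently.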
Responsibilities
- Architect, build, and maintain innovative evaluation solutions and tools for large-scale statistical assessment of GenAI-powered products, models, and AI agents.
- Deliver evaluation-as-a-service solutions that empower product and modeling teams across Apple to run comprehensive statistical evaluations, generate actionable metrics and insights, and make informed shipping decisions.
- Partner with cross-functional teams to translate evaluation needs into robust technical solutions for conversational AI, language models, and AI agent capabilities.
- Own requirements gathering and proof-of-concept development end to end, and co-drive the development roadmap for ML system evaluation platforms.
- Design and implement solutions that enable statistical analysis of product experiences, model performance, and AI agent behavior at scale.
- Drive system integration efforts and influence how evaluation software is incorporated into ML model and AI agent CI/CD pipelines (see the gating sketch after this list).
- Develop monitoring and observability solutions to provide deep insights into platform performance, evaluation quality, and AI agent reliability.
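As one illustration of how evaluation software might gate a CI/CD pipeline, the following sketch checks a run's metrics against release thresholds and blocks the build on any failure. The `Gate` structure, metric names, and thresholds are hypothetical assumptions chosen for illustration, not Apple's actual criteria or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    metric: str
    threshold: float
    higher_is_better: bool = True

# Hypothetical release gates for a conversational model; names and
# thresholds are illustrative only.
GATES = [
    Gate("task_success_rate", 0.90),
    Gate("hallucination_rate", 0.02, higher_is_better=False),
    Gate("p95_latency_ms", 1200.0, higher_is_better=False),
]

def evaluate_gates(metrics: dict[str, float]) -> list[str]:
    """Return a list of failure messages; an empty list means the build may ship."""
    failures = []
    for gate in GATES:
        value = metrics.get(gate.metric)
        if value is None:
            # A missing metric blocks the release rather than passing silently.
            failures.append(f"{gate.metric}: metric missing from evaluation run")
            continue
        passed = value >= gate.threshold if gate.higher_is_better else value <= gate.threshold
        if not passed:
            op = ">=" if gate.higher_is_better else "<="
            failures.append(f"{gate.metric}: {value} (required {op} {gate.threshold})")
    return failures

if __name__ == "__main__":
    # Metrics as they might arrive from an upstream evaluation job.
    run = {"task_success_rate": 0.93, "hallucination_rate": 0.031, "p95_latency_ms": 980.0}
    problems = evaluate_gates(run)
    if problems:
        raise SystemExit("Release blocked:\n" + "\n".join(problems))
    print("All evaluation gates passed; release candidate may proceed.")
```

Exiting nonzero on any failed gate is what lets a pipeline runner treat the evaluation step as a hard shipping gate rather than an advisory report.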
Other
- Solution-oriented
- Thrives in fast-paced environments
- Combines strategic thinking with hands-on problem-solving
- Passionate about enabling data-driven decisions that enhance Apple product experiences for millions of users
- Cross-functional collaboration skills with strong organizational abilities and experience working effectively with multiple stakeholders across product, engineering, and research teams