Apple is working to ensure the quality and reliability of conversational AI assistants and AI agents across its ecosystem by developing cutting-edge evaluation technologies and methodologies. The team builds evaluation platforms and tools that gate the quality of AI/ML products before they reach millions of users globally.
Requirements
- 5+ years of professional software development experience with demonstrated expertise in designing, implementing, and optimizing large-scale, data- and compute-intensive frameworks, APIs, and tools
- Strong software engineering capabilities including system design, backend development, testing, debugging, release management, and production maintenance
- Expert-level proficiency in Python (required) and at least one additional object-oriented programming language (e.g., Swift, Java, Go)
- Solid experience with service-oriented architecture and distributed systems design patterns
- Backend development expertise with experience building scalable APIs, microservices, and platform infrastructure
- ML lifecycle familiarity, including exposure to data preprocessing, model training, evaluation methodologies, deployment strategies, monitoring approaches, and AI agent development workflows
- Statistical evaluation methodology knowledge, including experience with ML training pipelines, model accuracy assessment, performance optimization techniques, and AI agent evaluation frameworks (a minimal sketch of such an assessment follows this list)
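To make the statistical-evaluation requirement concrete, here is a minimal sketch of one common technique for model accuracy assessment: a paired bootstrap confidence interval for the accuracy difference between two model variants scored on the same evaluation set. All names, data, and numbers are illustrative assumptions, not a description of Apple's internal tooling.

```python
import random
from statistics import mean

def bootstrap_diff_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05):
    """Paired bootstrap CI for the accuracy difference between two model
    variants evaluated on the same examples.

    scores_a, scores_b: per-example correctness (1.0 = pass, 0.0 = fail),
    aligned by index. Returns (low, high) bounds of the (1 - alpha) CI.
    """
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample example indices with replacement, keeping pairs aligned.
        idx = [random.randrange(n) for _ in range(n)]
        diffs.append(mean(scores_a[i] for i in idx) - mean(scores_b[i] for i in idx))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical example: candidate model vs. baseline on 500 shared prompts.
candidate = [float(random.random() < 0.82) for _ in range(500)]
baseline = [float(random.random() < 0.78) for _ in range(500)]
low, high = bootstrap_diff_ci(candidate, baseline)
print(f"95% CI for accuracy delta: [{low:+.3f}, {high:+.3f}]")
# If the interval excludes zero, the observed difference is unlikely to be noise.
```

A paired resample (same indices for both variants) is the key design choice here: it controls for per-example difficulty, which tightens the interval compared with resampling each variant independently.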
Responsibilities
- Architect, build, and maintain innovative evaluation solutions and tools for large-scale statistical assessment of GenAI-powered products, models, and AI agents.
- Deliver evaluation-as-a-service solutions that empower product and modeling teams across Apple to run comprehensive statistical evaluations, generate actionable metrics and insights, and make informed shipping decisions.
- Partner with cross-functional teams to translate evaluation needs into robust technical solutions for conversational AI, language models, and AI agent capabilities.
- Own requirements gathering and proof-of-concept development end to end, and co-drive the development roadmap for ML system evaluation platforms.
- Design and implement solutions that enable statistical analysis of product experiences, model performance, and AI agent behavior at scale.
- Drive system integration efforts and influence how evaluation software is incorporated into ML model and AI agent CI/CD pipelines (see the gating sketch after this list).
- Develop monitoring and observability solutions to provide deep insights into platform performance, evaluation quality, and AI agent reliability.
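As one illustration of how evaluation software might gate a CI/CD pipeline, the following sketch checks a run's metrics against release thresholds and blocks the build on any failure. The `Gate` structure, metric names, and thresholds are hypothetical assumptions chosen for illustration, not Apple's actual criteria or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    metric: str
    threshold: float
    higher_is_better: bool = True

# Hypothetical release gates for a conversational model; names and
# thresholds are illustrative only.
GATES = [
    Gate("task_success_rate", 0.90),
    Gate("hallucination_rate", 0.02, higher_is_better=False),
    Gate("p95_latency_ms", 1200.0, higher_is_better=False),
]

def evaluate_gates(metrics: dict[str, float]) -> list[str]:
    """Return a list of failure messages; an empty list means the build may ship."""
    failures = []
    for gate in GATES:
        value = metrics.get(gate.metric)
        if value is None:
            # A missing metric blocks the release rather than passing silently.
            failures.append(f"{gate.metric}: metric missing from evaluation run")
            continue
        passed = value >= gate.threshold if gate.higher_is_better else value <= gate.threshold
        if not passed:
            op = ">=" if gate.higher_is_better else "<="
            failures.append(f"{gate.metric}: {value} (required {op} {gate.threshold})")
    return failures

if __name__ == "__main__":
    # Metrics as they might arrive from an upstream evaluation job.
    run = {"task_success_rate": 0.93, "hallucination_rate": 0.031, "p95_latency_ms": 980.0}
    problems = evaluate_gates(run)
    if problems:
        raise SystemExit("Release blocked:\n" + "\n".join(problems))
    print("All evaluation gates passed; release candidate may proceed.")
```

Exiting nonzero on any failed gate is what lets a pipeline runner treat the evaluation step as a hard shipping gate rather than an advisory report.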
Other
- Solution-oriented
- Thrives in fast-paced environments
- Combines strategic thinking with hands-on problem-solving
- Passionate about enabling data-driven decisions that enhance Apple product experiences for millions of users
- Cross-functional collaboration skills with strong organizational abilities and experience working effectively with multiple stakeholders across product, engineering, and research teams