Apple's Data Platform needs to build a unified orchestration layer to power large-scale data and ML workflows across the company, enabling teams to train models, analyze data, and deploy AI at Apple scale with strong governance.
Requirements
- Experience designing, building, and maintaining ML infrastructure and deployment pipelines using containerization technologies (Docker and Kubernetes preferred) and cloud platforms (AWS, Azure, or GCP)
- Proficiency in Python, Go, or Scala, with experience in ML frameworks and tooling (TensorFlow, PyTorch, MLflow, Kubeflow)
- Strong experience with Infrastructure as Code (Terraform, CloudFormation) and CI/CD tools (Jenkins, GitLab CI, GitHub Actions)
- Proficiency with monitoring and observability tools (Prometheus, Grafana, ELK stack) for tracking ML model performance and system health
- Experience with data pipeline orchestration tools (Airflow, Prefect, Dagster) and streaming platforms (Kafka, Kinesis)
- Knowledge of ML model versioning, experiment tracking, and feature stores (MLflow, Weights & Biases, Feast)
- Experience with automated testing frameworks for ML systems, including data validation and model testing
Responsibilities
- Design, implement, and maintain end-to-end ML pipelines from data ingestion to model deployment and monitoring
- Build and optimize automated training, validation, and deployment workflows that support rapid experimentation and production releases
- Develop robust monitoring and alerting systems to ensure model performance, data quality, and system reliability
- Create self-service tools and platforms that enable ML teams to deploy and manage models independently
- Implement security and privacy controls throughout the ML lifecycle, ensuring compliance with Apple's security and privacy standards
- Drive infrastructure cost optimization and resource efficiency across ML workloads
- Establish best practices for model governance, including versioning, rollback strategies, and A/B testing frameworks
Other
- 5+ years of experience in MLOps, DevOps, or related infrastructure roles
- Collaborate with diverse teams including accessibility specialists to ensure ML tools are usable by team members with varying abilities
- Build documentation and training materials that support teams with different technical backgrounds
- Strong grasp of software engineering fundamentals and DevOps practices
- Proficiency with Git and collaborative development workflows