The Apple Services Engineering (ASE) team builds and provides the platforms, services, and infrastructure that enable Apple’s services (such as iCloud, iTunes, Siri, and Maps) to scale globally, stay highly available, and 'just work'.
Requirements
- 6+ years of demonstrated expertise in a Site Reliability Engineering, Infrastructure Ops, or DevOps-focused role
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, capacity planning, automation, and toil reduction
- Proficiency in at least one programming language: Python, Go, or Java
- Experience managing and scaling distributed systems in a public, private, or hybrid cloud environment
- Experience with microservices architecture and container orchestration using Kubernetes or similar technologies
- BS or MS in Computer Science / related fields or equivalent work experience
- Experience running Tier 1 services that require 24/7 support
- Proficiency in both backend coding technologies (Python, Go, and Java) and frontend coding technologies (JavaScript and its variants)
Responsibilities
- Operate, monitor, and triage all aspects of our production and non-production environments
- Pioneer and implement the next-generation telemetry system
- Prepare alert-handling procedures and runbooks, and collaborate with the offshore SRE teams
- Automate the deployment and orchestration of services in the cloud environment, as well as other routine processes
- Actively participate in capacity planning, scale testing, and disaster recovery exercises
- Interact with and support partner teams, including engineering, QA, and program management
Other
- Strong sense of ownership, with a desire to communicate and collaborate with other engineers and teams