The Apple Services Engineering (ASE) team builds and provides the platforms, services, and infrastructure that fuel Apple's services, ensuring they scale globally, remain highly available, and deliver a high-quality customer experience.
Requirements
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, capacity planning, automation, and toil reduction.
- Proficiency in at least one programming language: Python, Go, or Java.
- Experience managing and scaling distributed systems in a public, private, or hybrid cloud environment.
- Experience with microservices architecture and container orchestration using Kubernetes or similar technologies.
- Proficiency in both backend technologies (Python, Go, and Java) and frontend technologies (JavaScript and its variants).
- Strong understanding of Linux operating system fundamentals, networking principles, and system management.
Responsibilities
- Operate, monitor, and triage all aspects of our production and non-production environments.
- Pioneer and implement the next-generation telemetry system.
- Prepare alert-handling procedures and runbooks, and collaborate with the offshore SRE teams.
- Automate the deployment and orchestration of services in the cloud environment, as well as other routine processes.
- Actively participate in capacity planning, scale testing, and disaster recovery exercises.
- Interact with and support partner teams, including engineering, QA, and program management.
- Cultivate and maintain relationships with internal partners and external third-party vendors.
Other
- 6+ years of demonstrated expertise in a Site Reliability Engineering, Infrastructure Ops, or DevOps-focused role.
- BS or MS in Computer Science or a related field, or equivalent work experience.
- Strong sense of ownership, with a desire to communicate and collaborate with other engineers and teams.
- Apple is an equal opportunity employer that is committed to inclusion and diversity.