Apple is hiring a Site Reliability Engineer to support and scale its cloud services. The role maintains the high availability, scalability, and resilience of the cloud network services that are critical to delivering Apple services such as iCloud, iTunes, Siri, and Maps to billions of customers.
Requirements
- Experience in crafting and operationalizing large-scale, distributed, fault-tolerant, multi-tenant services
- Experience with operating systems and network fundamentals
- Expert knowledge of, and hands-on experience with, API design and interface technologies (JSON, ProtoBuf, REST, RPC, XML, etc.)
- Strong systems programming skills including multi-threading, concurrency, caching, batching
- In-depth knowledge of Kubernetes (K8s), system virtualization, build systems, and infrastructure as code
Responsibilities
- Support launch readiness through system design engineering, development of software tools and platforms, capacity planning and management, and launch reviews
- Maintain service quality by monitoring and improving availability, performance, and health
- Proactively design and implement processes that mitigate risk, reduce impact radius, and shorten incident detection and resolution times
- Deliver sustainable incident response practices, learning from experience through blameless postmortems
- Collaborate with cross-functional teams to drive service integrations, resolve dependencies, and represent the service offerings
Other
- Bachelor's degree or equivalent experience
- Outstanding communication skills with the ability to articulate concepts, designs and decisions
- Strong record of leading large multi-functional projects
- Ability to work in a fast-paced organization where drive and collaboration are the keys to success
- Benefits include comprehensive medical and dental coverage, retirement benefits, a range of discounted products, and free services