Apple's Cloud network Infrastructure team is hiring a Site Reliability Engineer to support and scale its cloud network services while maintaining their high availability and resilience.
Requirements
- Experience in crafting and operationalizing large-scale, distributed, fault-tolerant, multi-tenant services
- Experience with operating systems and network fundamentals
- Expert knowledge of API design and interface technologies (JSON, ProtoBuf, REST, RPC, XML, etc.)
- Strong systems programming skills, including multi-threading, concurrency, caching, and batching
- In-depth knowledge of K8s, system virtualization, build systems, and infrastructure as code
Responsibilities
- As part of launch readiness, support system design engineering, develop software tools and platforms, plan and manage capacity, and conduct launch reviews.
- Maintain service quality by monitoring and improving availability, performance and health.
- Proactively design and implement processes that mitigate risk and reduce impact radius, incident detection time, and resolution time.
- Deliver sustainable incident response practices, learning from experience through blameless postmortems.
- Collaborate with cross-functional teams in driving service integrations, resolving dependencies and representing the service offerings.
Other
- Highly self-motivated with a passion for excellence, quality and detail.
- Ability to work effectively with cross-functional teams
- Strong record of leading large multi-functional projects
- Outstanding communication skills with the ability to articulate concepts, designs and decisions.