Apple's Cloud Network Infrastructure team is looking to hire a Site Reliability Engineer to support and scale cloud services, ensuring high availability, performance, and resilience for critical Apple services used by billions of customers.
Requirements
- Expert knowledge of API design and interface technologies
- Strong systems programming skills, including multi-threading, concurrency, caching, and batching
- In-depth knowledge of Kubernetes (K8s), system virtualization, build systems, and infrastructure as code
- Experience in crafting and operationalizing large-scale, distributed, fault-tolerant, multi-tenant services
- Experience with operating systems and network fundamentals
- Experience in API design and interface technologies (JSON, ProtoBuf, REST, RPC, XML, etc.)
Responsibilities
- As part of launch readiness, support activities such as system design engineering, developing software tools and platforms, planning and managing capacity, and conducting launch reviews.
- Maintain service quality by monitoring and improving availability, performance, and health.
- Proactively create designs and implement processes that mitigate risk and reduce impact radius, incident detection time, and resolution time.
- Deliver on sustainable incident response practices, learning from experience through blameless postmortems.
- Collaborate with cross-functional teams in driving service integrations, resolving dependencies and representing the service offerings.
Other
- Highly self-motivated with a passion for excellence, quality and detail.
- Strong record of leading large multi-functional projects
- Outstanding communication skills with the ability to articulate concepts, designs and decisions.
- Drive and collaboration are the keys to success.
- Bring passion and dedication to your job