Apple's Service Engineering team needs to enhance its internal cloud infrastructure offerings, specifically focusing on a high-performance batch compute platform that integrates the latest cloud hardware technologies with Apple's proprietary hardware and software. The goal is to deliver forward-looking, high-performance virtualized infrastructure to support the hardware and software teams powering the next generation of Apple devices, ensuring global scalability, high availability, and seamless operation for millions of customers.
Requirements
- Demonstrated knowledge and experience in distributed systems and operating systems, applied to build stable, performant, and secure execution environments
- Strong Linux / XNU development background, including kernel-level development
- Familiarity with all aspects of software development, from architecture to deployment and maintenance
- Ability to tackle and resolve complex issues across accelerator, virtualization, and networking layers, ensuring robust performance, stability, and security
- Quick learner and contributor to new code bases
- Fluency in Go (Golang), Python, C++, or similar languages in a systems context
- Prior experience working with diverse hardware, operating systems, container runtimes (LXC, Docker, containerd), and virtualization stacks (QEMU, KVM, libvirt on x86 and ARM)
Responsibilities
- Designing, implementing, and optimizing virtualized compute offerings across a range of hardware types
- Developing, implementing, and debugging core execution environment components, including designing secure VMs and container runtime solutions tailored to Apple's unique workloads
- Working on reliability, scalability, resilience, security, and performance limits of infrastructure services, while maintaining curiosity about system operation and failure
- Collaborating with Software and Hardware teams to tackle and resolve complex issues across virtualization and networking layers, ensuring robust performance, stability, and security
- Developing benchmarks representative of real workloads, analyzing and improving scale, troubleshooting performance, efficiency, and resilience issues, and fine-tuning performance of low-latency, high-throughput virtualized workloads
- Conducting root cause analysis for on-server system failures and implementing preventive measures
- Collaborating with multi-functional teams across Apple to understand, integrate, and optimize critical workloads into our platform
Other
- Customer-focused thinking and strong problem-solving skills with attention to detail
- Effective communication within a team, with experience leading initiatives and collaborating across multidisciplinary teams
- Willingness to act as a team catalyst to help grow the team and mentor junior engineers
- Enthusiasm about upholding Apple’s standards in product quality, design, and user experience
- Participating in a business-hours rotation for platform issue responses and same-day resolution