Apple is looking for a Staff Software Engineer to design, build, and maintain the compute infrastructure for machine learning, artificial intelligence, and computer vision applications, enabling model training and tuning.
Requirements
- Experience with Kubernetes and public cloud infrastructure such as Amazon EC2, Amazon EKS, and Google Cloud Platform
- Strong software development skills, with proficiency in relevant languages (e.g., Golang, Python)
- Strong problem-solving skills and the ability to write performant, high-quality code
- Solid understanding of the software development process, including unit testing and release management
- Strong understanding of batch scheduling systems and high-performance computing environments
- Experience using system monitoring tools, automated testing frameworks, and CI/CD pipelines
- Experience with GPUs and/or other ML accelerators in the context of Machine Learning
Responsibilities
- Own the architecture, design, development, and operations of large-scale systems designed for machine learning.
- Develop custom scheduling, resource management, and fleet management solutions for our ML model training compute infrastructure.
- Collaborate with cross-functional teams, integrate with Kubernetes in on-premises and cloud provider clusters, and enable seamless integration with NVIDIA GPUs and other ML accelerators.
- Partner with data scientists and machine learning engineers across different Apple organizations to define high-impact product features and deliver them with quality.
- Lead a group of engineers to deliver high-quality products/services.
- Stay current with emerging technologies and apply them on the job.
- Support junior engineers by providing advice, mentoring, and educational opportunities.
Other
- 10+ years of industry-related experience, working in collaborative environments
- Excellent interpersonal skills; able to work independently as well as in a team; can take feedback and iterate on a solution in a collaborative setting
- A passion for making simple, robust, and scalable platforms used by other engineering teams
- Flexibility/adaptability for working in a dynamic environment with different frameworks and requirements
- Bachelor's degree in Computer Science