Apple is looking to optimize the end-to-end system performance of distributed machine learning workloads to enable the next generation of intelligent experiences on Apple products and services.
Requirements
- Experience working with large-scale parallel and distributed accelerator-based systems
- Experience optimizing the performance of AI workloads at scale
- Experience developing code in one or more training frameworks (such as PyTorch, TensorFlow, or JAX)
- Programming and software design skills (proficiency in C/C++ and/or Python)
- Deep understanding of computer systems and the interactions between hardware and software
- Experience in performance analysis and optimization on cloud accelerators
Responsibilities
- Engage with ML researchers to optimize the end-to-end performance of large-scale distributed ML workloads
- Analyze workload metrics to identify sources of inefficiency, and work with users to understand and optimize their ML workloads
- Conduct workload analysis based on benchmarking key workloads on deployed systems
- Improve the resiliency of large-scale training by optimizing applications and frameworks for faster recovery from failures and preemptions
- Influence the architecture, design, development, and operation of next-generation ML accelerator systems based on workload insights
Other
- Strong communicator with the ability to analyze complex and ambiguous problems
- Experience working in a highly collaborative environment and promoting a teamwork mentality
- Bachelor's degree in Computer Science and 7+ years of work experience
- Advanced degree in Computer Science
- Relocation