Apple is looking to build and optimize foundation models for its various services, aiming to bring intelligence to billions of users and enhance user experience through low-latency, high-throughput inference.
Requirements
- Experience with high-throughput services, particularly at supercomputing scale.
- Proficient in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
- Familiar with GPU programming concepts using CUDA.
- Familiar with a popular ML framework such as PyTorch or TensorFlow.
- Proficient in building and maintaining systems written in modern languages (e.g., Go, Python).
- Familiar with fundamental deep learning architectures such as Transformers and encoder/decoder models.
- Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Responsibilities
- Work alongside the Foundation Model Research team to optimize inference for cutting-edge model architectures.
- Work closely with product teams to build production-grade solutions that launch models serving millions of customers in real time.
- Build tools to identify inference bottlenecks across different hardware platforms and use cases.
- Mentor and guide engineers in the organization.
Other
- 5+ years of experience leading and driving complex, ambiguous projects.