Apple's Foundation Model Infrastructure team is seeking to build frameworks, services, and tools to power large foundation models on servers, optimizing inference for cutting-edge model architectures and launching production-grade solutions that serve millions of customers in real time.
Requirements
- Experience with high-throughput services, particularly at supercomputing scale.
- Proficient in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
- Familiar with GPU programming concepts using CUDA.
- Familiar with at least one popular ML framework, such as PyTorch or TensorFlow.
- Proficient in building and maintaining systems written in modern languages (e.g., Go, Python).
- Familiar with fundamental Deep Learning architectures such as Transformers, Encoder/Decoder models.
- Familiar with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Responsibilities
- Work alongside the Foundation Model Research team to optimize inference for cutting-edge model architectures.
- Work closely with product teams to build production-grade solutions and launch models that serve millions of customers in real time.
- Build tools to identify inference bottlenecks across different hardware and use cases.
- Mentor and guide engineers in the organization.
Other
- 5+ years of experience leading and driving complex, ambiguous projects.