Building frameworks, services, and tools to power Apple's largest foundation models on servers and to optimize multi-billion-parameter language, vision, and speech models using state-of-the-art technologies.
Requirements
- Industry background and experience in ML technologies (LLMs, Machine Learning, NLP, Information Retrieval, Statistics).
- Experience with high-throughput services, particularly at supercomputing scale.
- Proficient in running applications in the cloud (AWS, Azure, or equivalent) using Kubernetes, Docker, etc.
- Proficient in building and maintaining systems written in modern languages (e.g., Golang, Python).
- Familiar with at least one popular ML framework, such as PyTorch or TensorFlow.
- Familiar with fundamental Deep Learning architectures such as Transformers, Encoder/Decoder models.
- Familiarity with NVIDIA TensorRT-LLM, vLLM, DeepSpeed, NVIDIA Triton Inference Server, etc.
Responsibilities
- Work closely with product teams to build production-grade solutions and launch models that serve millions of customers in real time.
- Work alongside the Foundation Model Research team to prototype and develop inference for cutting-edge model architectures.
- Build tools to understand inference bottlenecks across different hardware and use cases.
- Mentor and guide engineers in the organization.
Other
- Bachelor’s degree or higher in Computer Science or related technical field.
- 8+ years of experience leading and driving complex, ambiguous projects.
- Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and reimbursement for certain educational expenses, including tuition, for formal education related to advancing your career at Apple.