Apple Maps is looking to improve search quality and power experiences across Maps by bringing advanced deep learning and large language models into high-volume, low-latency, highly available production serving.
Requirements
- Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
- Proficiency in Python, Java, or C++
- Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
- Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
- Experience with optimization techniques such as attention fusion, quantization, and speculative decoding
- Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
- Skilled in cloud-native technologies such as Kubernetes, Ingress controllers, and HAProxy for scalable deployment
Responsibilities
- Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps
- Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling
- Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments
- Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale
- Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design
- Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization
- Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions
- Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem
Other
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience) (Minimum Qualification)
- Master's or PhD in Computer Science, Machine Learning, or a related field (Preferred Qualification)
- 5+ years of software engineering experience focused on ML inference, GPU acceleration, and large-scale systems
- Apple is an equal opportunity employer committed to inclusion and diversity