Apple Maps is looking to improve search quality and power experiences across Maps by bringing advanced deep learning and large language models into high-volume, low-latency, highly available production serving.
Requirements
- Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
- Proficiency in Python, Java, or C++
- Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
- Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
- Experience with optimization techniques such as attention fusion, quantization, and speculative decoding
- Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
- Skilled in cloud-native technologies such as Kubernetes, Ingress controllers, and HAProxy for scalable deployment
Responsibilities
- Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps
- Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling
- Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments
- Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale
- Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design
- Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization
- Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions
- Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem
Other
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience) (Minimum Qualification)
- Master's or PhD in Computer Science, Machine Learning, or a related field (Preferred Qualification)
- 5+ years of software engineering experience focused on ML inference, GPU acceleration, and large-scale systems
- Apple is an equal opportunity employer committed to inclusion and diversity