Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1,000+ jobs and find postings that perfectly match your resume


Senior Software Engineer, Model Inference

Apple

Salary not specified
Dec 25, 2025
Remote, US

Apple Maps is looking to improve search quality and power experiences across Maps by bringing advanced deep learning and large language models into high-volume, low-latency, highly available production serving systems.

Requirements

  • Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
  • Proficiency in Python, Java, or C++
  • Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
  • Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM, etc.)
  • Experience with optimization techniques like attention fusion, quantization, and speculative decoding
  • Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
  • Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment

Responsibilities

  • Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps
  • Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling
  • Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments
  • Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale
  • Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design
  • Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization
  • Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions

Other

  • Master’s or PhD in Computer Science, Machine Learning, or a related field (Preferred Qualification)
  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience) (Minimum Qualification)
  • 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems
  • Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem
  • Apple is an equal opportunity employer that is committed to inclusion and diversity