LLM Serving Engineer - Cloud AI Engineering - Machine Learning Engineering

Qualcomm

$158,400 - $237,600
Sep 30, 2025
San Diego, CA, USA

Qualcomm is investing in Deep Learning and developing hardware and software solutions for Inference Acceleration to play a central role in the evolution of Cloud AI.

Requirements

  • Hands-on experience in one or more of the following LLM serving/orchestration packages (Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, MoonCake).
  • Deep understanding of foundational LLMs, VLMs, SLMs, and transformer-based architectures.
  • Strong experience in developing language models using PyTorch.
  • Strong computer science fundamentals - algorithms, data structures, parallel and distributed programming.
  • Understanding of computer architecture, ML accelerators, in-memory processing and distributed systems.
  • Strong Python development skills for large-scale projects, with a passion for software engineering.
  • Experience in analyzing, profiling, and optimizing deep learning workloads.

Responsibilities

  • Build a scalable LLM inference platform using inference techniques (e.g. disaggregated serving and KV-cache management, advanced parallelism, speculative algorithms, model optimization, specialized kernels).
  • Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
  • Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.
  • Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
  • Drive efficient serving through smart autoscaling, load balancing and routing.
  • Engage with open-source serving communities to evolve the framework.

Other

  • Excellent communication and problem-solving skills, with the ability to thrive in a fast-paced and collaborative environment.
  • MS in Computer Science, Machine Learning, Computer Engineering or Electrical Engineering.
  • Open-source contribution to any GenAI package.
  • Experience architecting and developing large-scale distributed systems.
  • High-level kernel design experience (PyTorch, CUDA, Triton).