Qualcomm is investing in Deep Learning and developing hardware and software solutions for Inference Acceleration to play a central role in the evolution of Cloud AI.
Requirements
- Hands-on experience in one or more of the following LLM serving/orchestration packages: Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, Mooncake.
- Deep understanding of foundational LLMs, VLMs, SLMs, and transformer-based architectures.
- Strong experience developing language models using PyTorch.
- Strong computer science fundamentals - algorithms, data structures, parallel and distributed programming.
- Understanding of computer architecture, ML accelerators, in-memory processing, and distributed systems.
- Strong Python development skills for large-scale projects and a passion for software engineering.
- Experience in analyzing, profiling, and optimizing deep learning workloads.
Responsibilities
- Build a scalable LLM inference platform using advanced inference techniques (e.g. disaggregated serving and KV-cache management, advanced parallelism, speculative algorithms, model optimization, specialized kernels).
- Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
- Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.
- Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
- Drive efficient serving through smart autoscaling, load balancing, and routing.
- Engage with open-source serving communities to evolve the framework.
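To give a flavor of the serving techniques named above, here is a minimal, illustrative sketch of KV-cache-aware routing: incoming prompts are dispatched to the replica whose cached prefixes overlap most with the request, falling back to the least-loaded replica on ties. All names here are hypothetical; production routers in systems like llm-d or SGLang operate on token IDs and are far more sophisticated.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical serving replica tracking cached prefixes and load."""
    name: str
    cached_prefixes: list[str] = field(default_factory=list)
    active_requests: int = 0


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two strings (stand-in for token IDs)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(prompt: str, replicas: list[Replica]) -> Replica:
    """KV-cache-aware routing sketch: prefer the replica with the longest
    cached-prefix overlap; break ties by choosing the lowest current load."""
    def score(r: Replica) -> tuple[int, int]:
        best = max(
            (shared_prefix_len(prompt, p) for p in r.cached_prefixes),
            default=0,
        )
        return (best, -r.active_requests)

    chosen = max(replicas, key=score)
    chosen.active_requests += 1
    chosen.cached_prefixes.append(prompt)  # its KV cache now covers this prompt
    return chosen
```

A request sharing a long system preamble with a previously served prompt lands on the replica that already holds that prefix in its KV cache, avoiding prefill recomputation; unrelated requests spread by load.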
Other
- Excellent communication and problem-solving skills, with the ability to thrive in a fast-paced and collaborative environment.
- MS in Computer Science, Machine Learning, Computer Engineering or Electrical Engineering.
- Open-source contributions to any GenAI package.
- Experience architecting and developing large-scale distributed systems.
- High-level kernel design experience (PyTorch, CUDA, Triton).