Senior Deep Learning Software Engineer, Inference

NVIDIA

$148,000 - $287,500

Sep 5, 2025

Santa Clara, CA, US

NVIDIA seeks an engineer to design, build, and optimize GPU-accelerated software for AI applications, focusing on high-performance deep learning frameworks such as SGLang and vLLM for efficient large-scale model serving and inference.

Requirements

Excellent C/C++ programming and software design skills.
GPU programming experience (CUDA, OpenAI Triton, or CUTLASS) is a plus.
Prior background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
Prior experience with training, deploying or optimizing the inference of DL models in production is a plus.
Experience with multi-GPU communication (NCCL, NVSHMEM).

Responsibilities

Optimize, analyze, and tune the performance of DL models in domains such as LLM, multimodal, and generative AI.
Scale performance of DL models across different architectures and types of NVIDIA accelerators.
Contribute features and code to NVIDIA’s inference libraries, vLLM, SGLang, FlashInfer, and LLM software solutions.
Collaborate with cross-functional teams across frameworks, NVIDIA libraries, and innovative inference optimization solutions.

Other

Master's or PhD, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI).
5+ years of relevant software development experience.
Agile software development skills are helpful, and Python experience is a plus.
Creative and autonomous engineer with a genuine passion for technology.