NVIDIA seeks to improve the performance and efficiency of deep learning inference for AI applications by designing, building, and optimizing GPU-accelerated software.
Requirements
- Excellent C/C++ programming and software design skills
- Python experience is a plus
- Prior experience training, deploying, or optimizing the inference of DL models in production is a plus
- Prior background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus
- GPU programming experience (CUDA, OpenAI Triton, or CUTLASS) is a plus
- Experience with multi-GPU communication libraries (NCCL, NVSHMEM) is a plus
- Experience with deep learning frameworks and inference engines such as PyTorch, vLLM, and SGLang is a plus
Responsibilities
- Optimize, analyze, and tune the performance of DL models across domains such as LLM, multimodal, and Generative AI
- Scale the performance of DL models across different NVIDIA accelerator architectures and types
- Contribute features and code to NVIDIA's inference libraries and to open-source LLM software such as vLLM, SGLang, and FlashInfer
- Work with cross-functional teams across frameworks, NVIDIA libraries, and inference optimization to deliver innovative solutions
- Implement the latest algorithms for public release in frameworks like SGLang and vLLM
- Identify and drive performance improvements for state-of-the-art LLM and Generative AI models across NVIDIA accelerators
- Implement and optimize model serving pipelines using open-source tools and plugins
Other
- Master's or PhD, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI)
- 5+ years of relevant software development experience
- Agile software development skills are helpful