NVIDIA is seeking an engineer to design, build, and optimize GPU-accelerated software for AI applications, with a focus on high-performance deep learning frameworks such as SGLang and vLLM for efficient large-scale model serving and inference.
Requirements
- Excellent C/C++ programming and software design skills.
- GPU programming experience (CUDA, OpenAI Triton, or CUTLASS) is a plus.
- Background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
- Prior experience training, deploying, or optimizing inference of DL models in production is a plus.
- Experience with multi-GPU communication libraries (NCCL, NVSHMEM).
Responsibilities
- Performance optimization, analysis, and tuning of DL models in domains such as LLMs, multimodal, and generative AI.
- Scale performance of DL models across different architectures and types of NVIDIA accelerators.
- Contribute features and code to NVIDIA’s inference libraries, vLLM, SGLang, FlashInfer, and LLM software solutions.
- Collaborate with cross-functional teams across frameworks, NVIDIA libraries, and innovative inference optimization solutions.
Other
- Master's or PhD, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI).
- 5+ years of relevant software development experience.
- Agile software development skills are helpful, and Python experience is a plus.
- Creative and autonomous engineer with a genuine passion for technology.