NVIDIA is looking to develop groundbreaking technologies in the inference systems software stack to accelerate AI inference and define the next era of computing.
Requirements
- Strong experience developing or using deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX)
- Strong Python and C/C++ programming skills
- Background in domain-specific compiler and library solutions for LLM inference and training (e.g., FlashInfer, FlashAttention)
- Expertise in inference engines such as vLLM and SGLang
- Expertise in machine learning compilers (e.g., Apache TVM, MLIR)
- Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)
Responsibilities
- Innovating and developing new AI systems technologies for efficient inference
- Designing, implementing, and optimizing kernels for high impact AI workloads
- Designing and implementing extensible abstractions for LLM serving engines
- Building efficient just-in-time, domain-specific compilers and runtimes
- Collaborating closely with other engineers at NVIDIA across deep learning framework, library, kernel, and GPU architecture teams
- Contributing to open source communities like FlashInfer, vLLM, and SGLang
Other
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); a PhD is preferred
- Travel requirements not specified
- Must be eligible to work in the US
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer
- Base salary will be determined based on location, experience, and the pay of employees in similar positions