NVIDIA is looking to build innovative software to make LLM inference more efficient, scalable, and accessible.
Requirements
- Strong coding skills in Python and C/C++.
- Knowledgeable and passionate about machine learning and performance engineering.
- Proven project experience building software where performance is a core feature.
- Solid fundamentals in machine learning, deep learning, operating systems, computer architecture, and parallel programming.
- Research experience in systems or machine learning.
- Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
- Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
Responsibilities
- Write safe, scalable, modular, and high-quality C++/Python code for our core LLM inference backend software.
- Perform benchmarking, profiling, and system-level programming for GPU applications.
- Provide code reviews, design docs, and tutorials to facilitate collaboration within the team.
- Write and run unit tests and performance tests for different stages of the inference pipeline.
Other
- 2+ years of industry experience in software engineering or equivalent research experience.
- We strongly encourage you to include sample projects (e.g., on GitHub) that demonstrate the qualifications above.
- Applications for this job will be accepted at least until September 28, 2025.
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
- As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.