NVIDIA is seeking to solve some of the world’s most challenging problems by driving advances in AI and machine learning, specifically by developing industry-leading deep learning inference software for NVIDIA AI accelerators.
Requirements
- Strong proficiency in C++ (required), as well as in Rust or Python.
- Experience developing deep learning frameworks, compilers, or system software.
- Experience in developing inference backends and compilers for GPUs.
- Knowledge of machine learning techniques and GPU programming with CUDA or OpenCL.
- Background working with LLM inference frameworks such as TensorRT-LLM, vLLM, or SGLang.
- Experience with deep learning frameworks such as TensorRT, PyTorch, or JAX.
- Knowledge of close-to-metal performance analysis, optimization techniques, and tools.
Responsibilities
- Design, develop, and optimize NVIDIA TensorRT and TensorRT-LLM to supercharge inference applications for datacenters, workstations, and PCs.
- Develop software in C++, Python, and CUDA for seamless and efficient deployment of state-of-the-art LLMs and Generative AI models.
- Collaborate with deep learning experts and GPU architects across the company to influence hardware and software design for inference.
Other
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- 8+ years of software development experience on a large codebase or project.
- Excellent problem-solving skills, a passion for learning, and the ability to work effectively in a fast-paced, collaborative environment.
- Strong communication skills and the ability to articulate complex technical concepts.
- Travel requirements are not specified, but candidates must be willing to work in a hybrid environment.