NVIDIA is hiring engineers to optimize deep learning operations on NVIDIA GPUs by developing high-performance kernel code for the cuDNN, cuBLAS, and TensorRT libraries, which accelerate deep learning models.
Requirements
- Demonstrated strong C++ programming and software design skills, including debugging, performance analysis, and test design
- Experience with performance-oriented parallel programming, even if not on GPUs (e.g., with OpenMP or pthreads)
- Solid understanding of computer architecture and some experience with assembly programming
- Experience tuning BLAS or deep learning library kernel code
- Experience with CUDA or OpenCL GPU programming
- Familiarity with numerical methods and linear algebra
Responsibilities
- Writing highly tuned compute kernels, mostly in CUDA C++, to perform core deep learning operations (e.g., matrix multiplications, convolutions, normalizations)
- Following general software engineering best practices including support for regression testing and CI/CD flows
- Collaborating with teams across NVIDIA:
  - The CUDA compiler team on generating optimal assembly code
  - Deep learning training and inference performance teams on which layers require optimization
  - Hardware and architecture teams on the programming model for new deep learning hardware features
Other
- Master's or PhD degree, or equivalent experience, in Computer Science, Computer Engineering, Applied Math, or a related field
- 6+ years of relevant industry experience