NVIDIA is looking to optimize all layers of the CUDA ecosystem to achieve class-leading speedups in modern high-performance workloads and models.
Requirements
- Strong knowledge of compilers, code generation, and GPU architecture
- Experience with GPU programming and performance optimization (CUDA or equivalent)
- Extensive Python programming skills, along with software engineering fundamentals
- Basic programming skills in other languages such as C/C++, Racket and Rust
- Strong mathematical and scientific foundation relevant to optimization heuristics/algorithms, ML and data science
- Familiarity with genetic/evolutionary algorithms, predictive modeling, and complex systems
- Hands-on experience building compilers or compiler components using the LLVM framework, including optimization passes and code generation
Responsibilities
- Design and build high-performance optimization frameworks for the entire CUDA ecosystem
- Co-design novel solutions with software, hardware and algorithm teams; influence and adopt new capabilities as they become available
- Develop reproducible, high-fidelity evaluation frameworks covering performance, quality and developer productivity
- Collaborate across the AI stack, from hardware through compilers/toolchains, kernels/libraries, frameworks, distributed training, and inference/serving
Other
- Bachelor’s degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); MS or PhD preferred
- 6+ years of industry or academic experience in software engineering, compilers and developer tools
- Track record of developing and productizing software, optimization frameworks and/or developer tooling
- Ability to work in a diverse, supportive environment
- Commitment to fostering a diverse work environment and equal opportunity employment