NVIDIA is seeking a Senior Software Engineer to discover and develop new low-precision and sparsity recipes for pre-training Large Language Models (LLMs) and unlock efficiency gains. The role focuses on developing next-generation software that leverages novel GPU hardware features for LLM optimization across the pre-training, post-training, and generation phases.
Requirements
- Proficient in Python
- Experience with PyTorch or a similar framework
- Strong software engineering background with a focus on writing concise, well-tested code
- Experience working with ML accelerators, performance optimization, and debugging
- Proficient in precision and numerics for ML
- Familiarity with C++ and CUDA
- Strong foundation in LLM pre-training, post-training, or generation
Responsibilities
- Create well-designed and well-tested software systems and proofs of concept (PoCs) to support recipe exploration in research settings
- Analyze and prototype state-of-the-art methods for quantization and sparsity
- Benchmark, profile, and optimize LLM workloads in cluster settings
- Develop new data analysis tools and visualizations to aid in numerics debugging
- Keep developer and researcher productivity and efficiency high by removing obstacles (e.g., slow CI systems, slow training systems)
- Participate in code reviews and address review feedback where applicable
Other
- MS or PhD in Computer Science or a related field (or equivalent experience), and 5+ years of relevant software engineering experience
- Strong written and oral communication skills