NVIDIA is looking for AI Software Engineers to expand the capabilities of its GenAI frameworks (Megatron Core and NeMo Framework), which are open-source, scalable, and cloud-native frameworks for developing, training, and optimizing Large Language Models (LLMs) and Multimodal (MM) foundation models.
Requirements
- Experience with AI Frameworks (e.g. PyTorch, JAX), and/or inference and deployment environments (e.g. TRTLLM, vLLM, SGLang).
- Proficiency in Python programming, software design, debugging, performance analysis, test design, and documentation.
- Strong understanding of AI/Deep-Learning fundamentals and their practical applications.
- Hands-on experience in large-scale AI training, with a deep understanding of core compute system concepts (such as latency/throughput bottlenecks, pipelining, and multiprocessing) and demonstrated excellence in related performance analysis and tuning.
- Expertise in distributed computing, model parallelism, and mixed-precision training.
- Prior experience with Generative AI techniques applied to LLMs and multimodal learning (text, image, and video).
- Knowledge of GPU/CPU architecture and related numerical software.
Responsibilities
- Design and develop the open-source GenAI frameworks Megatron Core and NeMo Framework.
- Solve large-scale, end-to-end AI training and inference challenges spanning the full model lifecycle, from initial orchestration and data pre-processing, through model training and tuning, to model deployment.
- Work at the intersection of AI applications, libraries, frameworks, and the entire software stack.
- Innovate and improve model architectures, distributed training algorithms, and model parallel paradigms.
- Accelerate foundation model training and fine-tuning with mixed-precision recipes and next-gen NVIDIA GPU architectures.
- Tune and optimize the performance of deep learning frameworks and software components.
- Research, prototype, and develop robust and scalable AI tools and pipelines.
Other
- MS, PhD or equivalent experience in Computer Science, AI, Applied Math, or related fields and 5+ years of industry experience.
- Consistent record of working effectively across multiple engineering initiatives and improving AI libraries through innovation.
- If you're creative and autonomous, we want to hear from you!
- Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
- Applications for this job will be accepted at least until October 3, 2025.