NVIDIA is looking to optimize generative AI models for maximal inference efficiency and develop an innovative software platform for automated deployment.
Requirements
- Strong proficiency in Python, PyTorch, and related ML tools (e.g. HuggingFace).
- Strong algorithms and programming fundamentals.
- Contributions to PyTorch, JAX, or other machine learning frameworks.
- Knowledge of GPU architecture and the compilation stack, and the ability to understand and debug end-to-end performance.
- Familiarity with NVIDIA's deep learning SDKs such as TensorRT.
- Prior experience in writing high-performance GPU kernels for machine learning workloads in frameworks such as CUDA, CUTLASS, or Triton.
Responsibilities
- Train, develop, and deploy state-of-the-art generative AI models like LLMs and diffusion models using NVIDIA's AI software stack.
- Leverage and build upon the PyTorch 2.0 ecosystem (TorchDynamo, torch.export, torch.compile, etc.) to analyze and extract a standardized model graph representation from arbitrary PyTorch models for our automated deployment solution.
- Develop high-performance optimization techniques for inference, such as automated model sharding techniques (e.g. tensor parallelism, sequence parallelism), efficient attention kernels with KV caching, and more.
- Collaborate with teams across NVIDIA to use performant kernel implementations within our automated deployment solution.
- Analyze and profile GPU kernel-level performance to identify hardware and software optimization opportunities.
- Continuously innovate on inference performance to ensure NVIDIA's inference software solutions (TRT, TRT-LLM, TRT Model Optimizer) maintain and increase their leadership in the market.
- Play a pivotal role in architecting and designing a modular, scalable software platform that provides an excellent user experience, broad model support, and optimization techniques, with the goal of increasing adoption.
Other
- Master's, PhD, or equivalent experience in Computer Science, AI, Applied Math, or a related field.
- 3+ years of relevant work or research experience in Deep Learning.
- Excellent software design skills, including debugging, performance analysis, and test design.
- Good written and verbal communication skills and the ability to work independently and collaboratively in a fast-paced environment.
- Applications for this job will be accepted at least until October 20, 2025.