NVIDIA is looking to advance post-training algorithms, build efficient large-scale systems, and develop evaluation frameworks for large-scale generative AI, specifically LLMs and DLMs.
Requirements
- PhD in Computer Science, Electrical Engineering, or related field, or equivalent research experience in LLMs, systems, or related areas.
- 2+ years of experience in machine learning, systems, distributed computing, or large-scale model training.
- Proficiency in Python with hands-on experience in frameworks such as PyTorch.
- Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming.
- Expertise in post-training LLMs with novel algorithmic/data pipelines.
- Experience developing and scaling large distributed systems for deep learning.
- Contributions to open-source LLM systems or large-scale AI infrastructure.
Responsibilities
- Designing and implementing post-training algorithms for LLMs and DLMs.
- Driving efficiency and scalability improvements across training pipelines and serving systems.
- Collaborating with researchers to translate cutting-edge ideas into production-ready implementations.
- Exploring new paradigms for evaluation.
- Demonstrating strong engineering practices and contributing to open-source communities.
Other
- Proven ability to collaborate across research and engineering teams in multifaceted environments.
- Are you creative and autonomous?
- Do you love a challenge?
- Applications for this job will be accepted at least until October 10, 2025.
- NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.