NVIDIA is looking to optimize the performance of at-scale AI systems and datacenter applications by providing insights into system design and tuning mechanisms for large-scale compute runs, crafting improved workflows, and developing new, differentiated solutions.
Requirements
- 5+ years of experience running multinode workloads, identifying bottlenecks, and implementing improvements.
- Proven understanding of high-performance computing architectures, GPU-accelerated computing software stacks, and DL frameworks (CUDA, PyTorch).
- Experience with CPU architectures.
- Programming and scripting experience with C, C++, Python, and Bash.
- Experience tuning memory, storage, and networking settings for performance on Linux systems.
- Knowledge of modern cloud and container-based architectures.
- Hands-on experience deploying and debugging systems with NVIDIA NVLink and InfiniBand.
Responsibilities
- Provide engineering solutions to enable deployment of world-class GPU computing products at scale, lead technical relationships with engineering teams, and assist system administrators, software and hardware engineers, and machine learning/deep learning engineers in building creative solutions.
- Lead aspects of performance analysis and scalable practices to support large-scale infrastructure, and deliver powerful tools, methodologies, and workflows to validate expectations.
- Deliver engineering solutions that provide continuous insight into the performance of AI workloads across evolving environments, quickly surfacing improvements and regressions over time.
- Decompose multifaceted issues into minimal reproduction cases, working toward the root cause of the underlying problems.
- Engage with multiple team members to develop best practices for understanding trends in test results and for presenting data clearly to support data-driven actions.
Other
- Strong teamwork and communication skills.
- Ability to multitask in a dynamic environment.
- Action-driven, with strong analytical skills.
- BS in Engineering, Mathematics, Physics, or Computer Science (or equivalent experience); MS or PhD desirable.
- If you're creative and autonomous, with a genuine passion for technology, we want to hear from you.