Help transform the machine learning ecosystem by building groundbreaking technology: the Lightning Thunder compiler and the PyTorch Lightning stack.
Requirements
- Strong experience with deep learning frameworks like PyTorch, JAX, or TensorFlow
- Expertise in compiler development or in optimizing distributed training and inference workflows
- Hands-on experience in model optimization, with a focus on maximizing performance, efficiency, and scalability in large-scale or distributed training setups
- Experience with CUDA or Triton
- Experience contributing to open-source projects, especially in machine learning or high-performance computing
- Experience collaborating with external partners
Responsibilities
- Develop the Thunder compiler, an open-source project built in collaboration with one of our strategic partners, drawing on deep experience with PyTorch, JAX, or other deep learning frameworks
- Drive performance-oriented model optimizations for both distributed training and inference
- Develop optimized kernels in CUDA or Triton to target specific use cases
- Integrate Thunder throughout the PyTorch Lightning ecosystem
- Support the adoption of Thunder across the industry
- Collaborate closely within the Lightning team and with our strategic partners
Other
- Passion for engaging with open-source communities, including experience supporting users and advocating for project adoption
- Strong communication and collaboration skills for working within a close-knit, high-impact team environment
- Bachelor's degree in Computer Science, Engineering, or related field
- Master's or PhD in machine learning or a related area preferred
Benefits
- Flexible paid time off plus 1 week of winter closure
- Generous paid family leave benefits
- $500 monthly meal reimbursement, including groceries and food delivery services
- $500 one-time home office stipend
- $1,000 annual learning & development stipend