Build and optimize the core infrastructure that supports AI research, enabling researchers to better understand and improve their models.
Requirements
- Experience with data pipeline development and ETL processes
- Strong systems programming skills and an understanding of performance optimization
- Strong software engineering skills in Python
- Experience writing and optimizing custom GPU kernels
- Contributions to observability, benchmarking, or performance-focused infrastructure at scale
- Familiarity with AI/ML workloads
Responsibilities
- Design and implement high-performance data pipelines for processing large-scale datasets with an emphasis on reliability and reproducibility
- Apply the latest techniques to our internal training runs to achieve strong hardware efficiency
- Implement custom GPU kernels
- Create observability and benchmarking tools that help researchers understand the performance of their models and training runs
- Build and maintain secure sandboxed execution environments
Other
- Comfortable working in ambiguous, fast-evolving environments and collaborating across disciplines