Gridmatic Inc. is looking for an engineer to build and optimize the backbone of its ML platform, helping accelerate the decarbonization of the electricity system.
Requirements
- Solid expertise in machine learning, distributed systems, and GPU-based training.
- Strong deep learning fundamentals in addition to strong software engineering skills.
- Experienced in researching and implementing deep learning models.
- Experienced in distributed training and inference of large models on GPU clusters, utilizing core libraries and frameworks such as PyTorch, PyTorch Lightning, and Ray.
- Comfortable with large-scale data storage infrastructure and formats, e.g. Zarr, SQL, and feature stores.
- End-to-end proficiency in building, maintaining, and debugging cluster infrastructure, utilizing Kubernetes and Terraform.
- Expertise in identifying performance bottlenecks and designing and writing high-performance code for large-scale ML workloads.
Responsibilities
- Own a significant piece of our ML platform while rapidly building and iterating scalable, robust distributed infrastructure for ML training, inference, and evaluation on large-scale time-series and weather datasets.
- Optimize throughput and cost by supporting model training and deployment across multiple clusters and clouds.
- Improve the efficiency of machine learning models and other workloads by optimizing latency, throughput, and memory consumption.
- Push the boundaries of current hardware capabilities through techniques like GPU performance engineering.
- Help define the long-term vision for Gridmatic’s ML platform.
- Play a key role in mentoring junior engineers and interns, contributing to a collaborative, innovative, and growth-oriented team culture.
Other
- 3+ years of experience and a commitment to technical excellence.
- A self-starter with a strong sense of independence and ownership, and the capability to engineer large, robust systems from the initial design and conceptualization to productionization.
- A mission-driven individual who is enthusiastic about working toward a renewable grid and diving into the intersection of ML and energy.
- Curiosity and a willingness to learn are must-haves!
- No prior energy experience required.