80% of the clean energy projects that developers start are never built, largely because projects begin without deep due diligence on zoning and interconnection: the data is simply too expensive to collect. That adds up to $17B worth of canceled projects per year.
Requirements
- 1+ years of hands-on professional or internship experience working on data infrastructure
- Solid understanding of data engineering concepts, with a strong grasp of data structures, algorithms, and system design
- Strong coding skills with demonstrated proficiency in relevant programming languages, such as Python, Rust, or Scala
- Advanced SQL expertise, including experience with complex queries, query optimization, and working with various database systems
- Hands-on experience with big data tools (e.g. Spark) and data pipeline orchestration tools (e.g. Dagster, Airflow, Prefect)
- Proven experience building robust, scalable, and performant data pipelines in the cloud (AWS / GCP / Azure)
- Previous experience working with (geo)spatial datasets and libraries (e.g. GEOS, GDAL; see the sketch after this list)
- Hands-on experience with modern data tools and frameworks such as Apache Arrow, DuckDB, Delta Lake, and Apache Iceberg
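To give a flavor of the geospatial side of this role, here is a minimal sketch in Python using GeoPandas, which wraps GDAL for I/O and GEOS for geometry operations. The file names, coordinate reference system, and 2 km threshold are illustrative assumptions, not details of our actual stack:

    import geopandas as gpd  # wraps GDAL (I/O) and GEOS (geometry ops)

    # Hypothetical inputs: parcel polygons and substation points
    parcels = gpd.read_file("parcels.geojson")
    substations = gpd.read_file("substations.geojson")

    # Reproject to an equal-area CRS (EPSG:5070, CONUS Albers) so areas
    # and distances come out in meters rather than degrees
    parcels = parcels.to_crs(epsg=5070)
    substations = substations.to_crs(epsg=5070)

    parcels["area_m2"] = parcels.geometry.area

    # Keep parcels within 2 km of a substation, a rough interconnection screen
    near = gpd.sjoin_nearest(
        parcels, substations, max_distance=2_000, distance_col="dist_m"
    )
    print(near[["area_m2", "dist_m"]].describe())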
Responsibilities
- Design, implement, and maintain scalable ETL pipelines that ingest hundreds of data sources (see the sketch after this list)
- Optimize our storage and retrieval systems for performance and reliability
- Ensure data quality, consistency, and security across the platform
- Collaborate closely with our CTO, Data Infra Lead, Product Lead, and the wider team to directly shape the product roadmap
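To illustrate the pipeline work above, here is a minimal sketch of the software-defined-asset style used by orchestrators such as Dagster. The asset names and toy records are hypothetical; a real asset would extract from one of the hundreds of sources mentioned above:

    from dagster import asset

    @asset
    def raw_zoning_records() -> list[dict]:
        # Hypothetical extraction step; a real pipeline would pull from a
        # county GIS API or a bulk download
        return [{"parcel_id": "0001", "zone": "m-1"}]

    @asset
    def normalized_zoning(raw_zoning_records: list[dict]) -> list[dict]:
        # Dagster wires the dependency via the parameter name; here we
        # normalize heterogeneous source schemas into one canonical shape
        return [
            {"parcel_id": r["parcel_id"], "zone_code": r["zone"].upper()}
            for r in raw_zoning_records
        ]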
Other
- Hybrid work from our office in Williamsburg, Brooklyn, ~3x per week
- Have a strong bias towards action and prioritize execution
- Share our passion to build something that fights climate change
- Easily handle the unstructured environment of fast-moving startups
- Have the hunger to grow together with Paces as we scale up