Optimize performance and cost for Lakehouse pipelines on Databricks, deliver reliable, well-documented datasets with clear SLAs, and integrate data from diverse sources in support of business units.
Responsibilities
- Build and operate Lakehouse pipelines on Databricks (Bronze/Silver/Gold) using Delta Lake, Delta Live Tables (DLT), and/or Jobs.
- Optimize ingestion patterns (Auto Loader, CDC, streaming).
- Model data, implement quality checks, and optimize performance.
- Profile and tune Spark/SQL workloads for throughput and cost.
- Engineer Delta tables for speed and cost: partitioning, Z-Ordering or liquid clustering, constraints, and file sizing; manage table health with Auto Optimize, OPTIMIZE, and VACUUM.
- Implement incremental processing (MERGE with Change Data Feed, APPLY CHANGES INTO) with idempotency and exactly-once delivery.
- Integrate data from multiple sources, including real-time field equipment and sensors.
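The incremental-processing bullet above can be sketched in miniature. This is an engine-agnostic Python illustration of idempotent merge semantics over Change-Data-Feed-style events; on Databricks the real implementation would be `MERGE INTO target USING table_changes(...)` or DLT's `APPLY CHANGES INTO`. The event shape and the `apply_changes` helper are illustrative assumptions, not a Databricks API.

```python
# Sketch (assumed event shape, not a Databricks API): apply CDF-style
# change events to a target keyed by primary key. Replaying the same
# batch is a no-op, which is the idempotency property the pipeline needs.

def apply_changes(target: dict, events: list[dict]) -> dict:
    """Upsert/delete rows in `target` (keyed by 'id') from CDF events."""
    for e in events:
        if e["_change_type"] in ("insert", "update_postimage"):
            # Keep only data columns; drop CDF metadata columns.
            target[e["id"]] = {k: v for k, v in e.items() if not k.startswith("_")}
        elif e["_change_type"] == "delete":
            target.pop(e["id"], None)
        # 'update_preimage' rows carry pre-update values; they are skipped.
    return target

events = [
    {"id": 1, "temp": 20.5, "_change_type": "insert"},
    {"id": 1, "temp": 21.0, "_change_type": "update_postimage"},
    {"id": 2, "temp": 18.2, "_change_type": "insert"},
    {"id": 2, "_change_type": "delete"},
]

state = apply_changes({}, events)
# Replaying the same batch leaves the target unchanged (idempotent).
assert apply_changes(dict(state), events) == state
```

The same replay-safety check is a useful test for the real pipeline: run the MERGE twice against a copy of the target and diff the results.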
Other
- Deliver reliable, well-documented datasets with clear SLAs.
- Design and implement dashboards and reports using Power BI and other visualization tools.
- Collaborate with business units to gather requirements and deliver technical solutions.
- Educate and support stakeholders on data tools and best practices.
- Engage in continuous improvement and adoption of new data management technologies.