Pyx Health is looking for a Senior Data Engineer to maintain and evolve its data infrastructure on Azure, with a focus on data pipelines from ingestion to analytics delivery.
Requirements
- Deep expertise in Databricks, including Delta Lake optimization (ZORDER, vacuuming, partitioning); a maintenance sketch follows this list.
- Strong Python skills for data engineering workflows.
- Proficiency in Postgres (our primary transactional database).
- Hands-on experience with Airflow, building and maintaining production-grade DAGs.
- Hands-on experience with dbt for data transformation and modeling.
- Solid understanding of medallion architecture principles.
- Experience with Unity Catalog or comparable data governance tooling.
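For illustration, a minimal Delta Lake maintenance sketch of the kind this role would own, in PySpark; the table name, Z-order column, and retention window are hypothetical examples, not Pyx Health specifics.

```python
# Minimal Delta Lake maintenance sketch (PySpark on Databricks).
# Table name, Z-order column, and retention window are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a common filter column,
# so selective queries can skip unrelated files.
spark.sql("OPTIMIZE main.silver.member_events ZORDER BY (member_id)")

# Drop data files no longer referenced by the table, retaining 7 days
# (168 hours) of history for time travel and concurrent readers.
spark.sql("VACUUM main.silver.member_events RETAIN 168 HOURS")
```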
Responsibilities
- Design, build, and maintain batch data pipelines using Airflow (Astronomer) and dbt, ingesting data from Postgres, Salesforce, other business-critical SaaS applications, flat files, and internal transactional tools; a minimal DAG sketch follows this list.
- Develop and optimize data models within a medallion architecture (Bronze/Silver/Gold) on Delta Lake.
- Write production-grade Python for custom extractors, transformations, and pipeline logic.
- Implement and enforce data governance using Unity Catalog across multi-tenant schemas; a grants sketch follows this list.
- Strengthen CI/CD practices for data: automated testing, environment promotion, and deployment pipelines via GitHub.
- Monitor pipeline health and data quality using Datadog; proactively resolve issues.
- Optimize Databricks compute costs through cluster policies, spot instances, and query tuning; an example policy definition follows this list.
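As a rough illustration of the pipeline responsibility, a minimal Airflow DAG (assuming Airflow 2.4+) that lands raw Postgres data in the Bronze layer, then runs dbt for the Silver/Gold transformations; the DAG id, schedule, extractor body, and dbt project path are all hypothetical placeholders.

```python
# Minimal Airflow DAG sketch: extract from Postgres into Bronze,
# then run dbt for Silver/Gold. Names and paths are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_postgres_to_bronze():
    # Placeholder for a custom Python extractor landing raw data in Bronze.
    ...


with DAG(
    dag_id="daily_member_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_postgres_to_bronze",
        python_callable=extract_postgres_to_bronze,
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /usr/local/airflow/dbt",
    )
    extract >> transform
```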
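For the Unity Catalog responsibility, a sketch of per-tenant schema isolation via grants; the catalog, schema, and group names are hypothetical.

```python
# Unity Catalog governance sketch: one schema per tenant, with each
# tenant's analyst group granted access to its own schema only.
# Catalog, schema, and group names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

for tenant in ["acme", "globex"]:
    schema = f"main.tenant_{tenant}"
    group = f"`{tenant}_analysts`"
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {schema}")
    # Groups also need USE CATALOG on `main`, granted once elsewhere.
    spark.sql(f"GRANT USE SCHEMA ON SCHEMA {schema} TO {group}")
    # SELECT on the schema cascades to current and future tables in it.
    spark.sql(f"GRANT SELECT ON SCHEMA {schema} TO {group}")
```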
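And for the cost-control responsibility, a sketch of a cluster policy definition enforcing Azure spot instances, auto-termination, and an autoscaling cap; all values are illustrative, and the resulting JSON is the definition body a policy created via the UI or API would accept.

```python
# Sketch of a Databricks cluster policy definition for cost control.
# All values are illustrative, not Pyx Health settings.
import json

policy_definition = {
    # Prefer Azure spot VMs, falling back to on-demand if evicted.
    "azure_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK_AZURE",
    },
    # Force idle clusters to shut down within 30 minutes.
    "autotermination_minutes": {
        "type": "range",
        "maxValue": 30,
        "defaultValue": 30,
    },
    # Cap autoscaling so cluster costs stay bounded.
    "autoscale.max_workers": {
        "type": "range",
        "maxValue": 8,
    },
}

print(json.dumps(policy_definition, indent=2))
```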
Other
- ONLY CANDIDATES RESIDING IN THE USA MAY APPLY.
- Minimum 5 years of experience as a Data Engineer.
- Able to start, run, and complete a technical project with minimal oversight.
- Strong root cause analysis skills.
- Communicates effectively with cross-functional teams and stakeholders.