MCG is looking to build efficient, reliable data ingestion and delivery systems to support its mission of delivering patient-focused care and accelerating improvements in healthcare.
Requirements
- Proficient in designing, building, and maintaining large-scale, reliable data pipeline systems.
- Advanced SQL skills for querying and processing data.
- Proficiency in Python, with experience in Spark for data processing.
- 3+ years of experience in data engineering, including data modeling and ETL pipelines.
- Familiarity with cloud-based tools and infrastructure management using Terraform and Kubernetes is a plus.
- Experience with workflow orchestration tools such as Flyte.
Responsibilities
- Explore, analyze, and onboard data sets from data producers to ensure they are ready for processing and consumption.
- Develop and maintain scalable, efficient data pipelines for data collection, processing (quality checks, de-duplication, etc.), and integration into data lake and data warehouse systems.
- Optimize and monitor data pipeline performance to maximize reliability and minimize downtime.
- Implement data quality control mechanisms to maintain the integrity of data sets.
- Manage the deployment and automation of pipelines and infrastructure using Terraform, Flyte, and Kubernetes.
- Lead end-to-end data pipeline development — from initial data discovery and ingestion to transformation, modeling, and delivery into production-grade data platforms.
- Integrate and manage data from 3+ distinct sources, designing efficient, reusable frameworks for multi-source data processing and harmonization.
Other
- Demonstrated ability to navigate ambiguous data challenges, ask the right questions, and design effective, scalable solutions.
- Collaborate with stakeholders to ensure seamless data flow and to address issues and opportunities for improvement.
- Support strategic data analysis and operational tasks as needed.
- Remote work
- Travel expected 2-3 times per year for company-sponsored events.