At CVS Health, the Sr. Data Engineer designs and implements the data pipelines that power analytical capabilities, translating business requirements into technical solutions.
Requirements
- Proficiency in Python, specifically with ETL pipelines.
- Strong proficiency in SQL and experience in developing complex queries.
- Familiarity with PySpark, dbt, or similar frameworks.
- Experience deploying data pipelines in a cloud environment (Azure, AWS, GCP).
- Understanding of data warehousing concepts, dimensional modeling, and building data marts.
- Knowledge of data governance best practices in a cloud environment.
- Experience with machine learning workflows on GCP.
Responsibilities
- Data Pipeline Development: Design and build ETL/ELT data pipelines to ingest, process, and transform large datasets from multiple sources.
- Performance Optimization: Implement best practices for performance tuning, partitioning, and clustering to optimize data queries and reduce costs.
- Data Quality & Governance: Establish and enforce data quality standards, data governance frameworks, and security policies for data storage and access.
- Data Modeling & Architecture: Develop and optimize data models and schemas to support analytics, reporting, and machine learning requirements.
- Data Integration & Transformation: Collaborate with data scientists and analysts to design data solutions that integrate with BI tools and machine learning models.
- Documentation & Knowledge Sharing: Create comprehensive documentation for data pipelines, workflows, and processes. Share best practices and mentor junior data engineers.
- Design and architect data infrastructure for analytical workloads.
Other
- College degree or certification in a related field
- 5+ years of applicable work experience
- Excellent communication and interpersonal skills, with the ability to collaborate effectively with data scientists, analysts, and product owners.
- 40 Anticipated Weekly Hours
- Full time