Pareto Operations needs a Senior Data Engineer to build scalable and efficient data pipelines for processing large volumes of healthcare data, including eligibility, claims, payments, and risk adjustment datasets.
Requirements
- Expertise in Python programming and hands-on experience with SQL and PySpark for data processing and analysis.
- Proficiency in Python frameworks and libraries for scientific computing (e.g., NumPy, pandas, SciPy, PyTorch, PyArrow).
- Strong understanding of AWS and experience deploying data solutions on cloud platforms, including in-depth knowledge of services such as S3, Glue, EMR, Athena, and Redshift for data storage and processing.
- Experience working with healthcare data, including but not limited to eligibility, claims, payments, and risk adjustment datasets.
- Expertise in modeling data for relational databases (e.g., PostgreSQL, MySQL) and file-based storage, along with a solid grasp of ETL processes and data warehousing concepts.
Responsibilities
- Design, develop, and maintain robust data pipelines using Python and PySpark to process large volumes of healthcare data efficiently in a multitenant analytics platform.
- Collaborate with cross-functional teams to understand data requirements, implement data models, and ensure data integrity throughout the pipeline.
- Optimize data workflows for performance and scalability, considering factors such as data volume, velocity, and variety.
- Implement best practices for data ingestion, transformation, and storage in AWS services such as S3, Glue, EMR, Athena, and Redshift.
- Model data in relational databases (e.g., PostgreSQL, MySQL) and file-based databases to support data processing requirements.
- Design and implement ETL processes using Python and PySpark to extract, transform, and load data from various sources into target databases (a minimal sketch follows this list).
- Troubleshoot and enhance existing ETLs and processing scripts to improve the efficiency and reliability of data pipelines.
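To give a concrete sense of the pipeline work described above, here is a minimal PySpark ETL sketch, assuming an EMR-style environment with S3 access; the bucket names, paths, and column names are hypothetical placeholders, not part of Pareto's actual stack.

```python
# Minimal PySpark ETL sketch: raw claims CSV in, curated Parquet out.
# All bucket names, paths, and column names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-etl").getOrCreate()

# Extract: read raw claims data from S3 (hypothetical bucket/prefix).
raw = spark.read.option("header", True).csv("s3://example-raw-bucket/claims/2024/")

# Transform: normalize types, drop malformed rows, derive a partition column.
claims = (
    raw.withColumn("paid_amount", F.col("paid_amount").cast("double"))
       .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
       .dropna(subset=["claim_id", "member_id", "service_date"])
       .withColumn("service_month", F.date_format("service_date", "yyyy-MM"))
)

# Load: write partitioned Parquet back to S3.
(
    claims.write.mode("overwrite")
          .partitionBy("service_month")
          .parquet("s3://example-curated-bucket/claims/")
)

spark.stop()
```

Partitioned Parquet on S3 is a common target layout for this kind of pipeline because it can be queried directly from Athena or Redshift Spectrum without further reshaping.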
Other
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- Minimum of 5 years of experience in data engineering, with a focus on building and optimizing data pipelines.
- Excellent problem-solving skills and the ability to work independently as well as part of a team.
- Strong communication and collaboration skills to work effectively with cross-functional teams.
- Ability to work amid frequent interruptions and to manage multiple concurrent tasks, each with its own timeline and priority.