The company is seeking a Data Engineer to design and implement robust and efficient data pipelines, ensuring high data quality and contributing to the continuous improvement of data management practices.
Requirements
- Proficiency in Python, including experience with libraries and frameworks such as Pandas, NumPy, and Django
- Expertise in Apache Airflow, including experience designing, building, and maintaining data pipelines and knowledge of Airflow's architecture, including DAGs and Operators (a minimal DAG sketch follows this list)
- Proficiency in extract, transform, load (ETL) processes
- Strong understanding of SQL and NoSQL databases, with proficiency in writing complex queries and applying database optimization techniques
- Experience with data warehousing solutions such as Amazon Redshift, Google BigQuery, or Azure Synapse Analytics (formerly Azure SQL Data Warehouse)
- Knowledge of data modeling and data warehousing (desired)
- Experience extracting data from various sources, transforming it (cleaning, validating, aggregating, joining), and loading it into databases or data warehouses
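For candidates unfamiliar with the DAG/Operator model, the following minimal sketch shows the shape of a pandas-based ETL pipeline in Airflow. It assumes Airflow 2.4+ (for the `schedule` argument of the TaskFlow API) and pandas with pyarrow installed; the file paths and column names are illustrative placeholders, not details of this role.

```python
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_etl():
    @task
    def extract() -> str:
        # Hypothetical source; a real pipeline might pull from an API or database.
        return "/tmp/raw_orders.csv"

    @task
    def transform(path: str) -> str:
        df = pd.read_csv(path)
        # Basic cleaning: drop rows missing the key, then deduplicate on it.
        df = df.dropna(subset=["order_id"]).drop_duplicates(subset=["order_id"])
        out = "/tmp/clean_orders.parquet"
        df.to_parquet(out)  # requires pyarrow or fastparquet
        return out

    @task
    def load(path: str) -> None:
        # Placeholder for a warehouse load (e.g. a COPY into Redshift).
        print(f"loading {path}")

    load(transform(extract()))


orders_etl()
```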
Responsibilities
- Design, develop, and maintain ETL processes using Python and Apache Airflow
- Develop and implement data validation processes to ensure high data quality (see the validation sketch after this list)
- Troubleshoot and resolve issues related to data pipelines
- Optimize ETL processes to improve efficiency and performance
- Document and maintain the design and details of data processes and schemas
- Stay current with industry trends and technologies so that data practices remain up to date
- Collaborate with data analysts and other stakeholders to understand and meet their data requirements
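As one concrete illustration of the validation responsibility above, the sketch below checks a pandas frame for a few common quality problems. The column names (`order_id`, `amount`) and the rules themselves are hypothetical, and in practice a dedicated library such as Great Expectations could fill this role.

```python
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in the frame."""
    problems = []
    if df["order_id"].isna().any():
        problems.append("missing order_id values")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        problems.append("negative order amounts")
    return problems


# Tiny demonstration with deliberately bad rows.
df = pd.DataFrame({"order_id": [1, 1, None], "amount": [10.0, -5.0, 3.0]})
print(validate_orders(df))
# ['missing order_id values', 'duplicate order_id values', 'negative order amounts']
```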
Other
- Strong communication and collaboration skills
- Excellent problem-solving skills
- US citizenship is required to obtain a federal security clearance