Seamless.AI is seeking a Principal Data Engineer to design, develop, and maintain robust, scalable ETL pipelines that acquire, transform, and load data from various sources into its data ecosystem.
Requirements
- Strong proficiency in Python and experience with related libraries and frameworks (e.g., pandas, NumPy, PySpark).
- Hands-on experience with AWS Glue or similar ETL tools and technologies.
- Solid understanding of data modeling, data warehousing, and data architecture principles.
- Expertise in working with large data sets, data lakes, and distributed computing frameworks.
- Experience developing and training machine learning models.
- Strong proficiency in SQL.
- Familiarity with data matching, deduplication, and aggregation methodologies.
Responsibilities
- Design, develop, and maintain robust and scalable ETL pipelines to acquire, transform, and load data from various sources into our data ecosystem.
- Implement data transformation logic using Python and other relevant programming languages and frameworks.
- Use AWS Glue or similar tools to create and manage ETL jobs, workflows, and data catalogs.
- Optimize and tune ETL processes for improved performance and scalability, particularly with large data sets.
- Apply methodologies and techniques for data matching, deduplication, and aggregation to ensure data accuracy and quality.
- Implement and maintain data governance practices to ensure compliance, data security, and privacy.
- Collaborate with the data engineering team to explore and adopt new technologies and tools that make data processing more efficient and effective.
Other
- Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent years of work experience.
- 7+ years of experience as a Data Engineer, with a focus on ETL processes and data integration.
- Professional experience with Spark and AWS pipeline development is required.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Highly organized and self-motivated, with the ability to manage multiple projects and priorities simultaneously.
- Applicants must be authorized to work in the U.S.
- Visa sponsorship is not available.