The company is looking for a Data Engineer to design, build, and maintain scalable data pipelines and processing systems to support business intelligence, analytics, and machine learning initiatives.
Requirements
- Strong skills in PySpark and SQL (experience with functional programming is a plus).
- Hands-on experience with big data tools such as Apache Spark, Kafka, Hadoop, or Hive.
- Proficiency in building ETL pipelines and working with structured and unstructured data (see the illustrative sketch after this list).
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services.
- Familiarity with version control systems (e.g., Git), CI/CD, and DevOps practices.
- Solid understanding of data warehousing and data modeling concepts.
- Experience with relational databases and data warehouses such as Postgres, Redshift, or Snowflake.
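For context, day-to-day pipeline work might resemble the following minimal PySpark sketch. It is illustrative only; the bucket paths, table, and column names (`orders`, `order_id`, `amount`, `created_at`) are hypothetical and not taken from an actual system.

```python
# Minimal ETL sketch in PySpark (all names and paths are hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw JSON order events from object storage.
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: deduplicate, enforce types, and derive a partition column.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("created_at"))
       .filter(F.col("amount") > 0)
)

# Load: write partitioned Parquet for downstream analytics and BI.
orders.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders/"
)
```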
Responsibilities
- Develop and maintain scalable, robust data pipelines using PySpark and other big data technologies.
- Ingest and transform large datasets from multiple sources and make them available for analytics and reporting.
- Optimize ETL jobs for performance and cost.
- Ensure data quality, governance, and consistency across all environments.
- Monitor production jobs, troubleshoot issues, and ensure system reliability.
- Implement best practices for data engineering, including code reviews, testing, and documentation.
- Collaborate with Data Scientists, Analysts, and other engineers to understand data requirements and deliver efficient solutions.
Other
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 6+ years of experience as a Data Engineer or in a similar role.