The company is looking for a data engineer to design, build, and maintain scalable data pipelines that process and transform large datasets, optimize ELT/ETL processes, and ensure data accuracy and consistency.
Requirements
- Proficiency in one or more programming languages commonly used in data engineering, such as Python or Scala
- Experience with ‘big data’ platforms such as Hadoop, Hive, or Snowflake for data storage and processing
- Understanding of data warehousing concepts and relational (Oracle) database design
- Good exposure to data modeling techniques, including the design, optimization, and maintenance of data models and data structures
- Exposure to engineering enablers such as CI/CD platforms, version control systems (e.g. Git), and automated quality control
- Exposure to data validation, cleansing, enrichment, and data controls
- Proficiency in data integration platforms such as Apache Spark or Talend
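
For illustration only, here is a minimal PySpark sketch of the kind of validation, cleansing, and enrichment work listed above; the dataset, column names, and paths are hypothetical and not taken from the role itself:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: validate, cleanse, and enrich a raw orders extract.
spark = SparkSession.builder.appName("orders_cleansing").getOrCreate()

# Extract: read a raw CSV drop (path and columns are illustrative assumptions).
raw = spark.read.option("header", True).csv("/data/raw/orders.csv")

cleansed = (
    raw
    # Cleansing: trim whitespace and normalise casing on key columns.
    .withColumn("customer_id", F.trim(F.col("customer_id")))
    .withColumn("country", F.upper(F.col("country")))
    # Validation: keep rows with a non-null key and a positive amount.
    .filter(F.col("customer_id").isNotNull() & (F.col("amount").cast("double") > 0))
    # Enrichment: derive a simple order-size category.
    .withColumn(
        "order_size",
        F.when(F.col("amount").cast("double") >= 1000, "large").otherwise("standard"),
    )
)

# Load: write the curated output as Parquet for downstream consumers.
cleansed.write.mode("overwrite").parquet("/data/curated/orders")
```
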
Responsibilities
- Design, build, and maintain scalable data pipelines using Python/Databricks
- Leverage Spark and SQL to process and transform large-scale datasets (a short illustrative sketch follows this list)
- Develop and optimize ELT/ETL processes for high-volume data workflows
- Design, develop, and maintain ETL processes to extract, transform, and load data from various sources into our data warehouse
- Write complex SQL queries and PL/SQL scripts to perform data manipulation, validation, and transformation
- Develop and maintain data pipelines using Python and related libraries
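
As a rough sketch of the Spark-and-SQL style of ELT/ETL described in this list; the table names, JDBC URL, and credentials are hypothetical placeholders, and the load target would follow the team's own warehouse conventions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily_sales_etl").getOrCreate()

# Extract: register the curated dataset as a temporary view for SQL access.
spark.read.parquet("/data/curated/orders").createOrReplaceTempView("orders")

# Transform: aggregate with Spark SQL into a warehouse-ready shape.
daily_sales = spark.sql("""
    SELECT order_date,
           country,
           COUNT(*)                    AS order_count,
           SUM(CAST(amount AS DOUBLE)) AS total_amount
    FROM orders
    GROUP BY order_date, country
""")

# Load: append the result to a warehouse table over JDBC.
# URL, table, user, and password are placeholders, and the matching
# JDBC driver must be available on the Spark classpath.
(daily_sales.write
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//warehouse-host:1521/dwh")
    .option("dbtable", "sales.daily_sales")
    .option("user", "etl_user")
    .option("password", "change_me")
    .mode("append")
    .save())
```
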
Other
- Bachelor's degree in Computer Science.
- Collaborate with data analysts and other stakeholders to understand data requirements and develop solutions to meet their needs.
- Troubleshoot and resolve data-related issues.
- Create and maintain technical documentation for ETL processes, data pipelines, and database solutions.
- Stay up-to-date with the latest trends and technologies in data management and analytics.