At GXO, the business problem is to design and develop robust, scalable data pipelines on a modern data platform that support data ingestion, transformation, and storage, and to ensure the reliability and performance of the underlying data infrastructure.
Requirements
- Snowflake: Expertise in working with modern data warehousing solutions such as Snowflake, including features like Snowpipe Streaming, warehouse optimization, and clustering.
- Python and Advanced SQL: Must be adept at scripting in Python, particularly for data manipulation and integration tasks, and have a solid grasp of advanced SQL techniques for querying, transformation, and performance optimization.
- Data Modeling: Understanding of best practices for data modeling, including star schemas, snowflake schemas, and data normalization techniques.
- ETL/ELT Processes: Experience in designing, building, and optimizing ETL/ELT pipelines to process large datasets using dbt.
- Apache Airflow: Experience in building, deploying, and optimizing DAGs in Airflow or a similar orchestration tool (a minimal dbt-plus-Airflow sketch follows this list).
- GitHub: Experience with version control, branching, and collaboration on GitHub.
- Data Visualization: Knowledge of tools like Superset or Looker, or of Python visualization libraries (Matplotlib, Seaborn, Plotly, etc.).
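To make the dbt and Airflow expectations above concrete, here is a minimal sketch of an Airflow DAG that runs and then tests a dbt project. It assumes Airflow 2.4+ and an already-configured dbt project; the DAG name, paths, and schedule are illustrative placeholders, not GXO's actual pipeline.

```python
# Minimal Airflow DAG that orchestrates a dbt build; the DAG id, project
# paths, and schedule are illustrative placeholders only.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_warehouse_build",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                      # requires Airflow 2.4+
    catchup=False,
    tags=["dbt", "snowflake"],
) as dag:
    # Build dbt models (assumes a dbt project and profile already
    # configured under the placeholder path /opt/dbt/warehouse).
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/warehouse --profiles-dir /opt/dbt",
    )

    # Run dbt tests only after the models build successfully.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/warehouse --profiles-dir /opt/dbt",
    )

    dbt_run >> dbt_test
```

Keeping the dbt invocation in a BashOperator is just one common pattern; a dedicated dbt integration or KubernetesPodOperator would be equally valid depending on the deployment.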
Responsibilities
- Data Pipeline Design and Development: Lead the design, implementation, and maintenance of data pipelines to support data ingestion, transformation, and storage on GCP and Snowflake.
- Collaboration: Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver data solutions that meet business objectives.
- Platform Optimization: Optimize and enhance the performance, scalability, and reliability of existing data pipelines, ensuring efficient data processing and storage.
- Technology Stack: Stay abreast of industry trends and emerging technologies in data engineering and incorporate them into the data architecture where applicable.
- Quality Assurance: Implement best practices for data quality, validation, and testing to ensure the accuracy and integrity of data throughout the pipeline (a simple validation sketch follows this list).
- Documentation: Create and maintain comprehensive documentation for data pipelines, ensuring knowledge transfer and supportability.
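As one illustration of the kind of post-load validation the quality assurance responsibility implies, the sketch below runs a row-count and NULL-key check against a Snowflake table. The connection settings, warehouse, schema, and table names are hypothetical placeholders; in practice such checks would typically live in dbt tests or an orchestrated Airflow task.

```python
# Minimal post-load data quality check; credentials, object names, and the
# checked table are hypothetical placeholders for illustration only.
import os

import snowflake.connector


def check_no_null_keys(table: str, key_column: str) -> None:
    """Fail loudly if the load produced zero rows or NULL surrogate keys."""
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="ANALYTICS_WH",   # hypothetical warehouse
        database="ANALYTICS",       # hypothetical database
        schema="MARTS",             # hypothetical schema
    )
    try:
        cur = conn.cursor()
        # COUNT_IF is a Snowflake conditional aggregate; table/column names
        # are interpolated here only because this is a fixed, trusted sketch.
        cur.execute(
            f"SELECT COUNT(*), COUNT_IF({key_column} IS NULL) FROM {table}"
        )
        total_rows, null_keys = cur.fetchone()
        if total_rows == 0:
            raise ValueError(f"{table} is empty after load")
        if null_keys > 0:
            raise ValueError(f"{table}.{key_column} has {null_keys} NULL values")
    finally:
        conn.close()


if __name__ == "__main__":
    check_no_null_keys("FCT_SHIPMENTS", "SHIPMENT_ID")  # hypothetical table
```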
Other
- Bachelor’s degree in Computer Science, Data Science, or equivalent related work experience.
- 5+ years of experience in designing and building data pipelines on cloud platforms, preferably GCP and Snowflake.
- Collaboration and Communication: Ability to work closely with data scientists, analysts, and other stakeholders to translate business requirements into technical solutions.
- Strong documentation skills for pipeline design and data flow diagrams.
- Data Privacy: Knowledge of data privacy regulations, including GDPR and CCPA, to ensure compliance in data handling practices.