The company is looking to build and deploy highly scalable data pipelines that move and transform data while ensuring its security and quality.
Requirements
- Building and optimizing large-scale, distributed data pipelines with Hive, Presto, Spark, or Flink (see the Spark sketch after this list)
- Data warehousing, including dimensional modeling, star/snowflake schema design, and normalization/denormalization strategies in large-scale data warehouses such as Amazon Redshift
- Writing and optimizing complex SQL queries over large datasets, including joins, aggregations, and subqueries against data warehouses
- Data storage solutions, including S3 on AWS
- Cloud-native services, including AWS EMR and AWS S3
- Using workflow orchestration tools such as Airflow in a production environment to automate, schedule, monitor, and tune data pipelines (see the Airflow sketch after this list)
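To illustrate the Spark requirement above, the following is a minimal PySpark sketch of a batch pipeline that reads raw data from S3, filters and aggregates it, and writes a curated output back to S3. The bucket, paths, column names, and job name are all hypothetical, not part of the role description.

```python
# Illustrative only: bucket, paths, and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

# Read raw order events from a hypothetical S3 location.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: keep completed orders, then aggregate revenue per day.
daily_revenue = (
    orders
    .filter(F.col("order_status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("revenue"),
        F.count("*").alias("order_count"),
    )
)

# Write the curated result back to S3, partitioned by date.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_revenue/"
)
```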
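For the Airflow requirement, orchestration of a job like the one above could look roughly like the sketch below: a DAG that runs the Spark job once a day and retries on failure. The DAG id, schedule, owner, and spark-submit command are assumptions, and the `schedule` argument assumes Airflow 2.4 or later.

```python
# Illustrative only: DAG id, schedule, and command are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_revenue_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 3 * * *",  # run daily at 03:00 UTC
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit the hypothetical PySpark job from the previous sketch
    # to a Spark/EMR cluster reachable from the Airflow worker.
    run_rollup = BashOperator(
        task_id="run_daily_rollup",
        bash_command="spark-submit /opt/jobs/orders_daily_rollup.py",
    )
```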
Responsibilities
- Build and deploy highly scalable data pipelines to move and transform data
- Optimize and maintain all domain-related data pipelines
- Implement comprehensive data quality checks to ensure high data quality (see the quality-check sketch after this list)
- Implement and enforce data security policies and ensure compliance with relevant regulations and standards
- Provide 24x7 on-call support on a rotational basis
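To make the data quality responsibility concrete, here is a minimal, hypothetical sketch of post-load checks against the curated table from the earlier pipeline sketch. The column names and rules are assumptions; the intent is simply that each failed assertion fails the task so the orchestrator can alert on it.

```python
# Illustrative only: table path, columns, and rules are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue_quality_checks").getOrCreate()

df = spark.read.parquet("s3://example-bucket/curated/daily_revenue/")

# Check 1: the curated table must not be empty.
row_count = df.count()
assert row_count > 0, "daily_revenue is empty"

# Check 2: the partition key must never be null.
null_dates = df.filter(F.col("order_date").isNull()).count()
assert null_dates == 0, f"{null_dates} rows have a null order_date"

# Check 3: revenue should never be negative.
negative = df.filter(F.col("revenue") < 0).count()
assert negative == 0, f"{negative} rows have negative revenue"

print(f"All quality checks passed on {row_count} rows")
```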
Other
- Partner with data, security, infrastructure, and business teams in a cross-functional, global organization to understand data needs
- Master’s degree or a foreign equivalent in Applied Data Science, Computer Science, or a related field, plus 1 year of post-baccalaureate experience in the job offered or in a related occupation