McKinsey is looking to solve complex, pressing challenges for its clients in the chemicals industry by leveraging data engineering to build core data frameworks, design the technical backbone for advanced analytics, and create robust data pipelines for machine learning.
Requirements
- Proven experience building data pipelines in production for advanced analytics use cases
- Experience working across structured, semi-structured, and unstructured data
- Familiarity with distributed computing frameworks, cloud platforms, containerization, and analytics libraries
- Exposure to software engineering concepts and best practices preferred, including DevOps, DataOps, and MLOps
- Familiarity with our technology stack, which includes Python, PySpark, the PyData stack, SQL, Airflow, Databricks, Kedro (our own open-source data pipelining framework), Dask/RAPIDS, container technologies such as Docker and Kubernetes, and cloud platforms such as AWS, GCP, and Azure, among others; a minimal Kedro sketch follows this list
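Since Kedro may be less familiar than the rest of this stack, here is a minimal sketch of what a Kedro pipeline looks like. The node functions and dataset names (preprocess_orders, orders_raw, customer_features, and so on) are hypothetical placeholders for illustration, not part of any McKinsey asset:

```python
# Minimal Kedro pipeline sketch; all names below are illustrative placeholders.
import pandas as pd
from kedro.pipeline import Pipeline, node


def preprocess_orders(orders: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete records and normalize column names."""
    orders = orders.dropna(subset=["order_id"])
    orders.columns = [c.lower() for c in orders.columns]
    return orders


def build_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-customer features for a downstream model."""
    return orders.groupby("customer_id").agg(
        total_spend=("amount", "sum"),
        order_count=("order_id", "count"),
    )


# Nodes declare inputs and outputs by data catalog name, which lets Kedro
# resolve the dependency graph and run the pipeline reproducibly.
feature_pipeline = Pipeline(
    [
        node(preprocess_orders, inputs="orders_raw", outputs="orders_clean"),
        node(build_features, inputs="orders_clean", outputs="customer_features"),
    ]
)
```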
Responsibilities
- Design and build core data frameworks that support our client service organizations
- Design and build the technical backbone for advanced analytics engagements
- Create robust, scalable, and reproducible data pipelines for machine learning (see the sketch after this list)
- Curate and prepare data for advanced models
- Manage secure data environments
- Contribute to R&D projects and help develop innovative internal assets and frameworks
- Build technology assets for internal and external clients
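To make the pipeline responsibilities above concrete, here is a hypothetical sketch of a reproducible daily data pipeline expressed as an Airflow DAG (assuming Airflow 2.4+, where the schedule argument replaced schedule_interval). The DAG id, task names, and callables are illustrative only:

```python
# Hypothetical Airflow DAG sketching a reproducible daily ML data pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    """Pull raw source data for the run's logical date (idempotent per run)."""
    ...


def validate(**context):
    """Apply schema and data-quality checks before any modeling step."""
    ...


def build_features(**context):
    """Write feature tables keyed by run date, so reruns reproduce the same outputs."""
    ...


with DAG(
    dag_id="ml_feature_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_features = PythonOperator(task_id="build_features", python_callable=build_features)

    # Each task runs only after its upstream dependency succeeds.
    t_extract >> t_validate >> t_features
```

Keying each run to its logical date, rather than to wall-clock time, is what makes backfills and reruns reproducible.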
Other
- 2-5+ years of relevant experience
- Exceptional time management to meet your responsibilities in a complex and largely autonomous work environment
- Strong communication skills, both verbal and written, in English and local office language(s), with the ability to adjust your style to suit different perspectives and seniority levels
- Undergraduate or advanced degree in a quantitative field such as computer science, machine learning, applied statistics, or mathematics, or equivalent experience