Capgemini is seeking a Databricks Data Engineer to support its enterprise-wide Sustainability initiative by building the data pipelines and models that power product-level carbon footprint analysis.
Requirements
- Hands-on experience with Databricks (Delta Lake, Unity Catalog, Jobs, Workflows)
- Strong skills in PySpark and SQL
- Experience with SAP MBOM structures, especially multilevel BOM explosion
- Understanding of Oracle-based EBOM systems and ability to integrate structured data
- Familiarity with AWS data ecosystem (S3, Glue, Lambda, Athena)
- Strong knowledge of data modeling, pipeline optimization, and performance tuning
Responsibilities
- Develop and optimize ETL/ELT pipelines on Databricks using PySpark and SQL to support carbon footprint analytics.
- Build data models that combine engineering (EBOM), manufacturing (MBOM), supplier, and factory operations data to generate emissions metrics.
- Integrate and transform SAP MBOM data, including multilevel BOM explosion logic (see the sketch after this list).
- Integrate and transform EBOM data from Oracle-based systems (no explosion required).
- Integrate and transform supplier environmental data (e.g., material-level emissions factors).
- Integrate and transform factory operations data (e.g., energy consumption, material usage).
- Collaborate with sustainability analysts, engineering teams, and supply chain stakeholders to translate carbon calculation logic into data transformations.
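The multilevel BOM explosion mentioned above typically means walking parent-child BOM links level by level until leaf materials are reached, multiplying usage quantities along each path, and then rolling material-level emission factors up to the product. The posting does not prescribe an approach, so the following is only a minimal PySpark sketch; the table layout and column names (parent_item, child_item, qty_per, kg_co2e_per_unit) are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bom_explosion_sketch").getOrCreate()

# Hypothetical MBOM edges: one row per parent -> child link with usage quantity.
mbom = spark.createDataFrame(
    [("BIKE", "FRAME", 1.0), ("BIKE", "WHEEL", 2.0),
     ("WHEEL", "SPOKE", 32.0), ("WHEEL", "TIRE", 1.0)],
    ["parent_item", "child_item", "qty_per"],
)

# Hypothetical material-level emission factors (kg CO2e per unit of the item).
factors = spark.createDataFrame(
    [("FRAME", 12.5), ("SPOKE", 0.02), ("TIRE", 3.1)],
    ["item", "kg_co2e_per_unit"],
)

# Level 1: direct children of the top-level product.
exploded = (mbom.filter(F.col("parent_item") == "BIKE")
                .select(F.lit("BIKE").alias("root_item"),
                        F.col("child_item").alias("item"),
                        F.col("qty_per").alias("total_qty")))

# Iteratively self-join down the hierarchy, multiplying quantities along each path.
# A fixed depth bound stands in for "until no more children are found".
frontier = exploded
for _ in range(10):
    nxt = (frontier.alias("f")
           .join(mbom.alias("m"), F.col("f.item") == F.col("m.parent_item"))
           .select(F.col("f.root_item").alias("root_item"),
                   F.col("m.child_item").alias("item"),
                   (F.col("f.total_qty") * F.col("m.qty_per")).alias("total_qty")))
    if nxt.count() == 0:
        break
    exploded = exploded.unionByName(nxt)
    frontier = nxt

# Roll material-level emissions up to the product level.
product_footprint = (exploded.join(factors, "item")
                              .groupBy("root_item")
                              .agg(F.sum(F.col("total_qty") * F.col("kg_co2e_per_unit"))
                                    .alias("kg_co2e")))
product_footprint.show()
```

In practice the iterative self-join would be bounded by the actual BOM depth (or replaced by a recursive approach), and the emission factors would come from the supplier and factory datasets listed above rather than an inline table.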
Other
- Applicants for employment in the US must have valid work authorization that does not now, and will not in the future, require sponsorship of a visa for employment authorization in the US by Capgemini.
- Must be eligible to work in the US
- Bachelor's degree or equivalent experience
- Ability to work in a team environment
- Strong communication and problem-solving skills