Capgemini is looking for an Azure Databricks engineer to build and support non-interactive (batch) and real-time data pipelines and technology capabilities.
Requirements
- Strong to expert Spark (PySpark) skills, using Jupyter Notebooks, Colab, or Databricks (preferred)
- Hands-on experience with data pipeline development and ingestion patterns in Azure
- Orchestration tools such as Azure Data Factory (ADF) or Airflow
- SQL
- Denormalized data modeling for big data systems
Responsibilities
- Apply deep knowledge of the data engineering domain to build and support non-interactive (batch, distributed) and real-time, highly available data, data pipeline, and technology capabilities
- Build fault-tolerant, self-healing, adaptive, and highly accurate data computation pipelines
- Provide consultation and lead the implementation of complex programs
- Develop and maintain documentation relating to all assigned systems and projects
- Tune queries that run over billions of rows of data in a distributed query engine
- Perform root cause analysis to identify permanent resolutions to software or business process issues
Other
- Bachelor’s degree in computer science, management information systems, or a related discipline, or equivalent work experience
- Applicants for employment in the US must have valid work authorization that does not now, and will not in the future, require Capgemini to sponsor a visa for employment authorization in the US
- Flexible work
- Healthcare including dental, vision, mental health, and well-being programs
- Paid time off and paid holidays