Lazard is looking to enhance decision-making across its financial advisory and asset management business lines by developing advanced AI and data science solutions that keep the firm competitive in a data-driven world.
Requirements
- Strong Python and SQL skills, plus Spark (PySpark) and/or Kafka.
- Hands-on experience building ETL/ELT pipelines with Prefect (or Airflow), dbt, Spark, and/or Kafka.
- Experience onboarding datasets to cloud data platforms (storage, compute, security, governance).
- Familiarity with Azure/AWS/GCP data services (e.g., S3/ADLS/GCS; Redshift/BigQuery; Glue/ADF).
- Git-based workflows, CI/CD, and containerization with Docker (Kubernetes a plus).
- Snowflake (Snowpipe, Tasks, Streams) as a complementary warehouse.
- Databricks (Delta Lake, workflows, cataloging) or equivalent Spark platforms.
Responsibilities
- Ingest and model data from APIs, files/SFTP, and relational sources; implement layered architectures (raw/clean/serving) using PySpark, SQL, dbt, and Python.
- Design and operate pipelines with Prefect (or Airflow), including scheduling, retries, parameterization, SLAs, and well‑documented runbooks.
- Build on cloud data platforms, leveraging S3/ADLS/GCS for storage and a Spark platform (e.g., Databricks or equivalent) for compute; manage jobs, secrets, and access.
- Publish governed data services and manage their lifecycle with Azure API Management (APIM) — authentication/authorization, policies, versioning, quotas, and monitoring.
- Enforce data quality and governance through data contracts, validations/tests, lineage, observability, and proactive alerting.
- Optimize performance and cost via partitioning, clustering, query tuning, job sizing, and workload management.
- Uphold security and compliance (e.g., PII handling, encryption, masking) in line with firm standards.
- Collaborate with stakeholders (analytics, AI engineering, and business teams) to translate requirements into reliable, production‑ready datasets.
Other
- Bachelor’s or advanced degree in Computer Science, Data Engineering, or a related field.
- 4–15 years of professional data engineering experience.