The company is looking to build and maintain scalable ETL/ELT pipelines, manage databases, deploy cloud infrastructure, and automate data workflows that process large datasets efficiently in support of big data and advanced analytics workloads.
Requirements
- Strong proficiency in Python for scripting, data manipulation, and orchestration.
- Advanced SQL skills and strong knowledge of SQL database management.
- Experience with Azure services (e.g., Azure Databricks, Blob Storage).
- Hands-on experience with Docker for containerization.
- Working knowledge of Databricks for big data and machine learning workflows.
- Experience with ClickHouse and other OLAP databases (see the brief sketch after this list).
- Knowledge of distributed systems and data modeling best practices.
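As a rough illustration of the Python, SQL, and ClickHouse skills listed above, the sketch below runs an OLAP-style aggregation from a short Python script. It assumes the clickhouse-connect client; the events table, columns, host, and credentials are hypothetical placeholders, not details from this posting.

```python
# Illustrative only: table, columns, and connection details are hypothetical.
import clickhouse_connect

# Connect to a ClickHouse server (placeholder host/credentials).
client = clickhouse_connect.get_client(host="localhost", port=8123, username="default")

# OLAP-style aggregation: daily event counts and unique users per event type.
query = """
    SELECT
        toDate(event_time) AS day,
        event_type,
        count() AS events,
        uniqExact(user_id) AS unique_users
    FROM events
    WHERE event_time >= now() - INTERVAL 30 DAY
    GROUP BY day, event_type
    ORDER BY day, event_type
"""

for day, event_type, events, unique_users in client.query(query).result_rows:
    print(day, event_type, events, unique_users)
```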
Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines to process large datasets efficiently.
- Leverage Python for scripting and orchestration tasks.
- Develop and optimize queries and schemas in ClickHouse and SQL databases.
- Support data integration efforts combining ClickHouse, MS SQL Server, and Databricks.
- Deploy and manage data workflows and applications using Azure cloud services, Docker, and Python-based orchestration tools.
- Use Python-based orchestration tools (e.g., Apache Airflow, Dagster, or Prefect) to schedule and monitor workflows; a brief sketch follows this list.
- Manage containerized applications for deployments and CI/CD pipelines.
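To make the orchestration responsibility above concrete, here is a minimal sketch of an ETL flow using Prefect, one of the tools named in the list. The task names, source data, and load target are hypothetical stand-ins for whatever pipelines the role actually owns.

```python
# Minimal illustrative sketch; sources, targets, and schedules are hypothetical.
from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def extract() -> list[dict]:
    # Pull raw records from an upstream source (placeholder data).
    return [{"user_id": 1, "amount": 42.0}, {"user_id": 2, "amount": 17.5}]

@task
def transform(records: list[dict]) -> list[dict]:
    # Apply a simple business rule; real pipelines would do far more here.
    return [r for r in records if r["amount"] > 20]

@task
def load(records: list[dict]) -> int:
    # Write to the warehouse (stubbed out as a print for the sketch).
    print(f"Loading {len(records)} records")
    return len(records)

@flow(log_prints=True)
def daily_pipeline() -> int:
    # Chain the ETL steps; Prefect tracks state, retries, and logs per task.
    return load(transform(extract()))

if __name__ == "__main__":
    # Run ad hoc; in practice this would be deployed and run on a schedule.
    daily_pipeline()
```

In production, a flow like this could be containerized with Docker and triggered on a schedule, tying together the containerization and CI/CD responsibilities above.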
Other
- Strong problem-solving skills and a deep understanding of data architecture principles.
- Ability to manage multiple priorities and work effectively in a collaborative environment.
- Excellent communication and documentation skills.
- Work is performed in an office environment.
- On occasion, the position may require an in-person site visit.