Design, develop, and maintain scalable data pipelines and cloud-based solutions to support enterprise analytics and reporting.
Requirements
- Python (3.x) – scripting, API development, automation.
- Spark/PySpark, Hadoop ecosystem.
- Kafka.
- Oracle, Teradata, or SQL Server.
- Azure or GCP (e.g., BigQuery or Dataflow on GCP).
- Kubernetes/OpenShift.
- GitHub, Jenkins.
Responsibilities
- Build and maintain ETL pipelines using Python and PySpark for batch and streaming data (illustrative sketches follow this list).
- Develop data ingestion frameworks for structured/unstructured sources.
- Implement data workflows using Airflow and integrate with Kafka for real-time processing.
- Deploy solutions on Azure or GCP using container platforms (Kubernetes/OpenShift).
- Optimize SQL queries and ensure data quality and governance.
- Collaborate with data architects and analysts to deliver reliable data solutions.
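
For illustration only, here is a minimal sketch of the streaming pattern described above: reading a Kafka topic with PySpark Structured Streaming, parsing JSON payloads, and landing the result as Parquet. The `orders` topic, message schema, broker address, and storage paths are placeholder assumptions, not a prescribed implementation.

```python
# Illustrative sketch: Kafka -> PySpark Structured Streaming -> Parquet.
# Requires the spark-sql-kafka connector package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("orders-streaming-etl")  # hypothetical app name
    .getOrCreate()
)

# Assumed message schema for the example; real payloads would differ.
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw stream from Kafka (placeholder broker and topic).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), order_schema).alias("o"))
       .select("o.*")
       .withColumn("ingest_date", F.to_date("event_time"))
)

# Append the parsed records to a Parquet sink (placeholder paths).
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "/data/curated/orders")
    .option("checkpointLocation", "/data/checkpoints/orders")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```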
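Likewise, a minimal Airflow sketch of how a batch ingestion job might be scheduled. The DAG id, schedule, and spark-submit path are placeholder assumptions.

```python
# Illustrative sketch: a daily Airflow DAG that submits a batch PySpark job.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_batch_etl",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the batch ETL script (placeholder path).
    run_etl = BashOperator(
        task_id="spark_submit_orders",
        bash_command="spark-submit /opt/jobs/orders_etl.py",
    )
```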
Other
- US citizens only.
- Bachelor's degree in Computer Science or a related field.
- 3–5 years of experience in data engineering and Python development.
- Financial services experience.
- Flexible work-from-home options available.