The company is looking for a Data Engineer to design and build the data infrastructure and pipelines that power its AI/ML capabilities, ensuring data scientists and ML engineers have clean, reliable data at scale for model development and deployment.
Requirements
- Strong programming skills in Python and SQL; familiarity with Java/Scala is a plus.
- Hands-on experience with big data frameworks (e.g., Spark, Flink, Hadoop) and workflow orchestration tools (e.g., Airflow, Prefect, Dagster).
- Proven experience with cloud-based data platforms (AWS, GCP, Azure) and data lake/warehouse technologies (Snowflake, BigQuery, Redshift, Delta Lake).
- Strong understanding of data modeling, ETL/ELT processes, and distributed data systems.
- Experience with streaming data systems (Kafka, Kinesis, Pub/Sub) preferred.
- Knowledge of data governance, security, and compliance best practices.
- Strong analytical and problem-solving skills, with a focus on building maintainable, scalable systems.
Responsibilities
- Design, build, and maintain scalable ETL/ELT pipelines for structured and unstructured data.
- Develop data architectures that support large-scale training, inference, and analytics workflows.
- Ensure data quality, governance, and lineage across multiple sources and systems.
- Partner with data scientists and ML engineers to deliver high-quality datasets for model development.
- Optimize data workflows for performance, scalability, and reliability on cloud platforms (AWS, GCP, Azure).
- Leverage modern data engineering tools (e.g., Spark, Databricks, Airflow, Kafka, dbt) to support pipelines and workflows.
- Implement monitoring, alerting, and observability for data pipelines to ensure robustness.
Other
- 5+ years of experience as a Data Engineer or in a similar role focused on large-scale data systems.
- Excellent collaboration skills and the ability to work across engineering, product, and AI teams.