Design, develop, and maintain scalable, reliable data pipelines and back-end systems: work with large datasets, leverage cloud technologies, and collaborate with data scientists and other engineers to deliver impactful data-driven solutions.
Requirements
- Strong proficiency in Apache Spark, with solid programming skills in Python, Scala, and Java.
- Expertise in SQL and working with relational databases.
- Experience with DataFrame APIs (e.g., Spark DataFrames) for data manipulation and analysis; see the sketch after this list.
- Experience with cloud technologies (preferably AWS or GCP).
- Experience with message streaming platforms such as Apache Kafka, including managed services like Amazon MSK.
- Experience with Amazon S3 or similar object storage.
- Experience with open table formats such as Apache Iceberg.
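To give a flavor of the batch side of the role, here is a minimal PySpark sketch that expresses the same aggregation through the DataFrame API and through Spark SQL. The bucket path, view name, and column names are illustrative placeholders, not a description of our actual stack.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In practice this session would run on a managed cluster (e.g., EMR or Dataproc).
spark = SparkSession.builder.appName("daily-orders-etl").getOrCreate()

# Read raw events from object storage (bucket and path are placeholders).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# DataFrame API: filter completed orders and aggregate revenue per day.
daily_revenue = (
    orders
    .filter(F.col("status") == "completed")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# The same logic expressed in SQL against a temporary view.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT to_date(created_at) AS order_date,
           SUM(amount)         AS revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY to_date(created_at)
""")
```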
Responsibilities
- Design, develop, and maintain data pipelines using Spark, Python, Scala, and Java.
- Write efficient SQL queries for data extraction, transformation, and loading (ETL).
- Use DataFrame APIs to manipulate and analyze large datasets.
- Implement data storage and processing solutions using cloud technologies (preferably AWS or GCP).
- Build and maintain real-time data streaming pipelines on Kafka (e.g., Amazon MSK); a sketch follows this list.
- Use Amazon S3 for data storage and retrieval.
- Manage data lake tables with open table formats such as Apache Iceberg.
- Collaborate with data scientists and other engineers to understand data requirements and deliver solutions.
- Participate in code reviews and help improve our development processes.
- Troubleshoot and resolve issues in data pipelines and back-end systems.
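The streaming side looks roughly like the following Spark Structured Streaming sketch, which consumes JSON events from Kafka and appends them to an Iceberg table. The broker address, topic, schema, catalog configuration, and S3 paths are all illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("events-stream")
    # Iceberg catalog configuration is deployment-specific; shown here for illustration.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Consume JSON events from Kafka (broker and topic are placeholders;
# with Amazon MSK this would be the cluster's bootstrap string).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append the parsed stream to an Iceberg table, checkpointing to object storage.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .toTable("lake.db.events")
)
query.awaitTermination()
```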
Other
- 5+ years of experience in back-end development with a focus on data engineering.
- Excellent communication and collaboration skills.