Enable Data Incorporated is looking for a data engineer to design and implement resilient data architectures on AWS that enable data-driven decision-making.
Requirements
- Proven, hands-on experience in data engineering or a related software engineering role, with a focus on big data technologies.
- Strong proficiency in Python and expert knowledge of PySpark, including Spark SQL and data manipulation techniques.
- Solid understanding and practical experience with AWS cloud services related to data engineering (e.g., S3, EMR, SageMaker).
- Proficiency in SQL and experience working with various database systems (relational and/or NoSQL).
- Deep understanding of distributed computing concepts, data modeling, data lake design principles, and big data frameworks.
- Familiarity with orchestration tools such as Apache Airflow and with version control systems such as Git.
Responsibilities
- Design, develop, and implement efficient and reliable data pipelines and ETL processes using PySpark for large-scale data processing in a distributed environment.
- Extract, cleanse, and transform raw data into a format suited to ML models, and engineer new features that improve model accuracy and performance.
- Use a variety of AWS services to build and deploy solutions, such as Amazon S3 (for data storage), AWS Glue (for cataloging), Amazon EMR (for running Spark clusters), and Amazon SageMaker (for ML integration and feature stores).
- Optimize existing PySpark applications and data pipelines for performance, cost-efficiency, and scalability.
- Work in an Agile team environment, collaborating with data scientists, data architects, and software engineers to understand data requirements and deliver integrated data solutions.
- Write clean, maintainable, and well-documented production-level code, participating in code reviews and implementing CI/CD practices where appropriate.
Other
- Strong analytical and problem-solving abilities, with attention to detail and accuracy.
- Excellent communication and teamwork skills to collaborate effectively with cross-functional teams.