The company is looking to build big data collection and analytics capabilities to uncover customer, product and operational insights.
Requirements
- Knowledge of or direct experience with Databricks and/or Spark
- Software development experience, ideally in Python, PySpark, or Go, along with tools such as Kafka, and a willingness to learn new programming languages to meet goals and objectives
- Knowledge of strategies for processing large amounts of structured and unstructured data, including integrating data from multiple sources
- Knowledge of data cleaning, wrangling, visualization and reporting
- Familiarity with databases, BI applications, data quality, and performance tuning
- Knowledge of or direct experience with the following AWS services is desired: S3, RDS, Redshift, DynamoDB, EMR, Glue, and Lambda
- Practical knowledge of Linux environments
Responsibilities
- Design and implement data pipelines to be processed and visualized across a variety of projects and initiatives
- Develop and maintain optimal data pipeline architecture by designing and implementing data ingestion solutions on AWS using AWS native services
- Design and optimize data models on the AWS Cloud using Databricks and AWS data stores such as Redshift, RDS, and S3
- Integrate and assemble large, complex data sets that meet a broad range of business requirements
- Read, extract, transform, stage, and load data into selected tools and frameworks as required
- Customize and manage integration tools, databases, warehouses, and analytical systems
- Process unstructured data into a form suitable for analysis and assist in analysis of the processed data
Other
- Ability to work effectively within a team in a fast-paced, changing environment
- Excellent written, verbal and listening communication skills
- Comfortable working asynchronously with a distributed team
- Bachelor's degree in Computer Science, Information Technology, or another relevant field
- 3 to 8 years of recent experience in Software Engineering, Data Engineering, or Big Data