Starburst is seeking a Data Engineer to enhance internal usage, telemetry, and product data analytics for Starburst Enterprise and Galaxy, helping the company better understand customers and product usage and gather feedback on product performance and user experience.
Requirements
- Experience building and optimizing data pipelines using Trino, Spark, dbt, and related frameworks.
- Experience managing data infrastructure in public clouds.
- Experience using and managing orchestration frameworks such as Apache Airflow or Dagster.
- Knowledge of RAG and other design patterns for AI applications.
- Fluency in SQL.
- Experience building API integrations for extracting data from third-party sources.
- Excellent coding ability in Java, Python, or Scala.
Responsibilities
- Build and manage a high-quality data lake to support various aspects of Starburst's business, including product management, finance, customer support, and engineering.
- Find innovative ways to use Trino and Starburst to solve data management challenges.
- Collaborate with technical leads, product managers, and data analysts to build robust data products and analytics.
- Leverage AI to democratize access to datasets for users throughout Starburst.
- Enable dataset preparation and model evaluation for Starburst's AI projects.
- Define and adapt data engineering processes and best practices, focusing on execution and on delivering reliable answers to important business questions.
- Work closely with leaders from other teams and departments to iterate on both data architecture and the design of data solutions, focusing on high-quality results accessible at several levels.
Other
- At least 7 years of data engineering experience, and a clear passion for data and analytics.
- Enthusiasm for working both independently and collaboratively with strong, diverse, high-performing teams to get value and insights from data.
- Knowledge of data modeling techniques appropriate for modern data lakes.
- Experience with a variety of AWS services such as EMR, EC2, S3, and IAM. Multi-cloud experience (GCP/Azure) is also nice to have.
- Able to use Configuration-as-Code and Infrastructure-as-Code tools such as Pulumi, Terraform, and/or Ansible.
- Demonstrable experience in delivering value and hitting deadlines consistently.
- Disciplined software engineering practices, including high code quality, extensive automated testing, and rigorous code review.
- Highly proficient in both written and verbal communication, coupled with strong organizational abilities.
- Remote, US