Ambry Genetics is looking for a Data Engineer II to design, build, maintain, and optimize data pipelines and systems that support analytics, machine learning, reporting, and operational workflows. The role is central to keeping our data infrastructure scalable, reliable, and efficient, including data pipelines, operational data stores, data warehouses, and data lakes.
Requirements
- 2+ years of industry experience in data engineering, analytics engineering, or software engineering
- Demonstrated experience delivering production-ready data pipelines and ETL/ELT workflows
- Previous experience working with cloud data platforms and modern data stacks
- Languages: SQL, Python, Java, Scala
- Databases: PostgreSQL, MySQL, MongoDB, DynamoDB
- Data Processing and Orchestration Tools: Apache Spark, Apache Beam, Apache Airflow, dbt, AWS Glue, Databricks, Snowflake, Dagster, or similar
- Cloud Platforms: AWS (Glue, Redshift, S3), GCP (BigQuery, Dataflow), Azure (Synapse)
Responsibilities
- Build, automate, and optimize ETL/ELT pipelines, from raw data ingestion to clean, structured datasets
- Design and implement data models for databases, data lakes, or warehouses (e.g., star/snowflake schemas, operational data stores)
- Work with cloud platforms (AWS, GCP, Azure) and tools like Redshift, BigQuery, Databricks, or Snowflake. Manage the compute, storage, and networking resources that data workloads require
- Tune SQL queries, pipeline jobs, and storage for efficiency and scalability. Improve system throughput, latency, and cost-efficiency
- Implement monitoring, alerting, and testing for data quality, completeness, and accuracy
- Work closely with data analysts, data scientists, software engineers, and product owners. Translate business requirements into technical solutions
- Document systems, pipelines, schemas, and transformations. Adhere to coding and operational best practices
Other
- Bachelor’s degree in Computer Science, Engineering, Information Systems, or a related field, or equivalent work experience
- Excellent communication and interpersonal skills
- May mentor junior data engineers (Data Engineer I) on coding standards, tool usage, and best practices
- Industry experience in life sciences; experience in genomics is a bonus
- AWS or Google Cloud Platform certifications