Senior Data Engineer II - Data Curation

Formation Bio

$220,000 - $280,000
Sep 29, 2025
New York, NY, US • Boston, MA, US • San Francisco, CA, US • Raleigh, NC, US

Formation Bio is seeking a technical leader to model, harmonize, and unify complex biomedical and healthcare data into high-quality, stakeholder-ready assets that accelerate drug development and clinical trials.

Requirements

  • Proven expertise in SQL/dbt modeling and integrating healthcare and biomedical ontologies.
  • Hands-on experience with ontology-driven harmonization and data model integration across heterogeneous datasets.
  • Strong background in data architecture and stack design, with the ability to define standards and paved paths.
  • Experience working with unstructured data: entity extraction (NER), NLP, embeddings, or document parsing.
  • Familiarity with vector databases, semantic search, and knowledge graph concepts, and with how to connect these to structured datasets for unified consumption.
  • Experience with knowledge graph technologies (e.g., Neo4j, RDF/SPARQL, Cypher).
  • Experience with healthcare and life sciences ontologies and data standards such as Mondo, OMOP, FHIR, SNOMED, RxNorm, and UMLS.

Responsibilities

  • Define and communicate technical direction for the Data Curation team.
  • Drive the architecture and technical stack for ontology-driven harmonization across healthcare and pharmaceutical datasets.
  • Lead development of robust SQL/dbt models that unify complex healthcare and pharma datasets.
  • Apply healthcare and biomedical ontologies and data standards (e.g., SNOMED, RxNorm, UMLS, Mondo, OMOP, FHIR) to ensure interoperability and consistent integration.
  • Design scalable workflows for ontology alignment, normalization, and harmonized data product creation.
  • Lead integration of unstructured data sources (clinical notes, publications, documents, scientific text) using NER, NLP, embeddings, and document parsing.
  • Establish architectural patterns for managing ontology mappings, ontology-driven transformations, and harmonized knowledge assets.

Other

  • 7+ years of experience in data engineering, semantic modeling, or data curation, including experience setting technical direction.
  • Comfortable with Python, orchestration tools (Dagster, Airflow), and working with diverse data types.
  • Skilled at collaborating with infrastructure teams to balance semantic integration with scalable foundational tooling.
  • Excited to mentor others, set high standards, and drive alignment across a multidisciplinary team.
  • Please apply only if you reside in one of the locations listed above or are willing to relocate.