Senior Data Engineer, Data Curation

Formation Bio

$180,000 - $230,000

Sep 10, 2025

New York, NY, US • Boston, MA, US • San Francisco, CA, US • Raleigh, NC, US

Formation Bio is addressing the inefficiency in drug development caused by the high cost and long duration of clinical trials. The company aims to accelerate drug development using AI and technology platforms to bring new medicines to patients more efficiently.

Requirements

5+ years of experience as a Data Engineer, Analytics Engineer, or similar role in healthcare, pharma, biotech, finance, or other highly regulated industries.
Deep expertise in at least one data domain (e.g., healthcare/EHR/claims, commercial/pharma, biomedical/scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
Familiarity with healthcare/life sciences ontologies (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and/or financial/commercial taxonomies.
Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.

Responsibilities

Build and maintain SQL/dbt models that unify datasets across healthcare, commercial/pharma, biomedical, and finance domains, leveraging ontologies (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
Apply FAIR principles to ensure data is traceable, interoperable, and reusable across structured and unstructured domains.
Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.

Other

Reside in or be willing to relocate to one of the key hubs: the New York City or Boston metro areas, the Research Triangle (NC), or the San Francisco Bay Area.
Collaborate across pillars, enabling others while owning core responsibilities.
Leverage deep domain expertise while learning quickly in unfamiliar data areas.
Strive to reduce complexity for downstream users by standardizing and documenting.
Build today’s models with tomorrow’s AI-native and data-driven applications in mind.