Formation Bio is looking to build the semantic layer that makes diverse data pillars interoperable, consistent, and actionable to accelerate drug development and clinical trials.
Requirements
- Strong SQL and data modeling skills, with proven experience designing semantic or analytical layers.
- Experience working with both structured data (e.g., relational tables, APIs) and unstructured data (e.g., documents, free text, biomedical literature, healthcare notes).
- Familiarity with healthcare/life sciences ontologies and data standards (SNOMED CT, ICD, RxNorm, LOINC, HL7 FHIR, OMOP, Mondo) and/or financial/commercial taxonomies.
- Hands-on experience with Snowflake, dbt, Dagster, and modern data stacks.
- Experience with unstructured data workflows (NLP, embeddings, semantic search, knowledge graphs).
- Practical use of metadata management and data catalog platforms.
- Hands-on experience structuring dbt projects with testing, quality checks, and reusable design patterns.
Responsibilities
- Build and maintain SQL/dbt models that unify datasets across healthcare, commercial/pharma, biomedical, and finance domains, leveraging ontologies and data standards (e.g., SNOMED CT, ICD, RxNorm, HL7 FHIR, OMOP).
- Design models that handle not only structured datasets but also unstructured data sources (e.g., documents, free text, biomedical literature), preparing them for AI-driven applications.
- Own and evolve the semantic layer that transforms raw data into consistent, reusable models powering analytics and advanced AI.
- Contribute to pipelines that bring in data from APIs, partner feeds, flat files, and unstructured text, ensuring inputs are reliable, well-documented, and metadata-rich.
- Apply FAIR (Findable, Accessible, Interoperable, Reusable) principles so that data stays traceable and reusable across structured and unstructured domains.
- Partner with commercial, scientific, finance, and healthcare stakeholders to align semantic models with real-world use cases.
- Document data standards and reusable modeling patterns to empower downstream teams and reduce cognitive load.
Other
- 5+ years of experience as a Data Engineer or Analytics Engineer, or in a similar role, in healthcare, pharma, biotech, finance, or other highly regulated industries.
- Deep expertise in at least one data domain (e.g., healthcare/EHR/claims, commercial/pharma, biomedical/scientific, or finance), with a track record of translating complex, domain-specific datasets into consistent and usable models.
- Exposure to additional domains beyond your core area of expertise, and the ability to learn and adapt to new datasets quickly.
- Understanding of regulatory and compliance considerations in healthcare, pharma, or finance.
- Please only apply if you reside in these locations or are willing to relocate.