The Department of Medicine, Division of Nephrology Quantitative Health is seeking a Data Scientist III to support a federally funded, interdisciplinary research initiative. The project aims to unify clinical, imaging, and molecular data to develop predictive models of disease progression.
Requirements
- Experience with NLP in the clinical domain using libraries like MedSpaCy or cTAKES
- Knowledge of EHR data structures, standards, and interoperability frameworks (e.g., OMOP, FHIR)
- Familiarity with Python and clinical data integration tools
- Strong organizational skills and attention to reproducibility and versioning
- Experience collaborating with clinical, data science, or research stakeholders
- Additional technical certifications (e.g., AWS, Security+) are encouraged but not required.
Responsibilities
- Build and maintain pipelines using tools such as MedSpaCy, cTAKES, or similar to extract structured variables from clinical notes.
- Tune entity recognition, concept mapping, and negation detection to support patient-level feature generation.
- Document pipeline logic and validation metrics.
- Develop tools to extract, clean, and organize structured EHR variables (e.g., labs, medications, diagnoses).
- Apply clinical standards (e.g., OMOP, FHIR) to support semantic consistency and cross-site interoperability.
- Transform EHR data into research-ready formats aligned with modeling needs.
- Align clinical events with imaging and biopsy timelines to enable time-resolved analysis.
Other
- A Bachelor's degree in data science, statistics, bioinformatics, analytics, or a similar field and five years of experience; or a Master's degree in one of these fields and three years of experience; or a doctoral degree in one of these fields and one year of experience.
- Communicate updates and collaborate across project teams and external sites.
- Provide informal guidance to student researchers or junior analysts.
- Recommend new tools or analytic methods to improve pipeline performance.
- Hiring is contingent on eligibility to work in the U.S.