Truveta is looking to advance the use of large language models (LLMs) in healthcare data analysis by evaluating their performance in extracting data from unstructured clinical documentation, validating the accuracy and consistency of that data, and developing tools for customers to use this extracted data in their scientific studies.
Requirements
- Strong programming skills in R and SQL; experience with Python, PySpark, or Databricks a plus.
- Proven expertise in working with structured health data using medical ontologies such as ICD-10, CPT, NDC, LOINC, SNOMED CT, and RxNorm.
- Experience with statistical methods for evaluating model performance (e.g., precision, recall, F1 score), agreement and reliability, error and bias, and robustness.
- Experience supporting regulatory-grade Real World Evidence (RWE), Health Outcomes, or payer-facing studies.
- Exposure to clinical phenotyping, NLP, or unstructured data integration.
- Ability to translate statistical output into clinically meaningful insight.
- 7+ years of applied statistical experience with EHR and/or claims data (e.g., Optum, MarketScan, Flatiron, Epic, Oracle Health (formerly Cerner), CMS).
Responsibilities
- Lead the design and execution of statistical evaluation strategies for Truveta’s AI and clinical models, ensuring rigor and reproducibility.
- Develop and apply advanced statistical methods to evaluate model performance, calibration, generalizability, and bias, especially in the context of real-world data (RWD).
- Guide the use of observational study designs and causal inference techniques to strengthen model evaluation using RWD.
- Partner closely with clinical, data science, and product teams to define evaluation criteria that align with clinical relevance, patient outcomes, and regulatory standards.
- Provide expertise on data quality, missingness, confounding, and heterogeneity in RWD to ensure robust evidence generation.
- Communicate statistical insights and evaluation results effectively to technical teams, clinicians, external stakeholders, and research collaborators, influencing model improvement, validation, and adoption.
- Stay current with evolving best practices in biostatistics, machine learning evaluation, and RWD methodologies, and mentor teams on their application.
Other
- Master’s or Ph.D. in Biostatistics, Mathematics, Epidemiology, Health Economics, Data Science, or related field.
- Strong communication skills (written and verbal).
- Proactive communicator, comfortable working across teams and customers.
- Strong documentation habits and commitment to reproducibility.
- All applicants must be authorized to work in the United States for any employer, as we are unable to sponsor work visas or permits (e.g., F-1 OPT, H-1B) at this time.