Machine Learning Scientist/Sr Scientist, Federated Benchmarking & Validation Engineering

Eli Lilly and Company

$151,500 - $244,200
Nov 20, 2025
Indianapolis, IN, United States of America

Lilly TuneLab is an AI-powered drug discovery platform that provides biotech companies with access to machine learning models trained on Lilly's extensive proprietary pharmaceutical research data. Through federated learning, the platform enables Lilly to build models on broad, diverse datasets from across the biotech ecosystem while preserving partner data privacy and competitive advantages. This collaborative approach accelerates drug discovery by creating continuously improving AI models that benefit both Lilly and our biotech partners.

Requirements

  • Experience with ML model validation, cross-validation strategies, and performance metrics
  • Proficiency in data engineering, pipeline development, and automation
  • Experience with federated learning platforms and distributed computing
  • Experience with clinical biomarker validation and translational research
  • Proficiency in workflow orchestration tools (Airflow, Kubeflow, Prefect)
  • Strong knowledge of containerization and cloud computing (Docker, Kubernetes)
  • Publications on model validation, benchmarking, or reproducibility

Responsibilities

  • Architect and implement privacy-preserving protocols for constructing representative test sets across distributed partner datasets, ensuring statistical validity while maintaining data isolation.
  • Create comprehensive benchmark suites covering small molecules (ADMET, solubility, permeability), antibodies (affinity, stability, immunogenicity), and RNA therapeutics (stability, delivery, off-target effects).
  • Develop validation strategies that assess model generalization across different experimental protocols, cell lines, species, and therapeutic indications while respecting partner data boundaries.
  • Systematically benchmark federated models against public datasets (ChEMBL, PubChem, PDB, Therapeutic Antibody Database) to establish performance baselines and identify gaps.
  • Implement time-split or scaffold-split validation protocols that estimate model performance on prospective data, simulating real-world deployment scenarios and detecting concept drift.
  • Build robust MLOps pipelines ensuring complete reproducibility of federated experiments, including versioning of data snapshots, model checkpoints, and hyperparameter configurations.
  • Design statistically powered validation studies accounting for multiple testing, hierarchical data structures, and non-independent observations common in drug discovery datasets.
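
The time-split protocol named above can be sketched in Python. This is a minimal illustration under stated assumptions, not Lilly's pipeline. The `Record` fields (`compound_id`, `assay_date`, `label`) and the `time_split` helper are hypothetical. A scaffold split would instead group compounds by Bemis-Murcko scaffold, typically computed with a cheminformatics toolkit such as RDKit. That variant is not shown here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical layout for one assay measurement.
    compound_id: str
    assay_date: date
    label: float


def time_split(records, test_fraction=0.2):
    """Split records chronologically: roughly the most recent test_fraction become the test set.

    Every record sharing the cutoff date goes to the test set, so no two
    measurements from the same day end up on opposite sides of the split.
    """
    if not records:
        raise ValueError("records must be non-empty")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    # Oldest first; sorted() is stable, so same-day records keep their input order.
    ordered = sorted(records, key=lambda r: r.assay_date)
    n_test = max(1, round(len(ordered) * test_fraction))
    # The cutoff is the date of the earliest record that falls in the test portion.
    cutoff = ordered[len(ordered) - n_test].assay_date
    train = [r for r in ordered if r.assay_date < cutoff]
    test = [r for r in ordered if r.assay_date >= cutoff]
    return train, test, cutoff
```

The cutoff is a date rather than a row count. As a result, the realized test fraction can exceed `test_fraction` when many measurements share the cutoff date. That trade-off prevents same-day leakage between training and test data.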

Other Qualifications

  • PhD in Computational Biology, Bioinformatics, Cheminformatics, Computer Science, Statistics, or related field from an accredited college or university
  • Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development
  • Strong foundation in experimental design, statistical validation, and hypothesis testing
  • Knowledge of regulatory requirements for AI/ML in pharmaceutical development
  • Exceptional attention to detail and commitment to scientific rigor