Machine Learning Scientist/Sr Scientist, Federated Benchmarking & Validation Engineering

Eli Lilly and Company

$151,500 - $244,200

Nov 20, 2025

Indianapolis, IN, United States of America

Lilly TuneLab is an AI-powered drug discovery platform that gives biotech companies access to machine learning models trained on Lilly's extensive proprietary pharmaceutical research data. Through federated learning, the platform enables Lilly to train models on broad, diverse datasets from across the biotech ecosystem while preserving each partner's data privacy and competitive advantage. This collaborative approach accelerates drug discovery by creating continuously improving AI models that benefit both Lilly and our biotech partners.

Requirements

Experience with ML model validation, cross-validation strategies, and performance metrics
Proficiency in data engineering, pipeline development, and automation
Experience with federated learning platforms and distributed computing
Experience with clinical biomarker validation and translational research
Proficiency in workflow orchestration tools (Airflow, Kubeflow, Prefect)
Strong knowledge of containerization, container orchestration, and cloud computing (Docker, Kubernetes)
Publications on model validation, benchmarking, or reproducibility

Responsibilities

Architect and implement privacy-preserving protocols for constructing representative test sets across distributed partner datasets, ensuring statistical validity while maintaining data isolation.
Create comprehensive benchmark suites covering small molecules (ADMET, solubility, permeability), antibodies (affinity, stability, immunogenicity), and RNA therapeutics (stability, delivery, off-target effects).
Develop validation strategies that assess model generalization across different experimental protocols, cell lines, species, and therapeutic indications while respecting partner data boundaries.
Systematically benchmark federated models against public datasets (ChEMBL, PubChem, PDB, Therapeutic Antibody Database) to establish performance baselines and identify gaps.
Implement time-split and scaffold-split validation protocols that estimate model performance on prospective data and on unseen chemical scaffolds, simulating real-world deployment scenarios and detecting concept drift.
Build robust MLOps pipelines that ensure complete reproducibility of federated experiments, including versioning of data snapshots, model checkpoints, and hyperparameter configurations.
Design adequately powered validation studies that account for multiple testing, hierarchical data structures, and the non-independent observations common in drug discovery datasets.
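The time-split and scaffold-split protocols named in the responsibilities above can be illustrated with a minimal Python sketch. It assumes each compound record is a dict with hypothetical `scaffold` (a precomputed Bemis-Murcko scaffold SMILES, for example from RDKit) and `assay_date` fields. The helper names `time_split` and `scaffold_split` are illustrative only, not part of any Lilly or TuneLab tooling, and the greedy group assignment shown is one common approach among several.

```python
from collections import defaultdict
from datetime import date


def time_split(records, cutoff):
    """Train on records measured before `cutoff`; test on records measured on or after it."""
    train = [r for r in records if r["assay_date"] < cutoff]
    test = [r for r in records if r["assay_date"] >= cutoff]
    return train, test


def scaffold_split(records, test_fraction=0.2):
    """Assign whole scaffold groups to train or test so no scaffold appears in both."""
    groups = defaultdict(list)
    for r in records:
        groups[r["scaffold"]].append(r)
    # Largest groups are placed first, so common chemotypes tend to land in train
    # and smaller, rarer scaffolds fill the test set.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_target = (1 - test_fraction) * len(records)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


# Hypothetical toy data: 10 compounds over 4 scaffolds, assayed across 2022-2023.
scaffolds = ["c1ccccc1"] * 4 + ["c1ccncc1"] * 3 + ["C1CCCCC1"] * 2 + ["C1CCNCC1"]
records = [
    {"id": f"cmpd{i}", "scaffold": s, "assay_date": date(2022 + i // 5, 1 + i % 5, 1)}
    for i, s in enumerate(scaffolds)
]

t_train, t_test = time_split(records, date(2023, 1, 1))
print(len(t_train), len(t_test))           # 5 5
s_train, s_test = scaffold_split(records, test_fraction=0.2)
print(sorted(r["id"] for r in s_test))     # ['cmpd7', 'cmpd8']
```

Assigning whole scaffold groups, rather than individual compounds, keeps closely related chemotypes from leaking between train and test; the time split instead mimics prospective use by testing only on measurements made after the cutoff.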

Other

PhD in Computational Biology, Bioinformatics, Cheminformatics, Computer Science, Statistics, or related field from an accredited college or university
Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development
Strong foundation in experimental design, statistical validation, and hypothesis testing
Knowledge of regulatory requirements for AI/ML in pharmaceutical development
Exceptional attention to detail and commitment to scientific rigor