At Lilly, the TuneLab platform aims to accelerate drug discovery by giving biotech companies access to machine learning models trained on Lilly's proprietary pharmaceutical research data. The Machine Learning Scientist/Sr Scientist role is crucial for quantifying prediction uncertainty in federated models and for quantifying the value that each partner's contribution adds to model predictions, thereby building trust and ensuring fair partnership dynamics within the TuneLab ecosystem.
Requirements
- Strong theoretical foundation in probability theory, statistical inference, and uncertainty quantification
- Experience with data valuation, attribution methods, or game theory
- Understanding of federated learning constraints and privacy-preserving computation
- Experience with conformal prediction and distribution-free uncertainty quantification
- Knowledge of influence functions and Data Shapley methods
- Expertise in ADMET prediction and understanding of experimental uncertainty
- Publications on uncertainty quantification, data valuation, or federated learning
Responsibilities
- Design and deploy conformal prediction algorithms adapted for federated learning, providing rigorous prediction intervals and confidence sets that maintain validity despite data heterogeneity across partners and distribution shifts.
- Develop methods that use uncertainty quantification to assess data quality and value, identifying contributions that most effectively reduce model uncertainty in critical regions of chemical/biological space.
- Implement fair attribution mechanisms (Shapley values, influence functions, leave-one-out analysis) that quantify each partner's contribution to model performance while maintaining privacy and computational efficiency in federated settings.
- Create robust calibration techniques that account for varying data quality, experimental protocols, and noise levels across partners, ensuring reliable uncertainty estimates for all participants.
- Design federated aggregation schemes that weight partner contributions based on data quality, relevance, and uncertainty reduction, optimizing global model performance while maintaining fairness.
- Develop uncertainty-guided and value-aware active learning approaches that identify high-value experiments across the federation, maximizing information gain while respecting partner resources and priorities.
- Translate uncertainty estimates into risk-adjusted recommendations for drug discovery decisions, helping partners understand when to trust predictions and when to run experiments instead.
Other
- PhD or Master's degree in Statistics, Machine Learning, Operations Research, Computational Biology, Applied Mathematics, or a related field from an accredited college or university
- Minimum of 2 years of experience in the biopharmaceutical industry or related fields
- Understanding of pharmaceutical partnerships and consortium dynamics
- Familiarity with regulatory requirements for model validation
- Strong business acumen to translate technical metrics into partnership value