Lilly is looking to advance its pipeline by designing critical algorithms and workflows that expedite the creation of transformative therapies, developing large-scale pre-trained models in a decentralized, privacy-preserving manner.
Requirements
- Experience in developing statistical and machine learning models for complex endpoints.
Responsibilities
- Design and develop novel deep learning architectures (e.g., Transformer, Graph Neural Network-based) for large-scale, federated pre-training on unlabeled or partially labeled data distributed across multiple sources.
- Implement and advance state-of-the-art semi-supervised and self-supervised learning algorithms (e.g., contrastive learning, masked autoencoding) tailored to the unique constraints of federated learning, such as communication bottlenecks and data heterogeneity (a minimal masked-autoencoding sketch follows this list).
- Develop and implement robust, communication-efficient federated aggregation strategies (e.g., FedAvg, FedProx, SCAFFOLD) that remain stable for large, complex models and can handle non-IID (not independently and identically distributed) data (see the FedAvg sketch after this list).
- Create efficient and effective protocols for fine-tuning and adapting the pre-trained federated foundation models for a wide range of specific downstream tasks, ensuring knowledge transfer while maintaining privacy.
- Collaborate with data engineering teams to establish pipelines for accessing and simulating distributed datasets, and develop high-fidelity simulation environments to test, debug, and benchmark federated pre-training strategies before real-world deployment (an illustrative non-IID partitioning sketch follows this list).
- Profile, analyze, and optimize the computational performance (e.g., memory, latency, communication cost) of federated training and inference to ensure scalability to a large number of clients and massive datasets.
- Author high-impact research papers for publication in top-tier machine learning conferences (e.g., NeurIPS, ICML, ICLR) and relevant scientific journals.
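For illustration only, the sketch below shows one way a masked-autoencoding pre-training step can look: a fraction of input features is hidden, an encoder-decoder reconstructs the full input, and the loss is scored only on the masked positions. The module, dimensions, and masking ratio are hypothetical and not taken from any Lilly pipeline.

```python
# Hedged sketch of a masked-autoencoding pre-training step (assumed setup).
import torch
import torch.nn as nn

class TinyMaskedAutoencoder(nn.Module):
    def __init__(self, dim=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x, mask):
        # Zero out masked features before encoding; reconstruct the full input.
        return self.decoder(self.encoder(x * (~mask).float()))

def masked_reconstruction_loss(model, x, mask_ratio=0.4):
    mask = torch.rand_like(x) < mask_ratio   # True where a feature is hidden
    recon = model(x, mask)
    # Score the reconstruction only on the masked (hidden) positions.
    return ((recon - x)[mask] ** 2).mean()

model = TinyMaskedAutoencoder()
x = torch.randn(16, 32)                      # a batch of unlabeled examples
loss = masked_reconstruction_loss(model, x)
loss.backward()
```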
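As a hedged reference point for the aggregation work described above, the sketch below implements the core FedAvg step: a sample-size-weighted average of locally trained parameters. The `client_updates` structure and the toy clients are assumptions for illustration, not a prescribed interface.

```python
# Minimal FedAvg aggregation sketch (illustrative only).
import numpy as np

def fedavg_aggregate(client_updates):
    """client_updates: list of (params_dict, num_examples) tuples,
    where params_dict maps parameter names to numpy arrays."""
    total_examples = sum(n for _, n in client_updates)
    aggregated = {}
    for name in client_updates[0][0]:
        # Weight each client's parameters by its share of the training data.
        aggregated[name] = sum(
            params[name] * (n / total_examples) for params, n in client_updates
        )
    return aggregated

# Example: three simulated clients with non-IID sample counts.
rng = np.random.default_rng(0)
clients = [({"w": rng.normal(size=4), "b": rng.normal(size=1)}, n)
           for n in (100, 30, 500)]
global_params = fedavg_aggregate(clients)
```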
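Finally, a common way to simulate data heterogeneity when benchmarking federated strategies is to carve one pooled dataset into skewed per-client shards with a Dirichlet prior over labels. The sketch below is a minimal, assumed setup (the function name, client count, and alpha are illustrative), not an actual simulation environment.

```python
# Illustrative non-IID client partition for federated simulation (assumed setup).
import numpy as np

def dirichlet_partition(labels, num_clients=5, alpha=0.3, seed=0):
    """Return one index array per simulated client.
    Smaller alpha -> more skewed (more heterogeneous) label distributions."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.where(labels == cls)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, shard in enumerate(np.split(cls_idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return [np.array(idx) for idx in client_indices]

# Example: 1,000 samples across 4 classes, split across 5 simulated clients.
labels = np.random.default_rng(1).integers(0, 4, size=1000)
shards = dirichlet_partition(labels)
print([len(s) for s in shards])
```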
Other
- Plays an essential leadership role, responsible for identifying, assessing, and implementing cutting-edge algorithmic solutions that leverage diverse datasets while ensuring data privacy and security for our partners.
- PhD in a data science field such as Biostatistics, Statistics, Machine Learning, Computational Biology, Computational Chemistry, Physics, Applied Mathematics, or a related field from an accredited college or university.
- Minimum of 2 years of experience in the biopharmaceutical industry or related fields, with demonstrated expertise in drug discovery and early development.
- Exceptional interpersonal and communication skills, with a keen ability to understand, empathize with, and navigate complex relationships and dynamics.
- Highly self-motivated and organized.