GSK is seeking to accelerate drug discovery by leveraging massive-scale screening data and machine learning, and is looking for a Senior Principal Scientist to drive data science initiatives and develop the next generation of informatics for its DNA-encoded platforms.
Requirements
- PhD in computational science, bioinformatics, cheminformatics, computer science, or a closely related discipline.
- Experience in cheminformatics and DNA-encoded library (DEL) data analysis, including the application of advanced statistical and computational methods to large-scale biological datasets.
- Experience developing scientific applications in Python (e.g., pandas, scikit-learn, Django) and SQL, and deploying solutions on modern cloud infrastructure.
- Experience leading platform development initiatives that integrate research technology, artificial intelligence, and machine learning for scalable data analysis and informatics solutions.
- Significant contributions to open-source scientific software projects or recognized achievement in computational life science competitions (e.g., Kaggle, TopCoder, DREAM Challenge).
- Expertise in the design and optimization of automated ETL pipelines for processing terabyte-scale sequencing or screening data.
- Advanced knowledge of predictive modeling, Bayesian statistics, and deep learning approaches for hit identification and structure-activity relationship prediction.
Responsibilities
- Drive data science initiatives to support informed decision-making in active early-stage small molecule and oligonucleotide discovery projects.
- Collaborate with laboratory scientists to build data infrastructure, develop decision-making heuristics, and implement tracking systems for early discovery oligonucleotide and DEL projects, supporting workflows from initial screening through candidate selection.
- Collaborate closely with research technology and AI/ML teams to architect, develop, and optimize predictive informatics platforms that enable scalable data integration, advanced statistical analytics, and actionable insights for therapeutic discovery.
- Lead the design and development of critical software infrastructure, from automated ETL pipelines that process terabyte-scale sequencing data to sophisticated web applications and interactive dashboards that enable data-driven decision-making.
- Develop and apply novel statistical methods for analyzing selection data from both small molecule and oligonucleotide libraries.
- Build robust machine learning models to predict structure-activity relationships.
- Explore deep learning approaches for hit identification.
Other
- On-site presence of 2–3 days per week, as required for team collaboration and project delivery.
- Demonstrated success in cross-functional communication, matrixed collaboration, and thought leadership within multidisciplinary teams.
- Strong analytical and problem-solving skills, with a track record of translating complex biological questions into actionable computational solutions.
- Ability to work collaboratively in cross-functional teams, communicating effectively with experts in chemistry, biology, biophysics, and data science.
- The US annual base salary for new hires in this position ranges from $121,275 to $202,125.