Vertex Pharmaceuticals is seeking to strengthen and advance its breakthroughs in AI, driving new discoveries and insights for Global Research across domains including chemistry, biology, and imaging, by ensuring that data is stored in an appropriately structured form with relevant metadata that conforms to FAIR (Findable, Accessible, Interoperable, Reusable) principles.
Requirements
- Experience working with scientific data sources such as ChEMBL, UniProt, Open Targets, HPA, gnomAD, Single Cell Portal, or GTEx, including integration and harmonization of data across databases
- Experience with FAIR data practices, including the use of common ontologies such as the Allotrope Foundation Ontologies and the BioAssay Ontology
- Knowledge of data governance principles and frameworks, and of tools such as Collibra or Unity Catalog
- Very strong programming skills, ideally in Python and SQL, as well as familiarity with distributed data processing frameworks such as Apache Spark
- Experience with data platforms such as Databricks or Snowflake
- Familiarity with database architectures oriented towards large-scale scientific data, such as TileDB or VoltDB
- A strong understanding of emerging technologies such as cloud architectures and AI and ML approaches to data management tasks
Responsibilities
- Work with colleagues across DCS to understand their data and how it is used, and develop a data management and governance roadmap that addresses data management needs in priority order
- Define and implement data management solutions, in collaboration with DTE, for large-scale results generated from computational workflows spanning chemistry, biology, imaging, and screening
- Where relevant, prototype pipelines to create integrated datasets that combine internal and/or external data sources, and work with colleagues in DTE to productionize such datasets for broader use within DCS
- Contribute to DCS-specific development of best practices, guidelines, and SOPs, as appropriate, with a focus on data-related aspects
- Align with enterprise-level data governance frameworks
- Coordinate with data engineering efforts in DTE to integrate samples, tests, and results across the Research environment
- Support other prioritized data needs as they arise, such as evaluation of technology solutions for extracting insights from scientific literature, and identification of key external data sources to address gaps in internal data
Other
- Experience working cross-functionally and collaborating across a team to drive alignment
- Experience championing data governance principles
- Excellent oral and written communication skills
- A team-oriented growth mindset that welcomes feedback from others and supports other team members; a positive attitude that enthusiastically tackles and overcomes challenges
- A PhD (or equivalent) in the computational sciences with 7+ years of relevant experience, or a Master's degree in the computational sciences with 9+ years of relevant experience in research data management