Takeda Biotherapeutics Process Development (BPD) needs to create production cell lines and generate cell banks for the production of drug substance used in clinical and commercial biotherapeutic products. The Cell Line Development (CLD) team is looking to apply data science and machine learning to optimize expression constructs and predict genetic and phenotypic stability. The team also plans to build bioinformatics workflows that detect sequence variants and gene knockout clones and that analyze TLA-NGS datasets to identify integration sites.
Requirements
- A strong background in data science and scientific programming (R/Python).
- In-depth knowledge of bioinformatics tools commonly used in NGS data analysis, including BWA, STAR, GATK, samtools, etc.
- Familiarity with high-performance computing (HPC) environments.
- Knowledge of AI and machine learning tools is a plus.
Responsibilities
- Build a bioinformatics workflow to detect indel mutations using simulated and experimental NGS datasets.
- Optimize and test structural variant detection scripts.
- Build a pipeline for analyzing TLA-NGS datasets to identify integration sites in production cell lines.
- Support the team in exploring the use of machine learning tools to optimize expression constructs and predict genetic and phenotypic stability.
Other
- This position is hybrid (1-3 days/week in office), based at the Cambridge, MA location.
- Must be pursuing a Master's or Doctoral degree in data science, biostatistics, bioinformatics, or a relevant field.
- Understanding of molecular biology and genetics is highly desired.
- Must be authorized to work in the U.S. on a permanent basis without requiring sponsorship.
- Must be currently enrolled in a degree program with a graduation date of December 2026 or later.