Takeda Biotherapeutics Process Development (BPD) needs to create production cell lines and generate cell banks for biotherapeutic products. The Cell Line Development (CLD) team requires support in analyzing next-generation sequencing (NGS) data, optimizing detection scripts, and exploring machine learning applications to improve expression constructs and predict genetic stability.
Requirements
- A strong background in data science and scientific programming (R/Python).
 
- In-depth knowledge of bioinformatics tools commonly used in NGS data analysis, such as BWA, STAR, GATK, and samtools.
 
- Familiarity with high-performance computing (HPC) environments.
 
- Understanding of molecular biology and genetics is highly desired.
 
- Knowledge of AI and machine learning tools is a plus.
 
Responsibilities
- Build a bioinformatics workflow to detect indel mutations using simulated and experimental NGS datasets.
 
- Optimize and test structural variant detection scripts.
 
- Build a pipeline for analyzing TLA-NGS (targeted locus amplification) datasets to identify integration sites in production cell lines.
 
- Support the team in exploring the use of machine learning tools to optimize expression constructs and predict genetic and phenotypic stability.
 
Other
- This is a hybrid position (1-3 days/week in office) based at the Cambridge, MA location.
 
- Must be pursuing a master's or doctoral degree in data science, biostatistics, bioinformatics, or a related field.
 
- Must be authorized to work in the U.S. on a permanent basis without requiring sponsorship.
 
- Must be currently enrolled in a degree program with an expected graduation date of December 2026 or later.
 
- The internship program runs 10-12 weeks, depending on which of the two start dates applies: May 26 to August 14, or June 15 to August 21.