The company is seeking a Senior Data Scientist to lead innovation in secondary and tertiary analysis of cf-RNA sequencing data, with a focus on delivering rigorous, reproducible results and building predictive models for real-world clinical contexts.
Requirements
- Expertise in gene expression data analysis, including count table filtering, normalization strategies, noise quantification, differential expression analysis, and dimensionality reduction
- Strong foundation in statistical principles and their rigorous application, including but not limited to hypothesis testing, p-value corrections, Bayesian approaches, bootstrapping, and permutation testing
- Extensive experience in building, training, testing, and validating machine learning and deep learning models, including model selection based on comparative analysis and performance metrics. Proficient in feature set development (selection, engineering, etc.) and skilled in updating and performing inference with RNA-seq-specific large language models (LLMs)
- Ability to innovate both in applying library methods and developing algorithms from scratch
- Experience with common data science infrastructure, including pipelines, clusters, databases, and feature stores. Direct experience with cloud platforms (AWS preferred) for scaling, deploying, and managing data workflows is a strong advantage
- Proficient in Python and Unix/Linux environments; additional proficiency in other languages (e.g., R, Julia, Rust) is a strong plus
- Strong coding skills across the software development lifecycle
Responsibilities
- Lead innovation in secondary and tertiary analysis of cf-RNA sequencing data, focusing on delivering rigorous and reproducible results
- Develop and implement advanced methods for differential gene expression analysis, pathway analysis, and enrichment analysis, optimizing for accuracy and biological insights
- Build, train, test, and validate predictive models, including logistic regression, random forests, and neural networks, and leverage existing RNA-seq large language models (LLMs) for inference and analysis
- Design and build scalable, efficient data analysis pipelines
- Engage in hypothesis-driven research, rigorously testing and validating new methods and models
- Critically evaluate results, ensuring that models are robust and applicable in real-world clinical contexts, not only in academic publications
- Visualize complex datasets and create compelling narratives to communicate findings to both scientific and executive audiences
Other
- PhD in a quantitative field with a strong focus on biological sciences (e.g., Applied Statistics, Biophysics, Computational Biology)
- 5+ years of biotech industry experience with a proven track record of leading successful projects
- Deep scientific curiosity and a solid grasp of the scientific method, hypothesis testing, and model validation
- Passion for building predictive and prognostic models that perform effectively in real-world applications
- Independent research capabilities, with the ability to drive projects with minimal supervision