Bristol Myers Squibb (BMS) aims to discover biomarkers that guide patient selection and predict treatment response for BMS assets. This role enables the exploratory data analysis that drives crucial biomarker decisions at the heart of translational research.
Requirements
- Strong experience with data integration from heterogeneous sources (structured, semi-structured, unstructured).
- Proficiency in AWS, Python, and SQL, with the ability to prototype and automate workflows.
- Hands-on expertise with ETL frameworks (AWS Glue, Databricks, Airflow).
- Familiarity with modern AI/LLM approaches for data transformation and semantic mapping is highly desirable.
- Experience orchestrating advanced pipelines.
- Ability to ensure that auto-generated ETL and schema mappings are correct.
- Willingness to experiment with the newest techniques, such as MCP servers, prompt engineering strategies (ReAct, chain-of-thought, etc.), and LLM-assisted tooling.
Responsibilities
- Enable biomarker discovery: Deliver data pipelines and mappings that help translational leaders identify biomarkers (molecular, digital, imaging) for patient stratification and treatment response.
- Innovate with AI/LLMs: Explore and apply cutting-edge approaches (MCP servers, prompt orchestration, auto-schema mapping, LLM-based ETL generation) to accelerate and improve data workflows.
- Data orchestration: Oversee ingestion from diverse sources (vendor feeds, raw instruments, CSV, PDF, etc.), ensuring automated ETL and source-to-target mapping and transformation (STTM) outputs meet stakeholder needs.
- Quality and profiling: Assess and validate source data, document any cleaning, normalization, or semantic mapping that needs to be applied for optimal QC, and identify where improvements are required versus merely convenient.
- Hands-on implementation: Build or adapt tools/scripts (Python, SQL, AWS Glue, Databricks, etc.) when automation falls short.
- Stakeholder collaboration: Act as a partner to translational medicine leaders, communicating progress and brainstorming next steps as priorities evolve.
- Agile team contribution: Participate actively in standups, design sessions, sprint demos and innovation discussions.
Other
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
- 5+ years of experience in data engineering, ideally with exposure to life sciences or healthcare.
- Excellent communication skills to engage both technical and scientific stakeholders.
- Comfortable in agile, exploratory, scientific environments.
- The occupancy type that you are assigned is determined by the nature and responsibilities of your role: Site-essential roles require 100% of shifts onsite at your assigned facility. Site-by-design roles may be eligible for a hybrid work model with at least 50% onsite at your assigned facility.