Veracyte aims to improve diagnostic accuracy and transform cancer care by leveraging data to derive actionable insights and develop predictive models.
Requirements
Proficiency in Python, R, or similar languages for data analysis and modeling.
Experience with AWS services (e.g., SageMaker, Redshift) and Snowflake for data processing and storage.
Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch) and data visualization tools (e.g., Matplotlib, Tableau).
Knowledge of SQL and data cataloging concepts is advantageous.
Experience with healthcare or genomic data is a plus.
Responsibilities
Analyze large, complex datasets from the Veracyte Lakehouse (e.g., genomic, clinical, operational data) to identify trends and patterns.
Develop and deploy machine learning models using tools like Amazon SageMaker and Python for applications such as biomarker discovery and clinical decision support.
Build, validate, and refine AI/ML models (e.g., LLM refinement, Verachat RAG) to support AI training and operational dashboards.
Optimize models for performance and scalability on cloud platforms such as AWS and Snowflake.
Translate data insights into actionable recommendations for business operations, R&D, and healthcare providers.
Create visualizations and reports to communicate findings to technical and non-technical audiences.
Contribute to the development of data-driven strategies by providing analytical expertise.
Other
Collaborate closely with data engineers, the Technical Program Manager (TPM), and cross-functional teams in a Scrum environment.
Work with the TPM and stakeholders to define data science requirements and user stories for inclusion in the team’s Jira backlog.
Partner with data engineers to ensure data pipelines and cataloged datasets meet analytical needs.
Strong analytical and problem-solving skills.
Excellent communication skills for collaborating with cross-functional teams in a Scrum setting.
Ability to work effectively in a fast-paced, innovative environment.
Mentor junior team members and promote a culture of continuous learning in data science practices.