The organization needs to develop and implement evaluation strategies for AI systems supporting device intelligence within the life sciences, diagnostics, and biotechnology sectors.
Requirements
- Experience designing and implementing evaluation methodologies for AI systems, including LLMs and computer vision.
- Knowledge of metrics for AI performance, robustness, and fairness, especially in regulated domains.
- Expertise in at least three of the following: benchmarking frameworks, statistical validation, synthetic data generation, adversarial testing, explainability techniques.
- Proficiency in Python and ML libraries (such as PyTorch, TensorFlow) and familiarity with evaluation tools (such as OpenAI Evals, Dynabench, Promptfoo).
- Experience with regulatory processes for medical devices and AI/ML-based software as a medical device (SaMD) is a plus.
- Familiarity with quality management systems and standards relevant to life sciences and diagnostics is a plus.
- Knowledge of instrument control mechanisms and integration with AI systems is a plus.
Responsibilities
- Define and execute evaluation strategies for AI products in life sciences, diagnostics, and biotechnology.
- Design and implement evaluation frameworks for agentic workflows, LLMs, NLP, computer vision, and multimodal models.
- Develop and execute evaluation plans to assess performance, reliability, and safety across multimodal datasets.
- Analyze evaluation results, identify weaknesses, and recommend improvements for AI models and workflows.
- Build automated pipelines for continuous evaluation and monitoring of AI systems in production.
Other
- Collaborate with senior leaders and product teams to align evaluation criteria with KPIs and regulatory needs.
- Communicate complex evaluation results to technical and non-technical stakeholders.
- This is a remote position, available in Europe or the Eastern US.