LexisNexis Legal & Professional is developing state-of-the-art research tools that extract key information from documents, and is seeking a Data Scientist to help build a multimodal document understanding and structured data extraction platform.
Requirements
- Solid foundation in machine learning / deep learning fundamentals, multimodal representations, and cross‑modal alignment concepts.
- Deep understanding of core principles and common algorithms for multimodal large models: cross‑modal attention & representation alignment, vision/text embedding fusion, hierarchical & layout structure modeling, instruction & contrastive paradigms, long‑context and retrieval‑augmented mechanisms, evaluation and failure‑mode analysis.
- Familiar with classic image and signal processing methods: edge & contour detection, filtering & denoising, morphological operations, segmentation & keypoint feature extraction, frequency / time‑frequency analysis, image enhancement & quality assessment.
- Knowledge of multi‑agent collaboration patterns: role assignment, task routing, feedback loops, redundancy & cross‑checks.
- Strong in statistical analysis & experimental design: hypothesis testing, factorial design, power analysis, A/B and multivariate evaluation.
- Able to decompose complex problems and build metric‑driven optimization paths.
- Rigorous in data quality & error analysis; rapid bottleneck identification.
Responsibilities
- Design and iterate the multimodal document parsing pipeline: layout / structural modeling, semantic extraction, cross‑modal alignment, structural reconstruction.
- Build and optimize a multi‑agent collaboration mechanism: task splitting, parallel / sequential scheduling, peer review, iterative quality improvement loops.
- Define model selection / composition / routing strategies (dynamic dispatch by document type, structural patterns, quality signals).
- Plan and execute model fine‑tuning, domain adaptation, continual learning, active learning, and data feedback loops.
- Establish end‑to‑end metrics: extraction accuracy, structural consistency, agent collaboration effectiveness, latency, stability, and cost.
- Build quality assurance and risk controls: drift & anomaly monitoring, confidence estimation, fallback strategies, alignment / compliance checks.
- Drive the mapping of agent / model outputs to business knowledge field standards and keep the two consistent.
Other
- Education: Master’s degree or above in a quantitative or technical field (Statistics, Computer Science, Mathematics, Data Science, etc.).
- Experience: 5+ years of hands‑on machine learning / data science experience.
- U.S. National Base Pay Range: $102,800 - $171,300.
- Geographic differentials may apply in some locations to better reflect local market rates.
- This job is eligible for an annual incentive bonus.