Thomson Reuters is seeking to enhance its legal AI platform by building state-of-the-art document understanding systems that serve as the cognitive foundation for products like Westlaw, PracticalLaw, and CoCounsel. These systems aim to improve how legal professionals research, analyze, and reason over complex legal documents and to advance the next generation of intelligent legal AI agents.
Requirements
- PhD in Computer Science, AI, NLP, or a related field, or a Master's degree with equivalent research/industry experience
- 7+ years of hands-on experience building and deploying document understanding systems, information extraction pipelines, or knowledge graph construction using deep learning, LLMs, and NLP methods
- Strong programming skills (e.g., Python) and experience with modern deep learning frameworks (e.g., PyTorch, Hugging Face Transformers, DeepSpeed)
- Publications at relevant venues such as ACL, EMNLP, ICLR, NeurIPS, SIGIR, or KDD
- Deep knowledge of document understanding fundamentals: document layout analysis; semantic chunking approaches beyond fixed-size or paragraph-based methods; and document classification that handles hierarchical taxonomies, imbalanced multi-label settings, and adaptation to domain-specific schemas
- Expertise in knowledge extraction and knowledge graph construction: entity recognition and linking, relation extraction, citation parsing, and building graph representations from unstructured text
- Expertise in LLM-based information extraction, few-shot and multi-task learning, post-training, and knowledge distillation
Responsibilities
- Lead the design, development, testing, and deployment of end-to-end AI solutions for complex document understanding tasks in the legal domain
- Direct the execution of large-scale projects including: advanced semantic chunking models for lengthy, non-uniformly structured legal documents with adjustable granularity; document enrichment systems with legal and customer-defined taxonomies; LLM-based knowledge graph construction pipelines that extract and link heterogeneous legal knowledge; and scalable synthetic data generation systems
- Serve as the technical lead and primary point of reference, ensuring full accountability for all research deliverables
- Partner with engineering to guarantee well-managed software delivery and reliability at scale across multiple product lines
- Design comprehensive evaluation strategies for both component-level and end-to-end quality, leveraging expert annotation and synthetic data
- Apply robust training methodologies that balance performance with latency requirements
- Lead knowledge distillation initiatives to compress large models into production-ready small language models (SLMs)
Other
- Demonstrated ability to provide technical leadership, mentor team members, and influence without formal authority in an applied research setting
- Flexibility & Work-Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities
- Career Development and Growth: We foster a culture of continuous learning and skill development
- Industry Competitive Benefits: We offer comprehensive benefit plans that include flexible vacation, two company-wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and resources for mental, physical, and financial wellbeing