iManage is looking to transform unstructured text into meaningful insights that power AI and machine learning solutions by building large-scale text data pipelines. These pipelines fuel generative AI applications, agentic systems, advanced model fine-tuning, and other NLP-driven capabilities across the iManage platform.
Requirements
- Strong proficiency in Python, PySpark, and data manipulation for large unstructured text datasets
- Strong understanding of NLP concepts such as tokenization, embeddings, and semantic search, plus experience with standard text libraries such as spaCy, Hugging Face Datasets, and NLTK
- Solid DataOps knowledge and experience orchestrating advanced NLP data pipelines on cloud-based data infrastructure
- Proficiency with Git and collaborative development frameworks
- Exposure to Microsoft Azure services such as Fabric, ADLS, AI Foundry, and Azure ML, as well as MLflow
- Experience with knowledge graph implementation for NLP applications
- Experience working with data for the legal domain
Responsibilities
- Designing, developing, and maintaining scalable pipelines in Microsoft Azure to ingest and transform large volumes of text data from multiple sources
- Designing automated workflows for text normalization, deduplication, language identification, PII redaction and metadata enrichment
- Building automated data validation processes to ensure accuracy and consistency
- Supporting model fine-tuning, semantic search, and generative AI evaluation tuning through dataset curation, prompt dataset preparation, labeling coordination, and text quality validation
- Partnering with the Applied AI team to gather data requirements and build data interfaces for developing and maintaining machine learning systems
- Maintaining data lineage and following data privacy, security and governance best practices
- Implementing data versioning and lineage tracking for machine learning experiments
Other
- A Bachelor’s degree or higher in Computer Science, Data Engineering, Applied Mathematics, Computational Linguistics, or a related quantitative field
- 4+ years of data engineering experience, with at least 2 years working with unstructured data in a business setting
- Problem solving, creativity, curiosity, and a collaborative mindset
- Ability to work in-office on Tuesdays & Thursdays to collaborate, connect, and learn from peers
- Flexible work hours that allow for meaningful work-life balance