At PitchBook, a Morningstar company, the goal of this role is to innovate, evolve, and invest in data extraction, enrichment, and validation processes by building intelligent systems that scale, improve PitchBook’s proprietary datasets, and bring out the best in everyone.
Requirements
- Proven success delivering AI-driven data extraction, enrichment, or document understanding systems at scale
- Hands-on experience with parameter-efficient fine-tuning methods and expertise in document classification optimization preferred
- Deep expertise in natural language processing, document AI, OCR, entity resolution, and large-scale data automation
- Strong understanding of modern ML frameworks and infrastructure (e.g., PyTorch, TensorFlow, Hugging Face, LangChain, MLflow)
- Strong knowledge of cloud-native architecture, distributed computing, and scalable model deployment
- Experience in fintech, data platforms, or large-scale information extraction systems preferred
- Contributions to the AI/ML research community (e.g., publications, patents, or open-source projects) are strongly preferred
Responsibilities
- Serve as the key technical leader shaping system design, ML architectures, model lifecycles, and scalable infrastructure for data extraction, document understanding, and structured data enrichment
- Architect reusable frameworks and services for LLM-powered extraction, entity recognition and resolution models, and multimodal document processing
- Design and build state-of-the-art ML models using transformers, LLMs, generative models, graph-based approaches, and OCR/Document AI frameworks
- Identify opportunities to advance automation and accuracy across our ingestion stack, including entity linking, relationship inference, classification, and anomaly detection
- Translate emerging research into practical, production-ready capabilities
- Contribute to PitchBook’s growing technical reputation through experimentation, publication, or open-source work
- Own the lifecycle of mission-critical ML systems from data preparation to deployment, monitoring, and continuous improvement
Other
- Bachelor’s or Master’s degree in Computer Science, Mathematics, Data Science, or a related technical discipline (Master’s degree preferred)
- 8+ years of experience in machine learning, data science, or AI-focused engineering, including at least 4 years of experience leading technical teams
- Excellent communication, collaboration, and influencing skills, including experience presenting to executive and cross-functional leadership
- A track record of fostering technical excellence and innovation across global, multidisciplinary teams
- Ability to work in the office 5 days a week