At PitchBook, a Morningstar company, the business challenge is accelerating and scaling data coverage by applying advanced ML models to identify, extract, and validate entities, relationships, and key insights from vast collections of structured and unstructured sources.
Requirements
- Deep expertise in natural language processing, document AI, OCR, entity resolution, and large-scale data automation, including optimizing large document workflows and reducing latency in retrieval-based architectures
- Familiarity with agentic AI protocols and frameworks (e.g., MCP, A2A) and the orchestration of multi-agent systems
- Strong understanding of modern ML frameworks and infrastructure (e.g., PyTorch, TensorFlow, Hugging Face, LangChain)
- Strong knowledge of cloud-native architecture, distributed computing, and scalable model deployment
- Experience in fintech, data platforms, or large-scale information extraction systems is preferred
- Contributions to the AI/ML research community (e.g., publications, patents, or open-source projects) are strongly preferred
- Proven success delivering AI-driven data extraction, enrichment, or document understanding systems at scale
Responsibilities
- Define and execute the AI & ML strategy for data collection, extraction, and enrichment automation aligned with PitchBook’s long-term data strategy
- Establish success metrics and operational KPIs for automation accuracy, throughput, and coverage improvement
- Lead, hire, and develop a high-performing global team of data scientists and ML engineers; define team structure, roles, and growth paths that align with organizational goals
- Elevate engineering excellence through code reviews, design reviews, and technical guidance for ML engineers and scientists
- Act as a force multiplier by shaping best practices for experimentation, model evaluation, responsible AI, and scalable ML engineering
- Guide teams across the organization toward cohesive, reusable, and standards-aligned architectures
- Collaborate closely with Engineering, Product Management, and Data Operations to successfully operationalize AI/ML solutions within data pipelines and collection processes
Other
- Bachelor’s or Master’s degree in Computer Science, Mathematics, Data Science, or a related technical discipline (Master’s degree preferred)
- 12+ years of experience in machine learning, data science, or AI-focused engineering, including 7+ years leading technical teams; experience managing managers and geographically distributed teams is strongly preferred
- Must be authorized to work in the United States without the need for visa sponsorship now or in the future
- Excellent communication, collaboration, and influencing skills, including experience presenting to executive and cross-functional leadership
- A track record of fostering technical excellence and innovation across global, multidisciplinary teams