Bloomberg aims to enhance its data products by integrating AI technologies, with a focus on Natural Language Processing (NLP) enrichments for client communications. The work involves structuring unstructured data to improve search, classification, summarization, and insight generation, which in turn supports intelligent downstream applications and client decision-making.
Requirements
- 4+ years of experience working in AI/ML data roles, ideally focused on NLP, communications, or information extraction.
- Proven experience with annotation programs, dialogue labeling, or large-scale training/evaluation dataset development.
- Strong grasp of data modeling, schema design, and best practices for structuring unstructured data.
- Familiarity with search infrastructure and summarization models, and an understanding of how data influences relevance ranking and response generation.
- Demonstrated ability to design, scale, and govern data pipelines that support high-impact ML model training and evaluation.
- Comfort engaging with ML practitioners to co-design data schemas and evaluate performance trade-offs.
- Knowledge of Python, SQL, and common ML/NLP tooling.
Responsibilities
- Own the end-to-end annotation lifecycle, from schema development to annotation execution, with an eye toward ML performance and product utility.
- Design and manage annotation programs for search and summarization use cases, including training data for relevance ranking, query-document matching, and text abstraction.
- Develop scalable strategies for data labeling and dialogue annotation, tailored for NLP enrichments across communication products.
- Shape and evolve the schemas and data models that serve as the foundation for annotation quality and reuse.
- Define metadata structures and enrichment tags that help interpret communication context, intent, and relevance to user queries.
- Collaborate with ML engineers and product stakeholders to align annotation efforts with model requirements and product goals.
- Drive quality and consistency across annotation processes by developing clear guidelines, validation metrics, and governance frameworks.
Other
- Bridge the gap between finance and AI/ML by mastering domain-specific concepts that elevate communications experiences.
- Excellent project management skills and the ability to manage competing priorities across multiple stakeholders.
- Stay current on trends in search technologies, summarization architectures, and best practices for building reliable training datasets in these domains.
- Serve as a domain expert in data structuring, labeling, and ML data design within communications-focused NLP use cases.
- Experience working with annotation tools or platforms (e.g., Prodigy, Labelbox, Snorkel).