Evaluating data quality and its return on investment to improve model performance on challenging benchmarks and real-world workflows for Fortune 500 companies and government institutions.
Requirements
- Strong grasp of LLMs and data-model dynamics.
- Knowledge of the latest trends in generative AI and of the kinds of data that improve foundation models.
- Proven track record in benchmark development, model evaluation, or data-centric infrastructure.
- Familiarity with annotation workflows, validation processes, and scalable QA systems.
- Solid ML or data science foundation; able to reason about training impact from a data point of view.
- Experience with feedback-driven annotation loops and pre-delivery QA.
- Hands-on experience with taxonomy frameworks and structured data labeling.
Responsibilities
- Define and implement strategies to assess the ROI of data across training and fine-tuning pipelines.
- Build and maintain benchmarks that measure performance across key client and internal objectives.
- Develop systems and tooling for continuous data evaluation: measuring what matters, where it matters.
- Drive human-in-the-loop quality processes including pre-delivery validation and annotation feedback loops.
- Define and/or leverage comprehensive task taxonomy frameworks to structure data annotation efforts and improve training signal quality.
Other
- We are client first: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
- We work at start-up speed: We move fast, stay agile, and favor action, because momentum is the foundation of perfection.
- We are AI forward: We help our clients build the future of AI and apply it in our own roles and workflows to amplify productivity.
- Full-time remote opportunity
- Flexible working hours