Welo Data works with technology companies to provide datasets that are high-quality, ethically sourced, relevant, diverse, and scalable to supercharge their AI models. Help us elevate our clients' data.

In this role, you will be instrumental in refining and evaluating large language models (LLMs). You'll design prompts, create high-quality datasets, and perform rigorous analysis to directly improve the functionality, accuracy, and safety of cutting-edge AI systems. Your expertise will help us build smarter, more reliable, and more helpful technology.
Requirements
- Proven experience in a role involving AI data annotation, content quality review, search quality rating, or prompt engineering.
- Ability to interpret code, datasets, and system workflows at a conceptual level (no coding required).
- Familiarity with data annotation platforms and model evaluation tools.
- Direct experience with generative AI tools for text, voice, or video.
- Background in QA testing, rubric design, or AI safety and ethics evaluation.
Responsibilities
- Design, test, and iteratively refine complex and creative prompts to enhance AI model capabilities in reasoning, instruction following, and contextual understanding.
- Conduct rigorous side-by-side (SxS) comparisons of AI-generated outputs, providing detailed rationales and quality ratings to identify the superior response.
- Develop "golden" datasets, ideal responses, and granular evaluation rubrics to serve as benchmarks for model training and performance analysis.
- Engineer adversarial prompts and red-team scenarios to systematically identify model vulnerabilities, biases, and safety gaps across content policy areas (e.g., Harassment, Hate Speech, Dangerous Content).
- Create, annotate, and review diverse datasets across text, audio, and video formats to support model training and localization.
- Perform in-depth fact-checking and analysis to ensure model responses are accurate, relevant, and grounded in reliable sources.
- Analyze model outputs to identify trends, document error patterns, and categorize failures, providing actionable feedback to engineering teams.
Other
- Bachelor's degree or equivalent experience in Linguistics, Computational Linguistics, Communications, Technical Writing, or a related analytical field.
- Native or near-native fluency in English with exceptional writing and editorial skills.
- A highly detail-oriented and analytical mindset, with the ability to deconstruct complex instructions and evaluate outputs with precision.
- Ability to work independently and manage workflows effectively in a remote environment.
- Proficiency in one or more languages in addition to English.