The Amazon Artificial General Intelligence (AGI) Data Services organization is looking to develop diverse datasets to train and evaluate Amazon's AI models.
Requirements
- Experience with language data annotation systems and other forms of data markup
- Proficient with scripting languages, such as Python
- Expertise in bootstrapping AI data collections for quickly evolving requirements
- Extensive experience working with speech, text, and multimodal data in multiple languages
- Practical experience with Machine Learning
- Familiarity with technical concepts such as APIs
Responsibilities
- Design complex data collections with human participants in response to science needs: author instructions, define and implement quality targets and mechanisms, provide day-to-day coordination of data collection efforts (including planning, scheduling, and reporting), and be responsible for the final deliverables
- Design and conduct complex data creation tasks using synthetic and model-based data generation methods, following state-of-the-art approaches
- Analyze and extract insights from large amounts of data
- Build tools or tool prototypes for data analysis or data creation, using Python or another scripting language
- Use modeling tools to bootstrap or test new AI functionalities
- Collaborate with scientists, software engineers, and other data creators to evaluate performance of AI models
Other
- Master's or higher degree in a relevant field (Computational Linguistics or equivalent field with computational analysis)
- 2+ years of experience in computational linguistics, language data processing, or AI data creation
- Excellent communication skills, strong organizational skills, and close attention to detail
- Comfortable working in a fast-paced, highly collaborative, dynamic work environment
- Able to think creatively, with strong analytical and problem-solving skills