Improving the performance of advanced language models by evaluating and refining model outputs.
Requirements
- 2+ years of professional experience as a software engineer OR a master’s degree in computer science
- Deep familiarity with domain-specific writing, formatting, and communication norms
- Strong analytical and written communication skills
- Detail-oriented, with a commitment to quality and consistency
- Experience with prompt engineering or AI tools is a plus, but not required
Responsibilities
- Create coding-relevant prompts that reflect real-world use cases
- Evaluate AI responses for adherence to formatting standards
- Draft expert-level “golden responses” that serve as benchmarks for model performance
- Flag errors, inconsistencies, or stylistic deviations in AI output
Other
- Remote and asynchronous: work on your own schedule
- Expected commitment: 10+ hours/week
- Project duration: ~4 weeks, with possible extension