Mercor is collaborating with a leading AI research team to advance DeepResearch-2-App pipelines that simulate real-world code generation tasks. The project needs senior-level software engineers to serve as independent evaluators and supervisors in this process.
Requirements
- 6+ years of professional software engineering experience
- Deep specialization in backend or full-stack development, with testing and evaluation experience
- Strong ability to assess technical feasibility and debug complex systems
- Experience with Docker and automated testing frameworks
Responsibilities
- Review domain-generated prompts and assess their feasibility from a coding perspective
- Supervise model outputs and validate Dockerfile execution
- Design and implement 40–60 unit tests per evaluation set
- Review peer-generated unit tests for completeness and robustness
- Execute unit tests and confirm code performance and reliability
Other
- Remote and asynchronous; set your own schedule
- Estimated workload: ~20 hours per week
- Project-based contract, with ongoing need for evaluations
- Submit your resume to get started
- Complete a brief form to detail your technical expertise