The client is seeking contractors to support large language model (LLM) evaluation and dataset development by evaluating, refining, and validating AI-generated code across multiple programming languages, with the goal of improving model performance, reliability, and scalability.
Requirements
- Demonstrated professional experience in software engineering, typically gained over multiple years in production environments.
- Strong experience building and deploying full-stack or backend systems using modern programming languages and frameworks.
- Solid understanding of software architecture, system design, debugging, testing, and code review best practices.
- Ability to clearly document evaluation decisions and communicate technical reasoning.
- Experience working with large-scale or production-grade systems is preferred.
Responsibilities
- Curate high-quality code examples and develop accurate reference solutions in languages such as Python, JavaScript (including React), C/C++, Java, Rust, and Go.
- Evaluate and refine AI-generated code for efficiency, scalability, maintainability, and correctness.
- Identify error patterns and build automated mechanisms to verify code quality.
- Design validation strategies to assess AI model performance across different stages of the software engineering lifecycle, including architecture design, API development, implementation, testing, deployment, and maintenance.
- Collaborate with researchers and engineering stakeholders to improve AI-driven coding systems and benchmarks.
Other
- Ability to overlap a portion of working hours with Pacific Time
- Ability to work as an independent contractor in the United States, United Kingdom, Canada, Europe, Singapore, Dubai, or Australia
- Flexible schedule of approximately 10–40 hours per week
- Initial term of approximately 1 month, with potential extensions based on performance and project needs
- The evaluation process includes an initial application form, an automated AI-led interview, and a coding-based assessment