
Founding LLM Evaluation Researcher

UniversalAGI

Salary not specified
Aug 27, 2025
Remote, US

UniversalAGI is seeking an exceptional Founding LLM Evaluation Researcher to build comprehensive evaluation frameworks, stay at the forefront of AI research, design and execute rigorous experiments evaluating autonomous agents, and develop innovative methodologies that enhance agent performance and capabilities in real-world deployments.

Responsibilities

  • Design comprehensive LLM evaluation frameworks from scratch for AI automation in government and enterprise environments
  • Build evaluation systems to measure and improve AI solution performance across production deployments
  • Develop evaluation methodologies for multi-agent systems operating autonomously in real-world applications
  • Optimize LLM outputs for specific enterprise use cases involving both structured databases and unstructured document repositories
  • Develop methodologies to improve model response accuracy and relevance for domain-specific applications
  • Bridge research findings into production-ready platform capabilities with robust evaluation metrics
  • Implement and conduct rigorous evaluation experiments to optimize agent performance and reliability

Other

  • Stay current with cutting-edge research by reading and synthesizing findings from top-tier AI conferences and journals
  • Design and execute data collection strategies to build high-quality evaluation datasets tailored to specific use cases
  • Collaborate closely with product engineers to translate research advancements into practical applications and deployable solutions
  • Work with enterprise clients to understand evaluation requirements and success metrics
  • Document and communicate research findings through internal presentations, reports, and, where appropriate, external publications or conference talks