Grafana Labs is seeking an experienced Staff Software Engineer to advance the company's AI-driven observability features, with a focus on evaluating and improving Generative AI systems, particularly those built on Large Language Models (LLMs).
Requirements
- Proven experience designing and implementing evaluation frameworks for AI/ML systems
- Strong understanding of prompt engineering and structured output evaluation techniques
- Experience managing context windows in LLM systems
- Ability to develop and maintain automated evaluation pipelines and tooling
- Familiarity with dataset management and best practices in AI/ML model assessment
Responsibilities
- Design and implement robust evaluation frameworks for Generative AI and LLM-based systems, including test sets, regression tracking, and output verification methods
- Develop tooling to facilitate automated, low-friction evaluation processes for model outputs, prompts, and agent behaviors
- Define, refine, and implement metrics that accurately reflect product goals and operational constraints
- Lead dataset management initiatives, ensuring data quality and relevance for evaluation purposes
- Collaborate with engineering teams to integrate evaluation pipelines into CI/CD workflows
- Guide teams across Grafana in adopting best practices for GenAI evaluation and benchmarking
- Monitor and analyze model performance, providing insights to improve AI features and reduce operational toil
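To make the evaluation-framework responsibilities above concrete, here is a minimal sketch of the kind of harness this role would design: a test set of prompts with expected structured outputs, an output-verification step, and a pass rate suitable for regression tracking. All names, the test case, and the stubbed model call are illustrative assumptions, not Grafana internals.

```python
import json

# Illustrative test set: each case pairs a prompt with checks on the
# model's structured (JSON) output.
TEST_SET = [
    {
        "prompt": "Summarize alert: CPU > 90% on host web-1 for 10m",
        "required_keys": ["severity", "summary"],
        "expect": {"severity": "critical"},
    },
]

def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a fixed JSON string."""
    return json.dumps({"severity": "critical", "summary": "High CPU on web-1"})

def evaluate(model, test_set):
    """Run every case, verify the output parses and matches expectations,
    and return a pass rate plus failure details for regression tracking."""
    passed = 0
    failures = []
    for case in test_set:
        try:
            out = json.loads(model(case["prompt"]))
        except json.JSONDecodeError:
            failures.append((case["prompt"], "invalid JSON"))
            continue
        missing = [k for k in case["required_keys"] if k not in out]
        mismatched = {k: v for k, v in case["expect"].items()
                      if out.get(k) != v}
        if missing or mismatched:
            failures.append((case["prompt"],
                             {"missing": missing, "mismatched": mismatched}))
        else:
            passed += 1
    return {"pass_rate": passed / len(test_set), "failures": failures}

result = evaluate(stub_model, TEST_SET)
print(result["pass_rate"])  # 1.0 for the stub model above
```

In practice a harness like this would be wired into CI so that prompt or model changes are gated on the tracked pass rate rather than on manual inspection.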
Other
- Excellent collaboration skills to work across teams and translate product goals into actionable evaluation criteria
- High degree of autonomy and problem-solving skills
- Bachelor’s degree in Computer Science, Data Science, or a related field; an advanced degree preferred