Senior Research Engineer - Evaluations

Canva

Salary not specified

Sep 12, 2025

San Francisco, CA, USA

At Canva, our mission is to empower the world to design. To ensure our generative AI models are truly helpful, we are seeking a talented Research Engineer to build our next-generation evaluation system based on automatic evaluations.

Requirements

You have a strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures, with practical experience that informs robust evaluation strategies
You’ve successfully managed or optimized large-scale distributed model training across hundreds of GPUs
You have a solid understanding of machine learning, have worked with PyTorch, and know how to optimize PyTorch code for speed
You have disciplined coding practices and are experienced with code reviews and pull requests
You have experience working in cloud environments, ideally AWS
You are familiar with evaluation libraries and frameworks
You have experience building or working with agentic AI systems or multi-agent coordination

Responsibilities

Design, build, and optimize the infrastructure for an "MLLM-as-a-Judge" evaluation system for scalable, automated feedback.
Implement and experiment with inference-time alignment techniques (Prompt Engineering, RAG, ICL) to directly improve model output quality.
Establish and manage a comprehensive benchmarking process to compare various foundation models on design-centric tasks.
Analyze evaluation data to identify model failure modes and provide actionable recommendations to the research team.
Collaborate with research scientists and ML engineers to integrate the agentic judge system into the model development lifecycle.
Translate the latest research in LLM evaluation and agentic AI into practical, production-ready engineering solutions.
Engineer autonomous AI agents that use Multimodal Large Language Models (MLLMs) to evaluate the quality, relevance, and human alignment of generated designs.

Other

This high-impact role focuses on building the practical systems that make cutting-edge research effective, providing a rapid feedback loop that guides the future of design generation at Canva and ultimately empowers millions of users to create.
You excel at creating data-driven evaluation methodologies, turning user analytics into clear, actionable insights.
You have a background or interest in human-computer interaction and design principles.