OpenAI is working to design deeply personal, multimodal experiences that make advanced AI feel natural, useful, and human, and to build reliable, insightful metrics that measure model and product quality across the full stack.
Requirements
- Hands-on experience building tools or pipelines around LLMs or multimodal models
- Proficiency in Python for backend/data workflows
- Familiarity with TypeScript/React or similar frameworks for visualization
- Experience with evaluation or visualization of multimodal models (speech, vision, or sensors)
- Familiarity with hardware prototyping or embedded ML
- Background in human-in-the-loop evaluation or UX research tooling
Responsibilities
- Design and implement extensible evaluation harnesses for multimodal tasks spanning speech, vision, and text
- Build interactive visualization and analysis tools that help engineers, designers, and researchers inspect model and UX performance
- Empower product and design teams to define and extend evaluation suites aligned with real-world usage and product vision
- Automate continuous evaluation and regression tracking to ensure each model and hardware iteration improves the experience
- Collaborate with hardware, software, research, and design teams to turn qualitative goals into quantitative evaluation metrics
Other
- 4 days per week onsite in San Francisco, CA
- Relocation assistance available to new employees
- Equal opportunity employer, with no discrimination on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic
- Committed to providing reasonable accommodations to applicants with disabilities
- Must protect computer hardware entrusted to you from theft, loss, or damage, and maintain the confidentiality of proprietary, confidential, and non-public information