OpenAI is working to design deeply personal, multimodal experiences that make advanced AI feel natural, useful, and human, and to build reliable, insightful metrics that measure model and product quality across the full stack.
Requirements
- Hands-on experience building tools or pipelines around LLMs or multimodal models
- Proficiency in Python for backend/data workflows
- Familiarity with TypeScript/React or similar frameworks for visualization
- Experience with evaluation or visualization of multimodal models (speech, vision, or sensors)
- Familiarity with hardware prototyping or embedded ML
- Background in human-in-the-loop evaluation or UX research tooling
Responsibilities
- Design and implement extensible evaluation harnesses for multimodal tasks spanning speech, vision, and text
- Build interactive visualization and analysis tools that help engineers, designers, and researchers inspect model and UX performance
- Empower product and design teams to define and extend evaluation suites aligned with real-world usage and product vision
- Automate continuous evaluation and regression tracking to ensure each model and hardware iteration improves the experience
- Collaborate with hardware, software, research, and design teams to turn qualitative goals into quantitative evaluation metrics
Other
- 4 days per week onsite in San Francisco, CA
- Relocation assistance available to new employees
- Equal opportunity employer, with no discrimination on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other legally protected characteristic
- Committed to providing reasonable accommodations to applicants with disabilities
- Must protect computer hardware entrusted to you from theft, loss, or damage, and maintain the confidentiality of proprietary, confidential, and non-public information