The Safety Systems team at OpenAI is focused on ensuring the safety, robustness, and reliability of AI models and their deployment. This role aims to establish a data-driven approach to understand, evaluate, and monitor the safety of production systems, addressing emerging safety issues and developing fundamental solutions for safe deployment of advanced models and future AGI.
Requirements
- Expertise in defining and implementing metrics, with a track record of operationalizing new feature and product-level metrics from scratch
- Strong statistical background, including knowledge of sampling, regression, and causal analysis
- Demonstrated prior experience in NLP, large language models, or generative AI
- Develop and implement statistical methods necessary to operationalize safety-related metrics
- Create and disseminate dashboards, reports, and tools that enable the team and company to answer safety-related questions independently
- Uncover new ways to improve our approaches to measuring and mitigating harm and abuse
- Establish a data-driven culture within Safety Systems by driving the definition, tracking, and operationalization of feature-, product-, and company-level metrics
Responsibilities
- Establish the data-driven approach for understanding, evaluating, and monitoring the safety of our production systems
- Define north-star metrics
- Own and implement the statistical methods to productionize those metrics
- Conduct analysis to understand the impact of our products
- Establish source-of-truth dashboards that the entire company can use to answer safety-related questions
- Develop a safety data flywheel and provide safety research with production insights and data for training and evaluation
- Lead our efforts in understanding and measuring the real-world safety impacts of OpenAI’s current and upcoming products
Other
- 5+ years of experience in a quantitative role navigating highly ambiguous environments, ideally as a founding data scientist or team lead at a hyper-growth product company or research org
- Proven leadership skills, including leading multiple data scientists and cross-functional teams
- Excellent communication skills with demonstrated ability to communicate with product managers, engineers, and executives alike
- Strategic insights that extend beyond traditional statistical significance testing
- Experience in trust and safety, integrity, anti-abuse, or related fields