OpenAI's Intelligence & Investigations (I2) team needs to establish a user-risk measurement function to understand and mitigate abuse and strategic risk in its AI products. This involves creating policy-grounded baselines, confidence intervals, and attribution methods to analyze how users interact with frontier AI and to measure the impact of safety mitigations.
Requirements
- Are strong in sampling, inference, uncertainty quantification, probability theory, and rare-event estimation; comfortable with time-varying metrics
- Write solid Python and SQL; are fluent with data warehouses and productionizing notebooks/pipelines
- Nice to have: experience with Airflow DAGs or other ETL pipelines, Databricks, survival analysis, streaming/online detection, classifier evaluation/QA, privacy reviews/audit trails, or integrity/fraud/safety experience
Responsibilities
- Define the measurement framework for user-level risk across products and cohorts: scope the questions that matter and align on clear, policy-grounded definitions
- Establish baselines and statistical confidence for core metrics: prevalence, intensity, trends, and cohort dynamics
- Build decision-ready reporting surfaces: executive dashboards, weekly briefs, and launch readouts that translate insights into action
- Clean and organize ambiguous data from disparate sources, with an eye toward building automated pipelines and systems
- Create attribution and change-tracking: connect shifts in user behavior to mitigations, product changes, and external events
- Partner across Safety Systems, Data Science, Integrity, Product, and Policy: ensure one coherent analytics entry point and consistent standards
- Uphold quality, privacy, and governance: document methods, ensure auditability, and maintain durable measurement hygiene
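As a concrete illustration of the baseline-and-confidence work described above: harmful-usage prevalence is typically a rare event, where the usual normal approximation breaks down. A minimal sketch (function name and sample figures are hypothetical) of a Wilson score interval, which stays well-behaved at small counts:

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    Unlike the normal (Wald) approximation, this remains sensible
    when the event is rare, including the k == 0 case.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# e.g. 7 flagged sessions out of 100,000 sampled (illustrative numbers)
lo, hi = wilson_interval(7, 100_000)
```

Reporting the interval rather than the point estimate is what makes trend comparisons across cohorts decision-ready: a shift only matters if the intervals separate.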
Other
- This role is based in San Francisco, CA (hybrid, 3 days/week). Relocation support is available
- Monitor signals for emerging risks and anomalies: recommend priorities that reduce harmful usage and improve user safety
- Communicate crisply: translate complex estimators into clear insights and trade-offs for executives and cross-functional partners, in language that drives decisions
- Have 3–6+ years in data science, measurement/causal inference, or risk analytics in high-stakes domains