Codex is OpenAI’s first-party developer product focused on agentic software engineering. We’re building tools that help engineers design, write, test, and ship code faster, safely and at scale. We partner tightly with research and product to translate model advances into tangible developer productivity.

As a Data Scientist on Codex, you will measure and accelerate product-market fit for AI developer tools. You’ll define what “developer productivity” means for our product, run experiments on new coding models and UX, and pinpoint where the model helps or hurts across languages and tasks. Your insights will directly shape how an entire industry builds software.
Requirements
- Fluency in SQL and Python; comfort with experiment design and causal inference
- Strong programming background; ability to prototype, run simulations, and reason about code quality
- Familiarity with IDE/extension telemetry or developer tooling analytics
- Prior experience with NLP/LLMs, code models, or evaluations for generative coding
- Experience defining product metrics tied to user value
- Ability to communicate clearly with PM, Eng, and Design, and to influence product direction
- 5+ years in a quantitative role on a developer-facing or high-growth product
Responsibilities
- Design and interpret A/B tests and staged rollouts of new coding models and product features
- Define and operationalize metrics such as suggestion acceptance, edit distance, compile/test pass rates, task completion, latency, and session productivity
- Build dashboards and analyses that help the team self-serve answers to product questions (by language, framework, repo size, task type)
- Diagnose failure modes and partner with Research on targeted improvements (model quality signals, user feedback, evals)
- Embed with the Codex product team to discover opportunities that improve developer outcomes and growth
Other
- This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.