Replit is looking to improve its AI agent, the core of its product strategy, by defining success metrics, designing experiments, and analyzing data to drive gains in agent effectiveness and user outcomes.
Requirements
- Deep experimentation expertise: A/B testing, experiment design, power analysis, handling skewed data, and interpreting results beyond p-values (see the power-analysis sketch after this list).
- Strong SQL skills and experience designing data models for high-volume event data; experience with dbt or similar transformation tools.
- Proficiency in Python and data science libraries (pandas, scipy, statsmodels, etc.).
- Experience with LLM or AI agent evaluation—understanding of prompt-response patterns, agent evaluation frameworks, or model quality measurement.
- Experience with modern data stack (BigQuery, dbt, Fivetran, Segment, Hex).
- Familiarity with experimentation platforms (LaunchDarkly, Statsig, Eppo, or similar).
- Experience with causal inference and variance-reduction methods (difference-in-differences, synthetic control, CUPED); a CUPED sketch also follows this list.
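
To illustrate the power-analysis expertise above, here is a minimal sketch in Python (statsmodels) for sizing an experiment on a binary agent metric such as task completion. The baseline rate and minimum detectable effect are illustrative assumptions, not Replit figures.

```python
# Power-analysis sketch: how many users per arm are needed to detect a small
# absolute lift in a binary metric (e.g., task completion rate)?
# baseline and mde are assumed values for illustration only.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.40   # assumed control completion rate
mde = 0.02        # assumed minimum detectable absolute lift

effect_size = proportion_effectsize(baseline + mde, baseline)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,          # two-sided significance level
    power=0.80,          # target power
    alternative="two-sided",
)
print(f"~{n_per_arm:,.0f} users per arm to detect a {mde:.0%} absolute lift")
```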
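Likewise, a minimal CUPED sketch: use a pre-experiment covariate to shrink metric variance before comparing arms, which tightens confidence intervals without changing the estimand. The simulated data and column names are hypothetical stand-ins for per-user aggregates.

```python
# CUPED sketch: adjust the experiment metric with a pre-experiment covariate
# (theta = cov(y, x) / var(x)) to reduce variance before the comparison.
# The simulated data below stands in for real per-user aggregates.
import numpy as np
import pandas as pd
from scipy import stats

def cuped_adjust(y: pd.Series, x: pd.Series) -> pd.Series:
    """Return y - theta * (x - mean(x)) with theta = cov(y, x) / var(x)."""
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
n = 5_000
pre = rng.gamma(shape=2.0, scale=3.0, size=2 * n)            # skewed pre-period usage
variant = np.repeat(["control", "treatment"], n)
post = 0.8 * pre + rng.normal(0, 2.0, size=2 * n) + np.where(variant == "treatment", 0.3, 0.0)

df = pd.DataFrame({"variant": variant, "pre": pre, "post": post})
df["post_adj"] = cuped_adjust(df["post"], df["pre"])

raw = stats.ttest_ind(df.post[df.variant == "treatment"], df.post[df.variant == "control"])
adj = stats.ttest_ind(df.post_adj[df.variant == "treatment"], df.post_adj[df.variant == "control"])
print(f"raw p = {raw.pvalue:.3g}, CUPED-adjusted p = {adj.pvalue:.3g}")
```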
Responsibilities
- Design and analyze experiments to measure agent improvements, from model changes to UX variations, with statistical rigor and an eye for practical tradeoffs (see the analysis sketch after this list).
- Define success metrics that connect agent trace data (prompts, responses, code changes, execution outcomes) to user outcomes like successful deploys, retention, and revenue.
- Build the semantic layer for agent data in partnership with data engineering—defining the tables, metrics, and models that enable self-serve analysis across the AI team.
- Surface insights from trace analysis that identify failure modes, successful patterns, and opportunities to improve agent effectiveness.
- Partner with AI engineering, product, and leadership to translate data into roadmap decisions; you'll have a seat at the table for critical agent strategy discussions.
- Create dashboards and reporting that surface agent performance metrics (task completion, latency, quality scores, user satisfaction) for the AI team and executives.
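
As a concrete, purely illustrative example of the experiment analysis described above, the sketch below compares task completion between arms and reports the lift with a confidence interval rather than a p-value alone. The counts are assumptions, not real data.

```python
# Experiment read-out sketch: compare task completion between treatment and
# control and report the absolute lift with a 95% interval, not just p.
# The counts below are assumptions for illustration.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

successes = np.array([4_310, 4_065])   # treatment, control completed tasks (assumed)
trials = np.array([10_000, 10_000])    # users per arm (assumed)

_, pvalue = proportions_ztest(successes, trials)

p_t, p_c = successes / trials
lift = p_t - p_c
se = np.sqrt(p_t * (1 - p_t) / trials[0] + p_c * (1 - p_c) / trials[1])
low, high = lift - 1.96 * se, lift + 1.96 * se   # 95% Wald interval on the lift
print(f"lift = {lift:+.3f} (95% CI {low:+.3f} to {high:+.3f}), p = {pvalue:.3g}")
```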
Other
- 5+ years of experience in data science, analytics, or a quantitative role with a focus on product, growth, or experimentation.
- Ability to translate ambiguous questions into structured analysis and communicate findings clearly to both technical and non-technical stakeholders.
- Bias toward action: you ship insights that influence decisions, not just dashboards.
- Background in high-growth SaaS or PLG companies with large-scale event data.
- Understanding of developer tools or software engineering workflows.