Deepgram is looking for a Data Scientist (Voice AI) to take ownership of how they benchmark and evaluate the performance of their voice AI models, ensuring the integrity and impact of their AI offerings.
Requirements
- Experience designing, executing, and iterating on evaluation pipelines for ML models.
- Proficiency in Python and data analysis libraries.
- Ability to develop automated evaluation systems, whether by scripting analysis workflows or by integrating with broader ML pipelines.
- Comfort working with large-scale datasets and crafting meaningful performance metrics and visualizations.
- Experience using LLMs or internal tooling to accelerate analysis, QA, or pipeline prototyping.
- Prior experience evaluating speech-related models, especially STT or TTS systems.
- Familiarity with model documentation formats (e.g., model cards, eval reports, dashboards).
Responsibilities
- Build and maintain scalable benchmarking pipelines for model evaluations across STT, TTS, and voice agent use cases.
- Run regular evaluations of production and pre-release models on curated, real-world datasets.
- Partner with Research, Data, and Engineering teams to develop new evaluation methodologies and integrate them into our development cycle.
- Design, define, and refine evaluation metrics that reflect product experience, quality, and performance goals.
- Author comprehensive model cards and internal reports outlining model strengths, weaknesses, and recommended use cases.
- Work closely with Data Labeling Ops to source, annotate, and prepare evaluation datasets.
- Collaborate with QA Engineers to integrate model tests into CI/CD and release workflows.
Other
- Partner cross-functionally with research, product, QA, marketing, and data labeling to shape how our models are measured, released, and improved.
- Support Marketing and Product with credible, data-backed comparisons to competitors.
- Track market developments and maintain awareness of competitive benchmarks.
- Support GTM teams with benchmarking best practices for prospects and customers.
- Strong communication skills, especially when translating raw data into structured insights, documentation, or dashboards.