Speechify is hiring a Software Engineer for its AI team to improve the data collection processes behind model training. The role focuses on building high-quality datasets at petabyte scale and low cost by integrating infrastructure, engineering, and research.
Requirements
- Proficiency with bash/Python scripting in Linux environments
- Proficiency in Docker and Infrastructure-as-Code concepts
- Professional experience with at least one major Cloud Provider (we use GCP)
- Experience with web crawlers
- Experience with large-scale data processing workflows
Responsibilities
- Find new sources of audio data and bring them into our ingestion pipeline.
- Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform.
- Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at larger scale and lower cost to power our next-generation models.
- Craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products.
Other
- BS/MS/PhD in Computer Science or a related field.
- 5+ years of industry experience in software development.
- Ability to handle multiple tasks and adapt to changing priorities.
- Strong communication skills, both written and verbal.