Cartesia is building the next generation of AI: ubiquitous, interactive, and able to continuously process and reason over large streams of audio, video, and text, even on-device. This role focuses on developing next-generation speech models for tasks such as multi-lingual text-to-speech (TTS), voice conversion, music generation, and sound-effect synthesis, with an emphasis on near-zero latency and precise creative control.
Requirements
- Proven experience in developing and training novel generative models, preferably for audio or speech.
- Clear understanding of the architectural trade-offs between model quality, inference speed, and memory footprint.
- Hands-on experience with model conditioning and control mechanisms.
Responsibilities
- Develop and optimize speech and audio models for production.
- Work with engineering to ship and scale your models across our target platforms: cloud, on-premise, and on-device.
- Develop model architectures and inference strategies specifically for low-latency, real-time performance on consumer hardware.
- Implement and refine mechanisms for fine-grained controllability, allowing for the manipulation of attributes like speaker identity, emotion, prosody, and acoustic style.
- Pioneer research on new architectures for generative modeling.
Other
- We’re an in-person team based out of San Francisco.
- We offer relocation and immigration support.