Build the core intelligence behind our platform: agents that listen, speak, navigate interfaces, and interact naturally with users in real time. You will solve real problems while balancing innovation with pragmatic constraints.
Requirements
- Production experience with LLMs (OpenAI, Anthropic, or open-source models)
- Hands-on work with speech AI (STT/TTS systems like Deepgram, ElevenLabs, Whisper)
- Experience with browser automation (Playwright, Puppeteer, Selenium) or computer vision
- Strong Python skills with async programming and real-time systems
- Understanding of prompt engineering, retrieval systems, and agent frameworks
- Ability to debug complex AI behaviors and build observability tools
- Software engineering fundamentals for production AI systems
Responsibilities
- Build and optimize voice AI systems using speech-to-text and text-to-speech models
- Design browser agents that navigate, understand, and interact with web applications
- Implement browser automation with computer vision and DOM understanding
- Engineer prompt systems and LLM workflows for consistent, intelligent behavior
- Create evaluation frameworks to measure voice quality, agent accuracy, and user experience
- Integrate multimodal AI that combines voice, vision, and language understanding
- Build real-time AI pipelines where latency and reliability are critical
Other
- We sponsor visas for qualified candidates.