Amigo aims to build AI that puts people first. Its focus is healthcare, where AI systems must be reliable and trustworthy before they are deployed. This role addresses the need for robust, highly available systems that handle critical conversations and workflows, where failures directly and significantly affect people's lives and business operations.
Requirements
- 7+ years building mission-critical systems where downtime directly impacts business operations or user safety
- Battle-tested experience with real-time communication systems, network resilience, and fault-tolerant architectures
- Proven track record leading technical decisions for systems serving millions of users under unpredictable conditions
- Deep experience integrating with legacy enterprise systems and handling the complexity of customer environments
- History of building systems that gracefully handle network failures, API limitations, and third-party service outages
- Experience with incident response, post-mortem analysis, and building systems that learn from failures
- Strong background in performance optimization for real-time systems under variable network and load conditions
Responsibilities
- Architect and build bulletproof real-time communication systems that work reliably under poor network conditions
- Design graceful degradation patterns so our systems provide value even when customer networks or infrastructure fail
- Build customer system integrations that handle the chaos of legacy systems, API rate limits, and unexpected downtime
- Implement comprehensive error handling, retry logic, and fallback mechanisms for mission-critical workflows
- Design embedded applications that integrate seamlessly into customer environments we don't control
- Build monitoring and alerting that catches production issues before customers notice them
- Create deployment strategies that enable zero-downtime updates across diverse customer environments
Other
- Lead incident response and build systems that learn from failures to prevent future issues
- Make architectural decisions balancing feature velocity with production reliability and customer trust
- Leadership experience mentoring engineers through complex production challenges and architectural trade-offs
- Understanding of regulatory compliance requirements and how they impact system architecture decisions
- Strong judgment in balancing technical debt, feature velocity, and production reliability under business pressure