Uber's Customer Obsession team is looking to architect, productionize, and scale an autonomous support agent that resolves customer issues end-to-end, pushing the state of the art in GenAI for customer service.
Requirements
- Deep expertise in LLM-driven systems (inference optimization, prompt/program design, fine-tuning, distillation/LoRA, safety/guardrails, evals).
- Strong software engineering in Python plus one of Go/Java/C++; hands-on with microservices, gRPC/HTTP, cloud infra, containers, CI/CD, and real-time telemetry/observability.
- Demonstrated ownership of high-availability services (SLO/SLA design, incident response, on-call leadership, postmortems).
- Track record of shipping customer-facing intelligent experiences with measurable impact (A/B testing, metrics literacy).
- Voice agent background (ASR/TTS streaming, barge-in, endpointing, telephony, WebRTC) and conversational quality/NLP evaluation.
- Agentic architectures in production (planner/executor, memory, multi-step reasoning) and RAG over complex, policy-heavy knowledge bases.
- Practical expertise balancing speed and reliability at scale: experiment frameworks, feature flags, canary/guarded rollouts, and clear kill-switches.
Responsibilities
- Own the end-to-end agent architecture: agentic planning and execution loops, long-term memory, persona/voice, knowledge routing, and policy enforcement for compliant, on-brand conversations.
- Ship production systems that handle millions of conversations with rigorous SLOs, fallbacks, and canaries; design graceful degradation (e.g., human handoff) and safety guardrails (prompt-injection and jailbreak defenses, PII redaction).
- Lead voice agent initiatives: Drive the development of Uber's voice support agent, covering real-time speech recognition (ASR), text-to-speech (TTS), natural turn-taking (barge-in and endpointing), and reliable telephony/WebRTC integration.
- Advance retrieval & reasoning: Build next-generation retrieval and reasoning pipelines in which the agent searches across different knowledge sources, applies policy-driven tools, and calls structured workflows, so that responses are consistently grounded.
- Establish evals that matter: offline rubrics, simulated scenarios, safety tests, cost/latency tradeoff suites, and LLM-as-judge (with calibrated human review) wired into CI/CD and experiment platforms.
- Drive automation at scale: partner with Product/Design/Operations on coverage, policy alignment, localization, and rollout strategy to improve customer experience and reduce cost per contact.
- Mentor and provide principal-level leadership to multiple pods; set technical strategy and quality bars; coach senior engineers on agentic patterns, reliability, and experiment velocity.
Other
- 10+ years building production ML/AI systems; 4+ years leading complex ML initiatives end-to-end.
- Experience with voice agents and agentic architectures is a major plus.
- We value candidates with a bias for action who get creative with GenAI tools to accelerate execution and experimentation.