Walmart is looking for someone to define and implement the technical strategy for a next-generation AI-driven platform that moves beyond prompt engineering into modular reasoning systems. The role involves inventing and hardening intelligent systems that can reason, act, and adapt.
Requirements
- 5+ years in Python, building distributed systems at scale.
- Deep knowledge of agentic design patterns, including ReAct, Plan-and-Execute, and AutoGen-style coordination; tool calling, dynamic agent routing, and recursive agent planning; and semantic memory, embedding-based context lookup, and summarization windows.
- Expertise in building LLM-based systems with LangChain, OpenAI, Anthropic, or custom orchestrators.
- Hands-on experience with RAG pipelines using vector stores (FAISS, Pinecone, Weaviate, Qdrant, Azure Cognitive Search).
- Hands-on experience with LLM evaluation and observability (tracing, token usage, agent state tracking).
- Hands-on experience with workflow orchestration using config-first approaches (YAML/JSON definitions, step runners, etc.).
- Strong background in distributed systems, task queues, asynchronous workflows, and backend performance optimization.
Responsibilities
- Architect modular, testable, and composable Python systems that support multi-agent workflows, tool-chaining, RAG, memory management, and fallback strategies.
- Design LLM-powered execution engines that support both high throughput and adaptive reasoning (via LangChain, AutoGen, or custom frameworks).
- Lead implementation of retrieval-augmented generation (RAG) pipelines, semantic search, and structured knowledge memory systems.
- Build and scale integrations with internal LLMs, including handling signature-based auth, function calling, and context management at scale.
- Drive the end-to-end lifecycle, from configuration schema (YAML) to execution trace logging, observability, and self-healing recovery patterns.
Other
- 8+ years of professional software engineering experience.
- Proven ability to drive technical vision, resolve ambiguity, and make architectural tradeoffs at scale.
- Experience in cloud-native environments (AWS, GCP, or Azure), including containerization, monitoring, and secure API integrations.
- Experience building or contributing to a custom agentic orchestration framework used across multiple product lines.
- Deep understanding of how to apply LLM systems in regulated or high-compliance environments (PII handling, redaction, observability).