What this means
In a standard RAG pipeline, retrieval is stateless and single-pass. The query is embedded, the top-K chunks are returned and the language model generates from that context. If the initial retrieval missed a relevant document or the question required synthesising information from disparate sources, the system has no mechanism to compensate.
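As a rough illustration of the single-pass flow described above, the sketch below embeds the query once, ranks chunks by similarity and builds a prompt. It is a minimal sketch under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Single pass: embed the query once and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the context the language model would generate from."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

Nothing in this flow checks whether the returned chunks actually answer the question, which is the gap the rest of this article addresses.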
Agentic RAG adds a planning and evaluation layer — often implemented by a language model acting as an orchestrator — that can:
- Decompose a complex question into sub-queries and retrieve separately for each
- Evaluate whether retrieved chunks are sufficient to answer the question
- Reformulate the query and retrieve again if not
- Call different retrieval sources (knowledge bases, databases, external APIs) in sequence or in parallel
- Synthesise a final answer from the aggregated retrieved context
Anthropic describes this as adding a "reasoning/planning loop on top of retrieval": the agent decides what to retrieve, not just how to search once.
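The loop described above can be sketched as a small control function. This is an illustrative outline rather than a reference implementation: the four callables (`retrieve`, `is_sufficient`, `reformulate`, `generate`) are hypothetical stand-ins for retrieval tools and language model calls, and `max_steps` is an assumed safety cap.

```python
from typing import Callable


def agentic_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    reformulate: Callable[[str, list[str]], str],
    generate: Callable[[str, list[str]], str],
    max_steps: int = 3,
) -> str:
    """Retrieve, evaluate and reformulate until the context is judged sufficient."""
    context: list[str] = []
    query = question
    for _ in range(max_steps):
        # Accumulate new chunks, skipping ones already collected.
        for chunk in retrieve(query):
            if chunk not in context:
                context.append(chunk)
        # Evaluate against the original question, not the rewritten query.
        if is_sufficient(question, context):
            break
        query = reformulate(question, context)
    # Generate from whatever was gathered, even if the cap was reached.
    return generate(question, context)
```

Passing the behaviours in as callables keeps the control flow testable with simple stubs before any model is wired in.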
Why it matters for business
Many enterprise queries do not reduce to a single document lookup. A compliance officer asking "what are our current obligations under the Privacy Act for this customer scenario?" may need information from the Privacy Act summary, the organisation's data handling policy, the relevant consent records and recent OAIC guidance — each from a different source. Standard RAG makes one retrieval pass and hopes the right content surfaces. Agentic RAG plans a retrieval strategy, executes it across multiple sources and composes a complete answer.
BCG's 2025 research found that AI agents accounted for around 17% of total AI value in 2025 and are projected to reach 29% by 2028, reflecting increasing enterprise confidence in systems that can handle multi-step reasoning tasks.
How it works technically
A typical agentic RAG architecture includes:
Orchestrator (reasoning layer): A language model — often a capable reasoning model — that receives the original query, plans the retrieval strategy, executes sub-queries and evaluates whether the accumulated context is sufficient to generate a final answer.
Retrieval tools: The orchestrator calls retrieval tools as needed. These may include vector search over internal knowledge bases, keyword search, structured database queries or external API calls. Each retrieval tool is defined with a clear interface the orchestrator can invoke.
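One way to give each retrieval tool a clear interface is a small registry, sketched below. The class names, the `describe` output format and the single `run(query)` signature are assumptions made for illustration; real orchestration frameworks define their own tool schemas.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetrievalTool:
    name: str
    description: str  # shown to the orchestrator so it can choose a tool
    run: Callable[[str], list[str]]


class ToolRegistry:
    """Holds the only tools the orchestrator is permitted to invoke."""

    def __init__(self) -> None:
        self._tools: dict[str, RetrievalTool] = {}

    def register(self, tool: RetrievalTool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> str:
        """Text listing of available tools, suitable for an orchestrator prompt."""
        return "\n".join(f"{t.name}: {t.description}" for t in self._tools.values())

    def call(self, name: str, query: str) -> list[str]:
        if name not in self._tools:
            raise KeyError(f"Unknown retrieval tool: {name}")
        return self._tools[name].run(query)
```

Because every call goes through `call`, a tool that was never registered cannot be reached, however the orchestrator phrases its request.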
Query decomposition: For complex questions, the orchestrator breaks the query into sub-questions, retrieves separately for each and combines the results before final generation.
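The retrieve-per-sub-question and combine steps might look like the sketch below. It assumes the sub-questions have already been produced (in practice by a language model call, which is not shown), and the function names and source-tagging format are hypothetical.

```python
from typing import Callable


def retrieve_for_subqueries(
    sub_queries: list[str], retrieve: Callable[[str], list[str]]
) -> dict[str, list[str]]:
    """Run one retrieval per sub-question and keep results grouped by sub-question."""
    return {sq: retrieve(sq) for sq in sub_queries}


def merge_results(results: dict[str, list[str]]) -> list[str]:
    """Combine results in order, dropping chunks that more than one sub-question returned."""
    seen: set[str] = set()
    merged: list[str] = []
    for sub_query, chunks in results.items():
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                # Tag each chunk with the sub-question that surfaced it.
                merged.append(f"[{sub_query}] {chunk}")
    return merged
```

Tagging each chunk with its sub-question is a design choice that makes it easier to trace a final answer back to the retrieval step that supplied the evidence.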
Self-evaluation: After each retrieval step, the orchestrator assesses whether the retrieved context is sufficient — a "critic" step. If not, it reformulates the query or retrieves from a different source.
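A critic step can be as simple as returning a verdict plus a list of what is still missing. The sketch below uses keyword matching purely as a stand-in for a model-based judgement; `Verdict`, `critique`, `next_query` and the idea of a `required_topics` list are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    sufficient: bool
    missing: list[str]


def critique(required_topics: list[str], context: list[str]) -> Verdict:
    """Check which required topics appear nowhere in the retrieved context."""
    text = " ".join(context).lower()
    missing = [t for t in required_topics if t.lower() not in text]
    return Verdict(sufficient=not missing, missing=missing)


def next_query(question: str, verdict: Verdict) -> str:
    """Reformulate by steering the next retrieval towards the missing topics."""
    if verdict.sufficient:
        return question
    return f"{question} {' '.join(verdict.missing)}"
```

Returning the missing topics, rather than a bare yes or no, gives the reformulation step something concrete to act on.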
Context window management: Across multiple retrieval steps, the accumulated context must be managed to fit within the language model's context window, often requiring summarisation of earlier retrieved content.
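A minimal sketch of budget management follows, assuming chunks are ordered oldest first. Whitespace word counts approximate tokens and truncation stands in for real summarisation; both are simplifications, and the function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Rough proxy for token count; a real system would use the model's tokeniser."""
    return len(text.split())


def summarise(text: str, max_tokens: int) -> str:
    """Stand-in for model-based summarisation: keep only the first max_tokens words."""
    return " ".join(text.split()[:max_tokens])


def fit_context(chunks: list[str], budget: int, summary_tokens: int = 8) -> list[str]:
    """Shrink accumulated context to the budget, compressing the oldest chunks first."""
    fitted = list(chunks)
    # Pass 1: compress from the oldest chunk forward until the total fits.
    for i in range(len(fitted)):
        if sum(count_tokens(c) for c in fitted) <= budget:
            return fitted
        fitted[i] = summarise(fitted[i], summary_tokens)
    # Pass 2: if everything is compressed and it still does not fit, drop the oldest.
    while fitted and sum(count_tokens(c) for c in fitted) > budget:
        fitted.pop(0)
    return fitted
```

Compressing the oldest material first reflects an assumption that recent retrievals are the most relevant to the current step; some workloads may warrant ranking by relevance instead.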
Practical implementation considerations
Agentic RAG is more complex to build, test and monitor than standard RAG. Each additional reasoning step introduces latency and cost — a system that makes three retrieval calls and two intermediate reasoning steps will take measurably longer and cost more per query than a standard single-pass retrieval.
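The arithmetic behind that comparison can be made explicit. The sketch below assumes steps run sequentially and that every pipeline ends with one final generation call; the per-step figures are placeholders invented for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepCosts:
    retrieval_latency_s: float
    llm_latency_s: float
    retrieval_cost: float  # cost per retrieval call, in any currency unit
    llm_cost: float  # cost per language model call, same unit


def estimate(costs: StepCosts, retrieval_calls: int, reasoning_steps: int) -> tuple[float, float]:
    """Return (latency in seconds, cost per query) for a sequential pipeline."""
    llm_calls = reasoning_steps + 1  # intermediate reasoning plus the final generation
    latency = retrieval_calls * costs.retrieval_latency_s + llm_calls * costs.llm_latency_s
    cost = retrieval_calls * costs.retrieval_cost + llm_calls * costs.llm_cost
    return latency, cost


# Placeholder figures only; substitute measurements from your own stack.
ASSUMED = StepCosts(retrieval_latency_s=0.2, llm_latency_s=1.5, retrieval_cost=0.0001, llm_cost=0.01)
standard = estimate(ASSUMED, retrieval_calls=1, reasoning_steps=0)
agentic = estimate(ASSUMED, retrieval_calls=3, reasoning_steps=2)
```

With these placeholder figures the agentic path comes out at about three times the latency and cost of the single-pass path; the real ratio depends entirely on measured per-step values and on whether any calls can run in parallel.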
The failure modes are also more complex. An error in the first sub-query can propagate through subsequent steps, producing a confident final answer built on compounded errors. This makes evaluation harder: you need to test not just final answer quality but also the intermediate retrieval and reasoning steps.
For organisations considering agentic RAG, the practical guidance is to start with standard RAG for the majority of use cases and introduce agentic retrieval only for the specific query types that demonstrably require it: typically complex analytical questions, multi-source synthesis and tasks that require disambiguation before retrieval. Edison AI's implementation team typically designs hybrid pipelines where simple queries route to standard RAG and complex queries route to an agentic retrieval path, with query classification determining the routing decision.
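A routing layer of this kind can be sketched with a simple classifier in front of two handlers. The cue words and length threshold below are invented for illustration and are not a description of any production classifier; in practice the classification would more likely be a trained model or a language model call.

```python
import re
from typing import Callable

# Assumed signals of a multi-source or analytical question (illustrative only).
COMPLEX_CUES = {"compare", "versus", "across", "reconcile", "implications"}
LONG_QUERY_WORDS = 25


def classify(query: str) -> str:
    """Label a query 'agentic' if it looks complex, otherwise 'standard'."""
    words = re.findall(r"[a-z]+", query.lower())
    if len(words) > LONG_QUERY_WORDS or COMPLEX_CUES.intersection(words):
        return "agentic"
    return "standard"


def route(
    query: str,
    standard_rag: Callable[[str], str],
    agentic_rag: Callable[[str], str],
) -> str:
    """Send the query down the path chosen by the classifier."""
    handler = agentic_rag if classify(query) == "agentic" else standard_rag
    return handler(query)
```

Misrouting is cheap in one direction only: a simple query sent to the agentic path costs more but still gets answered, whereas a complex query sent to standard RAG may be answered incompletely, so the threshold is worth tuning against real traffic.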
Common mistakes
- Deploying agentic RAG for simple queries. The latency and cost overhead is not justified for questions that a single retrieval pass handles correctly.
- No guardrails on retrieval tool calls. Without limits on how many retrieval steps the agent can take, agentic systems can enter loops that consume tokens and time without converging on an answer.
- Insufficient evaluation of intermediate steps. Testing only the final answer misses failures in sub-query formulation or context synthesis.
- Allowing the agent to call unintended external sources. Retrieval tool definitions must be explicit and scoped. An agent with access to the open web will use it — and retrieve content outside your governance perimeter.
- Assuming agentic RAG solves data quality problems. Reasoning over a poor knowledge base still produces poor answers, regardless of how many retrieval passes are made.
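Several of the mistakes above (unbounded loops, unscoped tools, unexamined intermediate steps) can be mitigated with a small guard that is checked before every retrieval call. The sketch below is one possible shape, with hypothetical class and method names, and assumes that a repeated tool and query pair signals a loop.

```python
class RetrievalBudgetExceeded(RuntimeError):
    """Raised when the agent exceeds its step cap or repeats a call."""


class RetrievalGuard:
    def __init__(self, max_steps: int, allowed_tools: set[str]) -> None:
        self.max_steps = max_steps
        self.allowed_tools = frozenset(allowed_tools)
        # Every approved call is recorded so intermediate steps can be reviewed.
        self.trace: list[tuple[str, str]] = []

    def check(self, tool: str, query: str) -> None:
        """Call before each retrieval; raises instead of letting the agent continue."""
        if tool not in self.allowed_tools:
            raise PermissionError(f"Tool not in allow-list: {tool}")
        if len(self.trace) >= self.max_steps:
            raise RetrievalBudgetExceeded(f"Step cap of {self.max_steps} reached")
        # Normalise case and whitespace so trivially reworded repeats are caught.
        call = (tool, " ".join(query.lower().split()))
        if call in self.trace:
            raise RetrievalBudgetExceeded(f"Repeated call, likely a loop: {call}")
        self.trace.append(call)
```

The recorded `trace` doubles as evaluation data: it lists each sub-query the agent issued, which is what intermediate-step testing needs to inspect.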
What leaders should do next
Evaluate whether your current highest-value RAG use cases involve complex, multi-source questions or simple factual lookups. For the latter, standard RAG with hybrid search and re-ranking is usually sufficient. For the former, prototype an agentic retrieval path on a bounded, well-governed knowledge base. Measure latency, cost per query and answer quality against your standard RAG baseline before committing to an agentic architecture at scale.
Edison AI builds bespoke AI systems — including retrieval over your own documents — for Australian businesses.