Why RAG is not enough?

0 Comments

Why RAG Is Not Enough — And What Comes Next

Based on my own development experience.

Retrieval-Augmented Generation (RAG) has become a popular method for building AI-based chatbots, especially because it allows developers to harness the power of large language models (LLMs) without training them from scratch. But as more real-world use cases emerge, the limitations of standard RAG systems are becoming clearer.

In this post, I’ll talk:

  1. What RAG is and how it works
  2. The limitations of traditional RAG
  3. How Agentic RAG overcomes these limitations
  4. What you need to build an Agentic RAG system

1. What Is RAG?

RAG stands for Retrieval-Augmented Generation. It’s a framework or method that enables you to build an AI chatbot without training your own model. Instead, you use a pre-trained LLM (like GPT-4, Mistral, or Llama) and feed it custom data at runtime by retrieving relevant documents from a database.

🔧 Core Components of a RAG System:

  • LLM: A large language model (pre-trained)
  • Embedding Model: Converts user queries and documents into vectors
  • Vector Database: Stores document embeddings (e.g., ChromaDB, FAISS, Weaviate)
  • Custom Data: Your knowledge base (e.g., PDFs, product docs)

⚙️ How It Works:

RAG diagram

  1. The user submits a query
  2. The query is converted to an embedding or vector representation by retrieval(embedding model)
  3. The embedding is compared to the data in database
  4. The most relevant documents are retrieved
  5. The LLM uses these documents as context to generate a response

2. Limitations of RAG

Despite its usefulness, RAG has some serious limitations that prevent it from functioning as a truly intelligent system:

  • No decision-making: The model can’t decide when or what to retrieve — it just retrieves every time.
  • No tool use: It cannot interact with APIs, call databases, or use calculators.
  • Hallucinations: Even with retrieved documents, the LLM might still fabricate information.

3. Enter Agentic RAG

To solve these issues, we need something smarter — Agentic RAG. This builds on top of RAG by adding decision-making and tool use capabilities, turning the system into a more autonomous agent.

🤖 What Makes Agentic RAG Different?

  • Decides when to retrieve data
  • Can call tools, such as APIs, databases, or plugins
  • Follows workflows and can chain multiple actions

Example: When a user asks "What’s the weather in PhnomPenh tomorrow?", the agent can call a weather API instead of just searching static documents.


4. What You Need to Build Agentic RAG

To move from RAG to Agentic RAG, your system will need:

🧠 LLM with Tool Calling

  • Must support function calling (e.g., GPT-4, Claude, or open source model such as Qwen, DeepSeek....)

🛠️ Tool Definitions

  • Tools could be:
    • API connectors
    • SQL query tools
    • Python calculators
    • Web search tools

🗄️ Databases

  • Vector Databases: For similarity search
  • Relational Databases: For structured, tabular data

🧩 Orchestration Layer

  • Tools like LangChain, LangGraph, or custom logic help manage:
    • Tool usage
    • Memory
    • Multi-step reasoning

Example of Agentic RAG in a Real Use Case

Agentic RAG diagram

The diagram above illustrates an Agentic RAG architecture for an online electronics store, where product information is stored in two databases — a vector database and a relational database.

How it works:

  1. A customer submits a query in the chatbot.
  2. The agent decides whether to respond directly or call a tool.
    • Example: If the customer simply says, “Hi,” the agent replies directly without accessing any database.
  3. If the query requires data lookup, the agent calls the relevant tool to retrieve information:
    • Company-related queries (e.g., “Where is your nearest store?”) are answered directly from the vector database, which contains general company knowledge and FAQs.
    • Product-specific queries (e.g., “What’s the price of the iPhone 15 Pro?”) trigger an API call to the relational database, which holds up-to-date product details such as prices and stock levels.
      If the retrieved data doesn’t match the question’s intent, the LLM reformulates the query and sends it back to the query node for another search.
  4. The LLM then uses both the question and the retrieved data to produce a final answer for the customer.

5. Conclusion

RAG was a big step forward in building intelligent chatbots. But Agentic RAG is what takes chatbots from text responders to true assistants. By enabling decision-making, tool use, and dynamic behavior, you're building not just a chatbot — but an AI agent.

There are many different types of Agentic RAG architectures, each with its own strengths and applications. This article has covered the high-level concept and core components, but if you want to dive deeper into the various approaches and design patterns, check out this detailed research paper.