RAG to Agents - From Retrieval to Action
Deep dive into AI agents: the agent loop, tools, ReAct pattern, memory systems, when agents are wrong, and agent failure modes you'll encounter in production
Why This Matters
RAG answers questions. Agents solve problems.
When a user asks “What’s the status of order #12345?”, RAG retrieves a document. But what if answering requires:
- Querying an order database
- Checking shipping status from an API
- Calculating estimated delivery based on location
- Composing a response with all that information
RAG can’t do this. RAG retrieves static documents. Agents take actions.
If you try to build multi-step systems with RAG patterns, you’ll create brittle pipelines that break on variation. Understanding the agent mental model lets you build flexible systems that adapt.
What Goes Wrong Without This:
Symptom: Your "smart assistant" can only answer questions from
documents. Users ask for actions, it apologizes.
Cause: You built RAG when you needed an agent. RAG retrieves
information. It doesn't take action or call APIs.
Symptom: Your multi-step pipeline is 500 lines of if/else handling
every edge case. Adding a new capability requires 2 weeks.
Cause: You hardcoded the reasoning that should be delegated to the LLM.
Every variation is a code branch.
Symptom: Your agent attempts an action, fails, and doesn't recover.
It returns "Error occurred" to the user.
Cause: You built a pipeline, not an agent. Pipelines don't adapt.
Agents observe results and adjust.
Pipelines vs Agents
There are two ways to build multi-step AI systems:
+------------------------------------------------------------------+
| PIPELINE (Code decides) |
+------------------------------------------------------------------+
| |
| Input → Step 1 → Step 2 → Step 3 → Output |
| ↓ ↓ ↓ |
| [fixed] [fixed] [fixed] |
| |
| The code determines what happens at each step. |
| Each branch is explicitly written. |
| Predictable, but rigid. |
| |
+------------------------------------------------------------------+
| AGENT (Model decides) |
+------------------------------------------------------------------+
| |
| Input → ┌─────────────────────────┐ |
| │ Observe current state │◄────────┐ |
| │ ↓ │ │ |
| │ Think: what next? │ │ |
| │ ↓ │ │ |
| │ Act: execute decision │─────────┘ |
| └─────────────────────────┘ |
| ↓ |
| Output (when done) |
| |
| The model determines what happens at each step. |
| Flexible, but less predictable. |
| |
+------------------------------------------------------------------+
The key question: Who decides the next step—your code or the model?
- Pipeline: You enumerate all paths. Reliable for known scenarios. Fails on novel scenarios.
- Agent: Model reasons about what to do. Handles variation. Can make mistakes.
Neither is better. They solve different problems.
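To make the distinction concrete, here is the order-status task from the introduction sketched as a pipeline (the function bodies are hypothetical stubs standing in for real calls):

# PIPELINE: code fixes the sequence. Every step runs, in order, no matter what.
def query_orders(order_id):                       # stub for a database call
    return {"status": "shipped", "tracking": "FX123456"}

def check_shipping(tracking):                     # stub for a carrier API call
    return {"est_delivery": "Dec 5"}

def handle_order_question(order_id):
    order = query_orders(order_id)                # Step 1: always runs
    # Step 2 always runs too; if the order hasn't shipped and has no
    # tracking number, this line breaks. The pipeline cannot adapt.
    shipping = check_shipping(order["tracking"])
    return f"Order {order_id} is {order['status']}, arriving {shipping['est_delivery']}"

print(handle_order_question("12345"))

An agent would instead be handed both functions as tools and decide at runtime whether to call one, both, or neither.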
The Agent Loop
An agent is a loop. The LLM decides what to do, executes it, observes the result, and decides again.
+------------------------------------------------------------------+
| THE AGENT LOOP |
+------------------------------------------------------------------+
| |
| ┌──────────────┐ |
| ┌──────▶│ OBSERVE │ |
| │ │ │ |
| │ │ What do I │ |
| │ │ know now? │ |
| │ └──────┬───────┘ |
| │ │ |
| │ ▼ |
| │ ┌──────────────┐ |
| │ │ THINK │ |
| │ │ │ |
| │ │ What should │ |
| │ │ I do next? │ |
| │ └──────┬───────┘ |
| │ │ |
| │ ▼ |
| │ ┌──────────────┐ |
| ┌──────┴──┐ │ ACT │ |
| │ Not done│◄───┤ │ |
| └─────────┘ │ Execute the │ |
| │ decision │ |
| └──────┬───────┘ |
| │ |
| ▼ |
| ┌─────────┐ |
| │ Done? │ |
| └────┬────┘ |
| │ Yes |
| ▼ |
| ┌─────────┐ |
| │ OUTPUT │ |
| └─────────┘ |
| |
+------------------------------------------------------------------+
Each iteration:
- Observe: What information do I have? What just happened?
- Think: Given my goal and current state, what’s the best next action?
- Act: Execute the chosen action
- Evaluate: Am I done? If not, loop.
The magic: the model, not your code, decides the action at the Think step. This is what makes it an agent, not a pipeline.
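The loop maps almost directly onto code. A minimal skeleton (the decide and execute callables are injected to keep the sketch model-agnostic; a concrete OpenAI version appears in the Code Example section below):

def agent_loop(goal, decide, execute, max_steps=10):
    """Observe → Think → Act until the model says it's done."""
    history = []                                      # OBSERVE: accumulated state
    for _ in range(max_steps):
        decision = decide(goal, history)              # THINK: model picks the next action
        if decision.get("done"):
            return decision["answer"]                 # Done? → OUTPUT
        result = execute(decision["action"])          # ACT: run the chosen action
        history.append((decision["action"], result))  # OBSERVE the result, then loop
    return "Step limit reached"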
Tools: The Agent’s Capabilities
An agent without tools is just a chatbot. Tools are functions the agent can call.
+------------------------------------------------------------------+
| TOOLS GIVE AGENTS CAPABILITIES |
+------------------------------------------------------------------+
| |
| Tool Definition: |
| ┌───────────────────────────────────────────────────────────┐ |
| │ name: "search_orders" │ |
| │ description: "Search orders by user ID, order ID, │ |
| │ or date range" │ |
| │ parameters: │ |
| │ user_id: string (optional) │ |
| │ order_id: string (optional) │ |
| │ date_from: date (optional) │ |
| └───────────────────────────────────────────────────────────┘ |
| |
| Agent receives tool descriptions → LLM learns WHEN to use |
| Agent receives user query → LLM decides WHICH tool + arguments |
| Tool returns result → Agent observes and continues |
| |
+------------------------------------------------------------------+
Common tool categories:
| Category | Examples | What it enables |
|---|---|---|
| Data retrieval | search_docs, query_database | Access information |
| External APIs | get_weather, check_inventory | Real-time data |
| Actions | send_email, create_ticket | Side effects |
| Computation | calculate, run_code | Complex logic |
| User interaction | ask_user, show_options | Clarification |
Tool descriptions are prompts. Good descriptions = agent uses tools correctly. Bad descriptions = agent guesses wrong.
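For illustration, here is the same capability described two ways (hypothetical schemas, simplified from the OpenAI-style format used in the Code Example below). Only the second gives the model enough signal to pick the right tool with the right arguments:

# Too vague: the model can't tell when this applies or what to pass.
vague_tool = {
    "name": "search",
    "description": "Searches stuff",
    "parameters": {"q": {"type": "string"}},
}

# Precise: states what it covers, when to use it, and what arguments look like.
precise_tool = {
    "name": "search_orders",
    "description": "Search customer orders by user email, order ID, or date "
                   "range. Use for order status questions, NOT for "
                   "documentation lookups (use search_docs for those).",
    "parameters": {
        "email": {"type": "string", "description": "Customer email address"},
        "order_id": {"type": "string", "description": "Order number, digits only, no '#'"},
    },
}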
RAG as a Tool
Here’s the insight: RAG doesn’t get replaced by agents—it becomes a tool.
Available tools:
• search_docs: Search knowledge base for relevant info
• query_orders: Get order details from database
• check_shipping: Get real-time shipping status
User: "When will my order #789 arrive?"
Agent thinks: "I need order details first"
Agent acts: query_orders(order_id="789")
Observation: {status: "shipped", carrier: "FedEx", ...}
Agent thinks: "Order is shipped, need tracking info"
Agent acts: check_shipping(tracking="FX123456")
Observation: {location: "Chicago", est_delivery: "Dec 5"}
Agent thinks: "I have all the info, can answer now"
Agent responds: "Your order shipped via FedEx and
should arrive December 5th."
RAG is retrieval. Agents can use retrieval as one capability among many.
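Wiring this up can be as simple as wrapping your existing retrieval pipeline in a function and registering it alongside the other tools. A sketch, assuming a generic retriever with a search method (a hypothetical interface, not a specific library's API):

def make_search_docs(retriever):
    """Expose an existing RAG index as an agent tool."""
    def search_docs(query: str, top_k: int = 3) -> list[str]:
        # retriever.search is a stand-in for your vector store's query call
        return [hit.text for hit in retriever.search(query, top_k=top_k)]
    return search_docs

search_docs_tool = {
    "name": "search_docs",
    "description": "Search the knowledge base for relevant information. "
                   "Use for how-to and policy questions, not live order data.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query"}},
        "required": ["query"],
    },
}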
The ReAct Pattern
The most common agent architecture is ReAct (Reason + Act). The model explicitly reasons before acting.
+------------------------------------------------------------------+
| ReAct EXAMPLE |
+------------------------------------------------------------------+
| |
| User: "What's the refund status for alice@example.com?" |
| |
| ┌─────────────────────────────────────────────────────────┐ |
| │ THOUGHT: I need to find any refund requests from this │ |
| │ user. I'll search the orders first. │ |
| │ │ |
| │ ACTION: search_orders(email="alice@example.com") │ |
| └─────────────────────────────────────────────────────────┘ |
| |
| OBSERVATION: [ |
| {order_id: "456", status: "refund_requested", amount: 99} |
| ] |
| |
| ┌─────────────────────────────────────────────────────────┐ |
| │ THOUGHT: Found an order with refund requested. │ |
| │ I need to check the refund processing status. │ |
| │ │ |
| │ ACTION: check_refund(order_id="456") │ |
| └─────────────────────────────────────────────────────────┘ |
| |
| OBSERVATION: {status: "approved", processed_date: "Dec 3"} |
| |
| ┌─────────────────────────────────────────────────────────┐ |
| │ THOUGHT: The refund has been approved and processed. │ |
| │ I have enough info to answer. │ |
| │ │ |
| │ ACTION: respond_to_user │ |
| └─────────────────────────────────────────────────────────┘ |
| |
| RESPONSE: "Alice's refund of $99 for order #456 was |
| approved and processed on December 3rd." |
| |
+------------------------------------------------------------------+
The THOUGHT step makes the agent’s reasoning visible. This helps with:
- Debugging (you can see why it chose an action)
- Guidance (you can provide examples of good reasoning)
- Error recovery (model realizes when it’s stuck)
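One common way to induce this behavior is a system prompt that demands the Thought/Action format explicitly. A minimal sketch (the exact wording is an assumption; production prompts usually add few-shot examples of good reasoning):

REACT_SYSTEM_PROMPT = """You are a customer service agent with access to tools.
At each step, respond in exactly this format:

Thought: reason about what you know and what you still need
Action: tool_name(argument="value")

After each Action you will receive an Observation with the result.
When you have enough information, finish with:

Thought: I have enough information to answer
Final Answer: <your response to the user>
"""

With native tool-calling APIs (as in the Code Example below), the Action is a structured tool call rather than parsed text, and the Thought can live in the assistant message content.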
Agent Memory
Agents without memory forget everything between turns. Production agents need memory.
+------------------------------------------------------------------+
| MEMORY TYPES |
+------------------------------------------------------------------+
| |
| SHORT-TERM MEMORY (Conversation Context) |
| ──────────────────────────────────────── |
| What: Previous messages in current session |
| How: Append to LLM context |
| Limit: Context window size |
| |
| User: "Check order #123" |
| Agent: "Order #123 shipped Dec 1" |
| User: "When will IT arrive?" ← "it" = order #123 |
| Short-term memory resolves the reference |
| |
+------------------------------------------------------------------+
| LONG-TERM MEMORY (Persistent Knowledge) |
+------------------------------------------------------------------+
| What: Facts that persist across sessions |
| How: Vector store for semantic retrieval |
| Limit: Storage capacity |
| |
| Session 1: User says "I prefer email over SMS" |
| → Store: ("user_preference", "prefers email for notifications") |
| |
| Session 2: Agent needs to notify user |
| → Retrieve preference → Send email |
| |
+------------------------------------------------------------------+
| WORKING MEMORY (Scratch Pad) |
+------------------------------------------------------------------+
| What: Intermediate results during task execution |
| How: Structured state object |
| Limit: Task complexity |
| |
| Task: "Calculate total revenue by region" |
| Working memory: { |
| "north": 150000, |
| "south": 120000, ← Accumulated as agent works |
| "east": pending... |
| } |
| |
+------------------------------------------------------------------+
Without memory, agents can’t handle multi-turn conversations, learn user preferences, or maintain context across sessions.
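A sketch of how the three types might hang together in code (the long-term store's add and search methods are hypothetical placeholders, not a specific framework's API):

class AgentMemory:
    """Short-term, long-term, and working memory in one place."""
    def __init__(self, vector_store):
        self.messages = []             # SHORT-TERM: appended to LLM context each turn
        self.long_term = vector_store  # LONG-TERM: semantic retrieval across sessions
        self.working = {}              # WORKING: intermediate results for the current task

    def remember(self, fact: str):
        """Persist a fact (e.g. 'prefers email for notifications') across sessions."""
        self.long_term.add(fact)               # hypothetical store API

    def recall(self, query: str) -> list[str]:
        """Pull relevant long-term facts into the current turn."""
        return self.long_term.search(query, top_k=3)  # hypothetical store API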
When Agents Are Wrong
Agents are not always the answer. Sometimes they’re the problem.
+------------------------------------------------------------------+
| WHEN TO USE WHAT |
+------------------------------------------------------------------+
| |
| USE DIRECT LLM CALL when: |
| • Single-step task (summarize, translate, classify) |
| • No external data needed |
| • No actions required |
| |
| USE RAG when: |
| • Answer exists in your documents |
| • Single retrieval + generation is sufficient |
| • You want predictable, auditable answers |
| |
| USE PIPELINE when: |
| • Steps are known and fixed |
| • High reliability required |
| • Each step must happen regardless of previous results |
| |
| USE AGENT when: |
| • Task requires multiple tools/data sources |
| • Strategy depends on intermediate results |
| • User requests vary significantly |
| • Recovery from failure requires reasoning |
| |
+------------------------------------------------------------------+
The “agent for everything” anti-pattern:
User: "What's 2 + 2?"
BAD (over-engineering):
Agent thinks: "I should use the calculator tool"
Agent acts: calculate("2 + 2")
Observation: 4
Agent responds: "The answer is 4"
Cost: Multiple LLM calls, tool overhead
Time: 2-3 seconds
GOOD (direct):
LLM responds: "4"
Cost: One LLM call
Time: 200ms
Agents add:
- Latency: Multiple LLM calls per request
- Cost: Each thought/action cycle costs tokens
- Non-determinism: Same input can produce different paths
- New failure modes: Wrong tool selection, hallucinated arguments, infinite loops
Don’t use an agent when a simpler approach works.
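One mitigation is a cheap routing step in front of the agent, so simple requests never pay agent overhead. A sketch reusing client and run_agent from the Code Example below (the routing prompt is an assumption, and the router itself costs one small extra call):

def handle_request(user_message: str) -> str:
    # Cheap classification: does this need tools, or just a direct answer?
    routing = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Reply with exactly 'direct' if this can "
             "be answered without tools or data lookups, otherwise reply 'agent'."},
            {"role": "user", "content": user_message},
        ],
    )
    if routing.choices[0].message.content.strip().lower() == "direct":
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": user_message}],
        )
        return reply.choices[0].message.content   # one call, fast path
    return run_agent(user_message)                # full loop only when needed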
Agent Failure Modes
Agents introduce new ways to fail:
+------------------------------------------------------------------+
| AGENT-SPECIFIC FAILURES |
+------------------------------------------------------------------+
| |
| 1. WRONG TOOL SELECTION |
| Agent picks search_docs when it should use query_orders |
| Cause: Ambiguous tool descriptions, poor examples |
| |
| 2. HALLUCINATED ARGUMENTS |
| Agent calls: check_order(order_id="MADE_UP_ID") |
| Cause: Model invents plausible-looking arguments |
| |
| 3. INFINITE LOOPS |
| Agent keeps trying the same failing action |
| Cause: No loop detection, poor error handling instructions |
| |
| 4. PREMATURE TERMINATION |
| Agent responds before gathering enough information |
| Cause: Weak instructions to be thorough |
| |
| 5. SCOPE CREEP |
| Agent takes actions beyond what user asked |
| Cause: Unclear boundaries, model being "helpful" |
| |
| 6. CATASTROPHIC ACTIONS |
| Agent deletes data, sends emails, makes purchases |
| Cause: Powerful tools without guardrails |
| |
+------------------------------------------------------------------+
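Several of these failures can be caught mechanically before a tool ever runs. A sketch of lightweight guardrails wrapped around tool dispatch (execute_tool is the dispatch function from the Code Example below; the specific checks and thresholds are illustrative assumptions):

def guarded_execute(name, arguments, call_history, allowed_tools):
    # Guards against failure mode 6: hard allowlist per request
    if name not in allowed_tools:
        return {"error": f"Tool '{name}' is not permitted for this request"}
    # Guards against failure mode 3: reject a recently repeated identical call
    if (name, arguments) in call_history[-3:]:
        return {"error": "Repeated identical call; try a different approach"}
    # Guards against failure mode 2: validate argument formats before executing
    if name == "check_refund" and not arguments.get("order_id", "").isdigit():
        return {"error": "order_id must be numeric"}
    call_history.append((name, arguments))
    return execute_tool(name, arguments)

Returning errors as observations, rather than raising exceptions, lets the model see what went wrong and adjust course, which is exactly the recovery behavior that distinguishes an agent from a pipeline.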
Code Example
Minimal agent loop demonstrating the observe-think-act cycle:
from openai import OpenAI
import json
client = OpenAI()
# Define tools
tools = [
{
"type": "function",
"function": {
"name": "search_orders",
"description": "Search for orders by user email or order ID",
"parameters": {
"type": "object",
"properties": {
"email": {"type": "string", "description": "User email"},
"order_id": {"type": "string", "description": "Order ID"},
},
},
},
},
{
"type": "function",
"function": {
"name": "check_refund",
"description": "Check refund status for an order",
"parameters": {
"type": "object",
"properties": {
"order_id": {"type": "string", "description": "Order ID"},
},
"required": ["order_id"],
},
},
},
]
# Mock tool implementations
def search_orders(email=None, order_id=None):
return [{"order_id": "456", "status": "refund_requested", "amount": 99}]
def check_refund(order_id):
return {"status": "approved", "processed_date": "Dec 3"}
def execute_tool(name, arguments):
"""Route tool calls to implementations."""
if name == "search_orders":
return search_orders(**arguments)
elif name == "check_refund":
return check_refund(**arguments)
return {"error": f"Unknown tool: {name}"}
def run_agent(user_message: str, max_iterations: int = 5) -> str:
"""Run the agent loop."""
messages = [
{"role": "system", "content": "You are a helpful customer service agent."},
{"role": "user", "content": user_message},
]
for i in range(max_iterations):
# THINK: Model decides what to do
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
tools=tools,
)
message = response.choices[0].message
# Check if done (no tool calls)
if not message.tool_calls:
return message.content
# ACT: Execute each tool call
messages.append(message)
for tool_call in message.tool_calls:
name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute tool
result = execute_tool(name, arguments)
# OBSERVE: Add result to context
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result),
})
return "Max iterations reached"
# Test
result = run_agent("What's the refund status for alice@example.com?")
print(result)
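Note the max_iterations cap: it is the minimal defense against failure mode 3 (infinite loops). Also note that the path is non-deterministic: with the mock tools above, the model typically calls search_orders, then check_refund, then answers, but the same input can produce a different sequence of calls from run to run.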
Key Takeaways
1. Pipelines vs Agents
- Pipeline: code decides the next step
- Agent: model decides the next step
2. The agent loop: Observe → Think → Act → Repeat
3. Tools give agents capabilities
- Good tool descriptions are prompts
- RAG becomes a tool, not a replacement
4. ReAct pattern: explicit reasoning before acting
- THOUGHT → ACTION → OBSERVATION
5. Memory types: short-term, long-term, working memory
6. Agents aren't always the answer
- Add latency, cost, non-determinism
- Use simpler approaches when they suffice
7. Agent-specific failure modes
- Wrong tool, hallucinated arguments, infinite loops
- Premature termination, scope creep, catastrophic actions
Verify Your Understanding
Before proceeding:
Explain the difference between a pipeline and an agent to someone who hasn’t read this document. If you say “an agent uses an LLM,” that’s insufficient.
Given this task: “Summarize the top 3 news articles about AI today”
- Could this be done with RAG?
- When would this need an agent?
- What tools would the agent need?
Your agent has these tools: [search_docs, query_database, send_email, calculate]. User asks: “What’s our revenue this quarter?” Which tool(s) should the agent use? What if query_database fails?
Identify the error in this statement: “I built an agent with 30 tools so it can handle any request.”
What’s Next
After this, you can:
- Continue → Agents → Evaluation — measuring what matters in multi-step systems
- Build → Production agent with proper guardrails
Go Deeper: Production Agents
This article covers the agent mental model. For production patterns (idempotency, checkpointing, human-in-the-loop review, cost control), see the Production Agents Deep Dive series:
| Part | Topic | What You’ll Learn |
|---|---|---|
| 0 | Overview | Why 98% of orgs haven’t deployed agents at scale |
| 1 | Idempotency | Safe retries, the Stripe pattern |
| 2 | State & Memory | Checkpointing, memory systems |
| 3 | Human-in-the-Loop | Confidence routing, escalation |
| 4 | Cost Control | Token budgets, circuit breakers |
| 5 | Observability | Silent failure detection |
| 6 | Durable Execution | Temporal, Inngest, Restate |
| 7 | Security | Sandboxing, prompt injection |
| 8 | Testing | Golden datasets, evaluation |