Session, short-term, long-term, and episodic memory for AI agents and chatbots
TL;DR
AI memory enables personalization and context retention across conversations. Session memory is the raw message history, short-term memory holds compressed summaries, long-term memory stores facts about entities, and episodic memory tracks timestamped events. Production systems need all four.
Visual Overview
Without Memory
WITHOUT MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User (Monday): "I am allergic to shellfish"              │
│  Agent: "Got it, I will note that."                       │
│                                                           │
│  User (Tuesday): "What should I order?"                   │
│  Agent: "The shrimp scampi is excellent!"   ← FAILURE     │
│                                                           │
└───────────────────────────────────────────────────────────┘
Business cost of no memory:
Support: 40% of tickets are repeat issues (wasted agent time)
Sales: Lost context = lost deals ($50K avg deal, 15% close rate drop)
Product: Users churn when AI “forgets” them (12% higher churn)
Memory is not a feature. Memory is table stakes.
Memory Types
┌─────────────┬─────────────────────────┬─────────────────┐
│ TYPE        │ DEFINITION              │ BOUNDARY        │
├─────────────┼─────────────────────────┼─────────────────┤
│ SESSION     │ Raw message history     │ Clears on:      │
│ (Buffer)    │ in current conversation │ • Session end   │
│             │                         │ • Context limit │
│             │ Storage: Context window │                 │
├─────────────┼─────────────────────────┼─────────────────┤
│ SHORT-TERM  │ Compressed/summarized   │ Triggers when:  │
│ (Working)   │ version of session      │ • Session > 4K  │
│             │                         │ • Explicit sum  │
│             │ Storage: Prompt or cache│ Clears: Sess end│
├─────────────┼─────────────────────────┼─────────────────┤
│ LONG-TERM   │ FACTS about entities    │ Persists until: │
│ (Semantic)  │ (user, org, domain)     │ • Explicit upd  │
│             │                         │ • Contradiction │
│             │ Storage: Vector DB      │ • TTL expiry    │
│             │                         │                 │
│             │ Examples:               │                 │
│             │ • "User prefers email"  │                 │
│             │ • "Company has 50 emps" │                 │
│             │ • "Budget is $100K"     │                 │
├─────────────┼─────────────────────────┼─────────────────┤
│ EPISODIC    │ EVENTS that happened    │ Persists until: │
│ (Temporal)  │ (timestamped, queryable)│ • Retention pol │
│             │                         │ • User deletion │
│             │ Storage: Vector DB +    │                 │
│             │   timestamp index       │                 │
│             │                         │                 │
│             │ Examples:               │                 │
│             │ • "On 3/15, user asked  │                 │
│             │   about refund"         │                 │
│             │ • "Last week, discussed │                 │
│             │   pricing concerns"     │                 │
└─────────────┴─────────────────────────┴─────────────────┘
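The table's boundaries can be made concrete with a few minimal records. This is a sketch, not a reference implementation: field names like `max_messages` are illustrative, and a real session buffer would enforce a token budget rather than a message count.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SessionBuffer:
    """SESSION: raw message history, bounded by the context limit."""
    messages: list = field(default_factory=list)
    max_messages: int = 50  # stand-in for a real token-count limit

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Evict the oldest messages once the buffer exceeds its bound.
        if len(self.messages) > self.max_messages:
            del self.messages[: len(self.messages) - self.max_messages]


@dataclass
class Fact:
    """LONG-TERM: a timeless fact about an entity, with a confidence score."""
    entity: str
    content: str
    confidence: float = 1.0


@dataclass
class Event:
    """EPISODIC: something that happened at a known, queryable time."""
    content: str
    timestamp: datetime
```

Short-term memory needs no record of its own here: it is a summary string produced from the buffer when it overflows.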
The Critical Distinction
Long-term vs Episodic
LONG-TERM VS EPISODIC
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  LONG-TERM: "User is allergic to shellfish"   ← FACT      │
│             (no timestamp needed, always true)            │
│                                                           │
│  EPISODIC:  "On 3/15, user mentioned shellfish allergy"   │
│             ← EVENT (when it happened matters)            │
│                                                           │
│  TEST: Can you answer "WHEN did you learn this?"          │
│        YES → Episodic  │  NO → Long-term                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
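The "WHEN" test can be applied mechanically when routing extracted items (a sketch, assuming items are dicts with an optional `timestamp` key):

```python
def classify_memory(item: dict) -> str:
    """Route an extracted item: if WHEN it was learned matters (it
    carries a timestamp), it is episodic; otherwise it is long-term."""
    return "episodic" if item.get("timestamp") else "long-term"
```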
Memory Operations
WRITE — When & What Gets Stored
| Trigger                | What to Store           | Memory Type |
|------------------------|-------------------------|-------------|
| User states fact       | Extracted fact          | Long-term   |
| User states preference | Preference + confidence | Long-term   |
| Conversation ends      | Summary of key points   | Short-term  |
| Significant event      | Event + timestamp       | Episodic    |
| Entity mentioned       | Entity attributes       | Long-term   |
Extraction prompt example:
From this conversation, extract:
1. Facts about the user (preferences, constraints, attributes)
2. Significant events (decisions made, problems discussed)
3. Action items or commitments
Format: {type: "fact"|"event", content: "...", confidence: 0-1}
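One way to wire the prompt above into a write path (a sketch: `llm_complete` is a hypothetical callable wrapping whatever LLM API you use, and the model is asked for a JSON list so the reply can be validated before anything reaches storage):

```python
import json

EXTRACTION_PROMPT = (
    "From this conversation, extract:\n"
    "1. Facts about the user (preferences, constraints, attributes)\n"
    "2. Significant events (decisions made, problems discussed)\n"
    "3. Action items or commitments\n"
    'Reply with a JSON list of {"type": "fact"|"event", '
    '"content": "...", "confidence": 0-1}\n\n'
)


def extract_memories(transcript: str, llm_complete) -> list:
    """Run the extraction prompt and keep only well-formed items."""
    reply = llm_complete(EXTRACTION_PROMPT + transcript)
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # malformed model output: store nothing rather than garbage
    return [
        m for m in items
        if m.get("type") in ("fact", "event") and m.get("content")
    ]
```

Validating before writing matters: a memory store silently polluted by malformed extractions is worse than one that occasionally misses a fact.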
READ — Retrieval Strategies
| Strategy  | How                        | When to Use           |
|-----------|----------------------------|-----------------------|
| Recency   | Last N memories            | Continuation context  |
| Relevance | Semantic similarity search | Topic-specific recall |
| Temporal  | "Last week", "In March"    | Time-referenced query |
| Entity    | All facts about X          | Entity-focused task   |
| Hybrid    | Relevance + recency boost  | General retrieval     |
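The hybrid strategy can be sketched as relevance with an exponential recency boost. The 0.7/0.3 split and the 30-day half-life below are illustrative tuning knobs, not canonical values:

```python
def rank_memories(memories: list, top_k: int = 3,
                  half_life_days: float = 30.0) -> list:
    """Order memories by relevance, boosted by recency.

    Each memory is a dict with a `relevance` score (e.g. vector
    similarity) and `age_days` since it was written.
    """
    def score(m):
        # Recency halves every `half_life_days`, so an old but highly
        # relevant memory can still beat a fresh marginal one.
        recency = 0.5 ** (m["age_days"] / half_life_days)
        return m["relevance"] * (0.7 + 0.3 * recency)

    return sorted(memories, key=score, reverse=True)[:top_k]
```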
Retrieval prompt injection:
Context from memory:
• User preference: Prefers email communication (confidence: 0.9)
• Recent event: Discussed billing issue on 3/15 (resolved)
• Fact: Company size is 50 employees
[Rest of prompt...]
FORGET — Critical for Production
| Mechanism       | Trigger                  | Implementation          |
|-----------------|--------------------------|-------------------------|
| Explicit delete | User requests "forget X" | Hard delete + audit     |
| Contradiction   | New fact contradicts old | Update, keep history    |
| Decay           | Not accessed in N days   | Reduce retrieval weight |
| Consolidation   | Many similar memories    | Merge into summary      |
| TTL             | Retention policy expiry  | Hard delete             |
| GDPR request    | "Right to be forgotten"  | Full user purge         |
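Decay and TTL can be sketched as one pass over stored memories. The thresholds are illustrative, and `created`/`last_accessed` are assumed field names:

```python
from datetime import datetime, timedelta


def apply_forgetting(memories: list, now: datetime,
                     ttl: timedelta = timedelta(days=365),
                     decay_after: timedelta = timedelta(days=90)) -> list:
    """TTL-expired rows are hard-deleted; stale rows keep their
    content but lose retrieval weight, as in the Decay row above."""
    kept = []
    for m in memories:
        if now - m["created"] > ttl:
            continue  # TTL expiry: hard delete
        if now - m["last_accessed"] > decay_after:
            # Decay: halve the retrieval weight instead of deleting.
            m = {**m, "weight": m.get("weight", 1.0) * 0.5}
        kept.append(m)
    return kept
```

Explicit deletes and GDPR purges are deliberately not modeled here: those must be hard deletes against the underlying store, with an audit trail, not a weight adjustment.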
Memory Conflicts
THE PROBLEM
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  January: User says "I hate spicy food"                   │
│  March:   User says "I love spicy food"                   │
│                                                           │
│  Now what?                                                │
│                                                           │
└───────────────────────────────────────────────────────────┘

RESOLUTION STRATEGIES
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  1. LAST WRITE WINS                                       │
│     Simple: most recent fact replaces old                 │
│     Risk: Loses nuance ("I am on a diet this month")      │
│                                                           │
│  2. KEEP BOTH WITH TIMESTAMPS                             │
│     Store both, retrieve most recent by default           │
│     Allows: "You mentioned hating spicy food in Jan..."   │
│                                                           │
│  3. ASK FOR CLARIFICATION                                 │
│     "I see you mentioned both. Which is current?"         │
│     Best UX but interrupts flow                           │
│                                                           │
│  4. CONFIDENCE DECAY                                      │
│     Older facts have lower confidence                     │
│     Retrieve based on recency-weighted confidence         │
│                                                           │
└───────────────────────────────────────────────────────────┘
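Strategies 2 and 4 compose naturally: keep every version, then answer with recency-weighted confidence. A minimal sketch, assuming each version carries a `stated_at` timestamp and a `confidence` score; the 90-day half-life is an illustrative choice:

```python
from datetime import datetime


def resolve_fact(history: list, now: datetime,
                 half_life_days: float = 90.0) -> dict:
    """Pick the current value of a contested fact.

    Every version is kept (strategy 2); the winner is the one with
    the highest recency-weighted confidence (strategy 4), so a fresh
    low-confidence remark does not automatically beat an old firm fact.
    """
    def weighted(fact):
        age_days = (now - fact["stated_at"]).days
        return fact.get("confidence", 1.0) * 0.5 ** (age_days / half_life_days)

    return max(history, key=weighted)
```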
Architecture Patterns
SIMPLE: SESSION ONLY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User → [Context Window] → LLM → Response                 │
│                                                           │
│  Pros: No infrastructure, works today                     │
│  Cons: Forgets everything between sessions                │
│  Use for: Stateless Q&A, simple chatbots                  │
│                                                           │
└───────────────────────────────────────────────────────────┘

INTERMEDIATE: SUMMARIZATION
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  User → [Recent messages + Summary of older] → LLM        │
│                                                           │
│  When context fills:                                      │
│  1. Summarize older messages                              │
│  2. Keep summary + recent N messages                      │
│                                                           │
│  Pros: Handles long conversations                         │
│  Cons: Loses detail, still per-session                    │
│  Use for: Long-form chat, customer support                │
│                                                           │
└───────────────────────────────────────────────────────────┘

PRODUCTION: FULL MEMORY STACK
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                  ┌────────────────┐                       │
│  User ──────────→│  Memory Layer  │                       │
│                  └───────┬────────┘                       │
│                          │                                │
│          ┌───────────────┼───────────────┐                │
│          │               │               │                │
│    ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐          │
│    │  Session  │   │ Long-term │   │ Episodic  │          │
│    │  Buffer   │   │   Facts   │   │  Events   │          │
│    └─────┬─────┘   └─────┬─────┘   └─────┬─────┘          │
│          │               │               │                │
│          └───────────────┼───────────────┘                │
│                          │                                │
│                  ┌───────▼────────┐                       │
│                  │   Retrieval    │                       │
│                  └───────┬────────┘                       │
│                          │                                │
│                  ┌───────▼────────┐                       │
│                  │      LLM       │                       │
│                  └────────────────┘                       │
│                                                           │
└───────────────────────────────────────────────────────────┘
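The production stack's memory layer is essentially a facade that fans writes out to the right store and merges reads. An in-memory sketch: real deployments would back the buffer with Redis, the fact and event stores with a vector DB, and `read` would be semantic search rather than substring match.

```python
class MemoryLayer:
    """Facade over the three stores in the production diagram."""

    def __init__(self):
        self.session_buffer = []   # raw turns, cleared per session
        self.facts = []            # long-term: facts about entities
        self.events = []           # episodic: timestamped events

    def write(self, item: dict) -> None:
        """Every turn hits the buffer; extracted items also land in
        the matching persistent store."""
        self.session_buffer.append(item)
        if item.get("type") == "fact":
            self.facts.append(item)
        elif item.get("type") == "event":
            self.events.append(item)

    def read(self, query: str) -> list:
        """Merge retrieval across persistent stores (naive keyword
        match standing in for vector similarity)."""
        pool = self.facts + self.events
        return [m for m in pool if query.lower() in m["content"].lower()]

    def end_session(self) -> None:
        self.session_buffer.clear()  # facts and events survive
```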
Implementation Checklist
MINIMUM VIABLE MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  [ ] Session persistence (Redis/DB, not just context)     │
│  [ ] Summary generation on session end                    │
│  [ ] User fact storage (vector DB or KV store)            │
│  [ ] Retrieval on conversation start                      │
│  [ ] Explicit forget mechanism                            │
│                                                           │
└───────────────────────────────────────────────────────────┘

PRODUCTION MEMORY
┌───────────────────────────────────────────────────────────┐
│                                                           │
│  [ ] All of the above, plus:                              │
│  [ ] Episodic memory with timestamps                      │
│  [ ] Conflict resolution strategy                         │
│  [ ] Confidence scores on facts                           │
│  [ ] Decay/consolidation for old memories                 │
│  [ ] GDPR compliance (deletion, export)                   │
│  [ ] Memory debugging UI for support                      │
│  [ ] Metrics: retrieval latency, relevance scores         │
│                                                           │
└───────────────────────────────────────────────────────────┘
When This Matters
| Situation                | What to implement                 |
|--------------------------|-----------------------------------|
| Simple chatbot           | Session buffer only               |
| Customer support         | + Summaries + User facts          |
| Sales assistant          | + Episodic (deal history matters) |
| Personal assistant       | Full stack with long-term memory  |
| Enterprise deployment    | + Compliance, audit, deletion     |
| Multi-turn conversations | Session + summarization           |
| Personalization          | Long-term user preferences        |
| "Remember when" queries  | Episodic memory required          |
Interview Notes
💼 Interview Relevance: 50% of AI product interviews
🏭 Production Impact: Essential for conversational AI
⚡ Performance: 12% higher churn without memory