Agent Memory Systems#
As agents tackle longer, more complex tasks, managing memory becomes critical. An agent without proper memory loses context, repeats mistakes, and cannot learn from experience. This page covers the taxonomy of agent memory, implementation patterns, and production tooling for building persistent, context-aware agents.
Learning Objectives#
Understand the taxonomy of agent memory types
Distinguish working memory, semantic memory, episodic memory, and procedural memory
Implement memory patterns in LangGraph (checkpoints and cross-thread stores)
Know production memory tools (Mem0, LangGraph Store)
Recognize and avoid common memory anti-patterns
1. Why Memory Matters#
Without memory, agents are “goldfish” — each interaction starts fresh. Every conversation turn is isolated, every mistake is repeated, and every user preference must be restated. Memory enables:
Continuity across conversation turns within a session
Learning from past interactions and errors
Personalization based on user history and preferences
Long-running task resumption after interruptions or failures
The result is an agent that feels less like a stateless API and more like a knowledgeable colleague.
2. Memory Taxonomy#
Agent memory is not monolithic. Different types of memory serve different purposes and require different storage strategies.
graph TD
M[Agent Memory] --> W[Working Memory\nToken-level / Context Window]
M --> S[Semantic Memory\nLong-term Facts]
M --> E[Episodic Memory\nPast Experiences]
M --> P[Procedural Memory\nHow-To / Workflows]
W --> W1[In-context messages]
W --> W2[Managed via context engineering]
S --> S1[Vector database]
S --> S2[Key-value store]
E --> E1[Interaction logs]
E --> E2[Success/failure records]
P --> P1[Stored workflows]
P --> P2[Learned skills]
2.1 Token-Level (Working Memory)#
The context window itself. Everything the model can “see” right now.
Scope: Current conversation turn and recent history
Limit: 8K–1M tokens depending on the model
Lifetime: Lost when the conversation ends (unless persisted via checkpointing)
Managed by: Context engineering — summarization, truncation, selective inclusion
Working memory is the foundation, but it is finite. All other memory types exist to extend what the agent effectively knows beyond the context window.
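The truncation strategy mentioned above can be sketched with the standard library alone. This is a minimal illustration, not a production tokenizer: `estimate_tokens` and `trim_to_budget` are hypothetical helper names, and the four-characters-per-token ratio is a rough assumption that varies by model.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep a leading system message, then as many of the newest messages as fit."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    # Walk from newest to oldest and stop at the first message that would overflow.
    for message in reversed(rest):
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful audit assistant."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "What changed in Q3?"},
]
trimmed = trim_to_budget(history, budget=120)
print([m["role"] for m in trimmed])
```

With this budget the oldest user message is dropped while the system message and the two newest messages survive. Summarization and selective inclusion follow the same shape: a function that maps the full history to a smaller list that fits the window.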
2.2 Semantic Memory (Long-Term Facts)#
Persistent knowledge stored in vector databases or key-value stores.
Contains: User preferences, learned facts, domain knowledge, entity relationships
Survives: Across sessions and restarts
Retrieved by: Semantic similarity search or direct key lookup
Examples: “User prefers concise summaries”, “Company X uses SAP ERP”
Semantic memory allows an agent to accumulate knowledge over time without stuffing facts into every prompt.
2.3 Episodic Memory (Past Experiences)#
Records of past interactions — what happened, what worked, what failed.
Contains: Conversation summaries, action logs, task outcomes, error records
Survives: Across sessions
Retrieved by: Temporal lookup or semantic similarity
Examples: “Last week the user asked about Q3 variance; we flagged a data quality issue”
Episodic memory enables “lessons learned” behavior. An agent that remembers a past failure can avoid repeating it.
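As a rough sketch of the temporal lookup described above, the snippet below keeps episodes in a plain list and pulls the most recent failures. The `Episode` record and `recent_failures` helper are hypothetical names chosen for illustration; a real system would query a database rather than a list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    timestamp: float  # seconds since epoch
    summary: str
    outcome: str      # "success" or "failure"


def recent_failures(episodes: List[Episode], since: float, limit: int = 3) -> List[Episode]:
    """Return the newest failed episodes recorded at or after `since`."""
    failures = [e for e in episodes if e.outcome == "failure" and e.timestamp >= since]
    return sorted(failures, key=lambda e: e.timestamp, reverse=True)[:limit]


episodes = [
    Episode(100.0, "Q3 variance question: flagged a data quality issue", "failure"),
    Episode(200.0, "Monthly summary delivered", "success"),
    Episode(300.0, "GL export timed out", "failure"),
    Episode(50.0, "Old failure outside the lookback window", "failure"),
]
lessons = recent_failures(episodes, since=90.0)
print([e.summary for e in lessons])
```

The returned summaries can be injected into the prompt as "lessons learned" before the agent attempts a similar task.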
2.4 Procedural Memory (How-To)#
Stored workflows and procedures the agent has learned or been given.
Contains: Step-by-step workflows, conditional decision trees, tool usage patterns
Survives: Indefinitely; versioned
Retrieved by: Task classification and intent matching
Examples: “When the user requests a financial summary, fetch GL data → reconcile → format → attach”
Procedural memory allows agents to acquire and refine skills over time, turning ad-hoc task completion into repeatable, optimized processes.
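One way to picture versioned workflows retrieved by intent matching is a small registry. The sketch below is illustrative only: `WorkflowRegistry` is a hypothetical class, and keyword overlap stands in for a real intent classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Workflow:
    name: str
    version: int
    steps: List[str]
    keywords: List[str] = field(default_factory=list)


class WorkflowRegistry:
    """Keeps every version of each workflow and matches requests to the latest one."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Workflow]] = {}

    def register(self, name: str, steps: List[str], keywords: List[str]) -> Workflow:
        history = self._versions.setdefault(name, [])
        workflow = Workflow(name, len(history) + 1, list(steps), [k.lower() for k in keywords])
        history.append(workflow)
        return workflow

    def match(self, request: str) -> Optional[Workflow]:
        """Pick the latest workflow with the most keyword overlap (a naive stand-in for intent matching)."""
        words = set(request.lower().split())
        best, best_score = None, 0
        for history in self._versions.values():
            latest = history[-1]
            score = len(words & set(latest.keywords))
            if score > best_score:
                best, best_score = latest, score
        return best


registry = WorkflowRegistry()
registry.register("financial_summary", ["fetch GL data", "reconcile", "format", "attach"],
                  ["financial", "summary"])
registry.register("financial_summary", ["fetch GL data", "reconcile", "validate", "format", "attach"],
                  ["financial", "summary"])
matched = registry.match("Please prepare a financial summary for March")
print(matched.version, matched.steps)
```

Registering the same name twice creates version 2 rather than overwriting version 1, which keeps earlier procedures available for rollback or comparison.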
3. Implementation Patterns#
3.1 LangGraph Thread Checkpoints (Working Memory Persistence)#
LangGraph checkpoints persist conversation state across turns within a thread. Each thread maintains its own isolated state, enabling multi-user deployments without memory bleed.
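from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

# Assumes `workflow` (a StateGraph builder) and `db_url` (a Postgres connection
# string) are defined elsewhere, and that this code runs inside an async function.
async with AsyncPostgresSaver.from_conn_string(db_url) as saver:
    # Create the checkpoint tables on first use
    await saver.setup()
    graph = workflow.compile(checkpointer=saver)

    # Each thread has its own conversation state
    config = {"configurable": {"thread_id": "user-123-session-456"}}

    # State persists across turns within the same thread_id
    result = await graph.ainvoke(
        {"messages": [HumanMessage(content="What did we discuss last time?")]},
        config=config,
    )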
Use thread checkpoints when you need turn-by-turn continuity within a single conversation. The thread ID acts as the conversation identifier — different thread IDs are fully isolated.
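To make the isolation guarantee concrete, here is a toy model of per-thread checkpointing using only the standard library. It is a conceptual sketch, not how LangGraph implements checkpoints: `ToyCheckpointer` and `run_turn` are hypothetical names.

```python
import copy
from typing import Any, Dict, List, Optional


class ToyCheckpointer:
    """Stores a list of state snapshots per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, thread_id: str, state: Dict[str, Any]) -> None:
        # Deep-copy so later mutations of `state` cannot alter saved snapshots.
        self._threads.setdefault(thread_id, []).append(copy.deepcopy(state))

    def latest(self, thread_id: str) -> Optional[Dict[str, Any]]:
        snapshots = self._threads.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None


def run_turn(checkpointer: ToyCheckpointer, thread_id: str, user_text: str) -> Dict[str, Any]:
    """Resume from the last snapshot of this thread, append the new message, and save."""
    state = checkpointer.latest(thread_id) or {"messages": []}
    state["messages"].append(user_text)
    checkpointer.save(thread_id, state)
    return state


checkpointer = ToyCheckpointer()
run_turn(checkpointer, "user-123-session-456", "Hello")
run_turn(checkpointer, "user-123-session-456", "What did we discuss last time?")
run_turn(checkpointer, "user-999-session-001", "Unrelated conversation")
print(checkpointer.latest("user-123-session-456"))
```

Each thread ID resumes only from its own snapshots, so the second user never sees the first user's messages.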
3.2 LangGraph Store (Cross-Thread Memory)#
LangGraph’s store provides a shared namespace that persists across threads and sessions. It is the right layer for user preferences, long-term facts, and anything that should outlive a single conversation.
from langchain_core.runnables import RunnableConfig
from langgraph.store.base import BaseStore
from langgraph.store.memory import InMemoryStore
from langgraph.store.postgres import AsyncPostgresStore

# Use InMemoryStore for development, AsyncPostgresStore for production
store = InMemoryStore()
graph = workflow.compile(checkpointer=saver, store=store)

# Inside a node: store.put writes to the shared namespace
def remember_preference(state, config: RunnableConfig, *, store: BaseStore):
    user_id = config["configurable"]["user_id"]
    store.put(
        ("user", user_id),  # namespace: tuple acting as a path
        "preference",  # key
        {"language": "en", "format": "concise"},  # value
    )

# Inside another node: store.get reads back
def apply_preference(state, config: RunnableConfig, *, store: BaseStore):
    user_id = config["configurable"]["user_id"]
    pref = store.get(("user", user_id), "preference")
    # Fall back to English when no preference has been stored yet
    language = pref.value.get("language", "en") if pref else "en"
    return {"language": language}
Namespaces are tuples that create logical separation (e.g., ("user", user_id) vs ("project", project_id)). This prevents key collisions across different entity types.
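The collision-avoidance property can be demonstrated with a dictionary keyed by `(namespace, key)`. This is a simplified model for illustration, assuming nothing about LangGraph internals; `ToyStore` is a hypothetical class.

```python
from typing import Any, Dict, List, Optional, Tuple

Namespace = Tuple[str, ...]


class ToyStore:
    """A dictionary keyed by (namespace, key), mimicking tuple-path namespaces."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[Namespace, str], Dict[str, Any]] = {}

    def put(self, namespace: Namespace, key: str, value: Dict[str, Any]) -> None:
        self._data[(namespace, key)] = value

    def get(self, namespace: Namespace, key: str) -> Optional[Dict[str, Any]]:
        return self._data.get((namespace, key))

    def search(self, prefix: Namespace) -> List[Tuple[Namespace, str, Dict[str, Any]]]:
        """Return every item whose namespace starts with `prefix`."""
        return [
            (ns, key, value)
            for (ns, key), value in self._data.items()
            if ns[: len(prefix)] == prefix
        ]


toy_store = ToyStore()
# Same ID and same key, but different entity types: no collision.
toy_store.put(("user", "42"), "settings", {"format": "concise"})
toy_store.put(("project", "42"), "settings", {"currency": "EUR"})
print(toy_store.get(("user", "42"), "settings"))
print(toy_store.search(("project",)))
```

Because the entity type is part of the key path, a user and a project that share the ID `"42"` keep separate `settings` entries.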
3.3 Vector Store Memory (Semantic Retrieval)#
For large-scale semantic memory, store memories as embeddings and retrieve by similarity. This scales to millions of memories per user without overwhelming the context window.
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = PGVector(
    embeddings=embeddings,
    collection_name="agent_memories",
    connection=db_url,
)

# Save a memory
memory_text = "User prefers Vietnamese language for audit report summaries"
vector_store.add_documents([
    Document(
        page_content=memory_text,
        metadata={
            "user_id": "123",
            "type": "preference",
            "created_at": "2026-04-15",
        },
    )
])

# Recall relevant memories for the current task
memories = vector_store.similarity_search(
    "language preference for reports",
    k=5,
    filter={"user_id": "123"},
)

# Inject memories into agent context
memory_context = "\n".join(m.page_content for m in memories)
The key discipline here is selective storage — save only surprising, durable, or high-value information. Storing every turn degrades retrieval quality and inflates costs.
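A selective-storage gate might look like the sketch below. It assumes an upstream `importance` score between 0 and 1 (for example from an LLM judge, which is not shown) and uses word-level Jaccard overlap as a cheap stand-in for embedding similarity. `should_store` and both thresholds are hypothetical.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical sets)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def should_store(
    candidate: str,
    existing: List[str],
    importance: float,
    min_importance: float = 0.5,
    max_similarity: float = 0.8,
) -> bool:
    """Store only important candidates that are not near-duplicates of existing memories."""
    if importance < min_importance:
        return False
    return all(jaccard(candidate, memory) < max_similarity for memory in existing)


existing = ["User prefers Vietnamese language for audit report summaries"]
print(should_store("User prefers Vietnamese language for audit report summaries", existing, 0.9))
print(should_store("User's fiscal year ends in March", existing, 0.9))
print(should_store("ok thanks", existing, 0.1))
```

Running the gate before `add_documents` keeps duplicates and low-value chatter out of the index, which is usually cheaper than cleaning them up at retrieval time.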
3.4 Episodic Memory with Summarization#
Rather than storing raw transcripts, summarize interactions before saving. This compresses episodic memory and makes retrieval more effective.
import time

from langchain_core.messages import HumanMessage

# Assumes `llm` is a chat model defined elsewhere and `store` is a LangGraph store.
async def save_episode(thread_id: str, messages: list, outcome: str, store):
    """Summarize and save a completed interaction as an episode."""
    summary_prompt = f"""
    Summarize this agent interaction in 2-3 sentences.
    Focus on: what was asked, what actions were taken, and what the outcome was.
    Note any errors, user corrections, or lessons learned.
    Outcome: {outcome}
    """
    summary = await llm.ainvoke(messages + [HumanMessage(content=summary_prompt)])
    now = time.time()  # one timestamp for both the key and the record
    # Use the async write method so the call does not block the event loop
    await store.aput(
        ("episodes", thread_id),
        f"episode_{int(now)}",
        {
            "summary": summary.content,
            "outcome": outcome,
            "timestamp": now,
        },
    )
4. Mem0: Production Memory Platform#
Mem0 is a dedicated memory layer for AI applications, described in a paper published at ECAI 2025. It wraps memory extraction, storage, and retrieval in a single SDK.
Key capabilities:
Automatic extraction: Identifies memorable facts from raw conversation text without manual annotation
Semantic deduplication: Updates existing memories instead of creating duplicates
Conflict resolution: When new information contradicts a stored memory, Mem0 applies an update-or-replace strategy
Graph memory: Models relationships between entities (experimental, moving to production in 2026)
Procedural memory: Stores and retrieves learned workflows (v1.0.0, 2026)
from mem0 import Memory

# The default configuration typically expects an LLM/embedding API key in the environment
m = Memory()

# Save memories from a conversation; extraction is automatic
messages = [
    {"role": "user", "content": "I prefer summaries in bullet points, not paragraphs."},
    {"role": "assistant", "content": "Noted! I will use bullet points going forward."},
]
m.add(messages, user_id="user-123")

# Retrieve relevant memories before responding
memories = m.search("formatting preference", user_id="user-123")
# Illustrative result: [{"memory": "User prefers bullet point summaries", "score": 0.92}]
# The exact return shape depends on the Mem0 version; some releases wrap hits in {"results": [...]}.
Mem0 is appropriate when you want managed memory with minimal implementation overhead. For fine-grained control or custom storage backends, use LangGraph Store directly.
5. Memory Anti-Patterns#
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Storing everything | Context pollution and high retrieval costs | Store only surprising, durable, or high-value information |
| No memory decay | Stale memories override current user state | Add TTL fields or confidence scores that decay over time |
| No conflict resolution | Contradictory memories confuse agent responses | Use update-or-replace, never blind append |
| Mixing memory types | Working memory fills with long-term facts | Separate stores: checkpoints for in-session, store/vector for long-term |
| Flat memory namespace | Key collisions across users or projects | Use structured namespaces: |
| No retrieval filter | Memory from other users leaks into responses | Always filter by |
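The "update-or-replace, never blind append" fix can be sketched as an upsert keyed by user and fact slot. This is a minimal illustration with hypothetical names (`FactMemory`, `MemoryRecord`, and the `slot` concept); it is not how Mem0 or LangGraph resolve conflicts internally.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MemoryRecord:
    value: str
    updated_at: float
    version: int = 1


class FactMemory:
    """One record per (user_id, slot); a contradicting value replaces the old one."""

    def __init__(self) -> None:
        self._facts: Dict[Tuple[str, str], MemoryRecord] = {}

    def upsert(self, user_id: str, slot: str, value: str, now: float) -> MemoryRecord:
        key = (user_id, slot)
        current = self._facts.get(key)
        if current is None:
            record = MemoryRecord(value, now)
        elif current.value == value:
            # Same fact seen again: refresh the timestamp, keep the version.
            record = MemoryRecord(value, now, current.version)
        else:
            # Contradiction: replace the value and bump the version.
            record = MemoryRecord(value, now, current.version + 1)
        self._facts[key] = record
        return record

    def get(self, user_id: str, slot: str) -> Optional[MemoryRecord]:
        return self._facts.get((user_id, slot))

    def __len__(self) -> int:
        return len(self._facts)


facts = FactMemory()
facts.upsert("user-123", "summary_format", "paragraphs", now=1.0)
facts.upsert("user-123", "summary_format", "bullet points", now=2.0)
current = facts.get("user-123", "summary_format")
print(current.value, current.version, len(facts))
```

After the second write the store holds a single record for the slot, so the agent never has to choose between two contradictory preferences at retrieval time.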
6. Memory in Production#
Building production memory systems requires decisions beyond the code layer.
Privacy and Compliance#
GDPR right to deletion: AI memories about a person are personal data. Implement a delete_user_memories(user_id) path from day one.
Data minimization: Only store what is necessary for the agent’s purpose. Audit stored memory periodically.
Consent: Be transparent with users about what is being remembered and for how long.
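A deletion path could be shaped like the sketch below, which models the store as a dictionary keyed by `(namespace, key)` and assumes every user-owned namespace begins with `("user", user_id)`. Both the data layout and the function signature are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

Store = Dict[Tuple[Tuple[str, ...], str], Any]


def delete_user_memories(store: Store, user_id: str) -> int:
    """Delete every record under a namespace starting with ("user", user_id)."""
    doomed = [k for k in store if k[0][:2] == ("user", user_id)]
    for k in doomed:
        del store[k]
    return len(doomed)


memory_store: Store = {
    (("user", "123"), "preference"): {"format": "concise"},
    (("user", "123", "episodes"), "episode_1"): {"summary": "Asked about Q3 variance"},
    (("user", "456"), "preference"): {"format": "detailed"},
}
deleted = delete_user_memories(memory_store, "123")
print(deleted, list(memory_store))
```

In a real deployment the same request would also need to reach vector indexes, checkpoints, caches, and backups; returning the deleted count gives an audit trail something to record.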
Staleness and Decay#
Memories become outdated. A preference recorded six months ago may no longer reflect the user’s current needs.
Mitigation strategies:
Add created_at and last_accessed timestamps to every memory record
Apply confidence decay: reduce the relevance score of memories older than a threshold
Prompt the agent to verify critical memories before acting on them
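Confidence decay can be sketched as an exponential half-life applied to the retrieval score. The 90-day half-life and the `decayed_score` and `rank_memories` helpers are hypothetical choices for illustration; the right curve depends on how quickly a given kind of fact goes stale.

```python
from typing import List, Tuple


def decayed_score(similarity: float, age_days: float, half_life_days: float = 90.0) -> float:
    """Halve the score for every `half_life_days` of age."""
    return similarity * 0.5 ** (age_days / half_life_days)


def rank_memories(candidates: List[Tuple[str, float, float]]) -> List[str]:
    """Sort (text, similarity, age_days) tuples by decayed score, best first."""
    ranked = sorted(candidates, key=lambda c: decayed_score(c[1], c[2]), reverse=True)
    return [text for text, _, _ in ranked]


candidates = [
    ("Prefers long-form reports", 0.9, 365.0),   # strong match, but a year old
    ("Prefers bullet point summaries", 0.7, 10.0),  # weaker match, recent
]
ranking = rank_memories(candidates)
print(ranking)
```

Here the recent preference outranks the older one even though its raw similarity is lower, which is the behavior the staleness mitigation is after.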
Scale and Index Design#
At production scale, memory retrieval latency matters.
Use approximate nearest neighbor (ANN) indexes (e.g., HNSW in pgvector) for vector retrieval
Partition by user_id to keep indexes small and queries fast
Cache frequently accessed memories (user preferences, static facts) in a fast key-value layer (Redis/Valkey)
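The caching idea can be illustrated with a small read-through cache. In this sketch an in-process dictionary stands in for Redis or Valkey, and `TTLCache` and `load_preferences` are hypothetical names; the injectable clock exists only to make expiry easy to demonstrate.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Read-through cache: serve fresh entries, reload expired or missing ones."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader(key)  # slow path, e.g. a database or vector store query
        self._entries[key] = (now, value)
        return value


calls: List[str] = []
fake_now = [0.0]  # mutable cell so the lambda below sees updates


def load_preferences(user_id: str) -> Dict[str, str]:
    calls.append(user_id)  # count how often the slow path runs
    return {"format": "concise"}


cache = TTLCache(ttl_seconds=60.0, clock=lambda: fake_now[0])
first = cache.get_or_load("user-123", load_preferences)
second = cache.get_or_load("user-123", load_preferences)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_load("user-123", load_preferences)   # expired, reloaded
print(len(calls))
```

The TTL bounds how stale a cached preference can be, so it should be chosen together with the decay policy rather than independently.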
Evaluation#
Memory quality is hard to measure. Useful metrics:
| Metric | Description |
|---|---|
| Recall@K | Does the correct memory appear in the top-K retrieved results? |
| Precision@K | What fraction of the top-K retrieved memories are actually relevant? |
| Task improvement rate | Does memory usage measurably improve task success rate? |
| Memory freshness | What percentage of stored memories are within their confidence threshold? |
Evaluate memory independently from the agent’s reasoning. A memory system that retrieves incorrect or outdated facts will degrade agent quality even if the reasoning itself is sound.
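The two retrieval metrics can be computed from a ranked list of memory IDs and a labeled set of relevant IDs. The sketch below assumes such labels exist; the function names are illustrative. Recall is computed as a fraction, which reduces to the hit-or-miss question when exactly one memory is correct.

```python
from typing import List, Set


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for memory_id in top if memory_id in relevant) / len(top)


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


retrieved_ids = ["m1", "m7", "m3", "m9"]   # ranked retrieval output
relevant_ids = {"m1", "m3", "m5"}          # labeled ground truth
p_at_3 = precision_at_k(retrieved_ids, relevant_ids, k=3)
r_at_3 = recall_at_k(retrieved_ids, relevant_ids, k=3)
print(round(p_at_3, 3), round(r_at_3, 3))
```

Averaging these values over a fixed set of labeled queries gives a regression test for the memory layer that does not depend on the agent's reasoning.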
Summary#
graph LR
A[New Interaction] --> B{Memory Type?}
B -->|Current turn| C[Working Memory\nContext window]
B -->|User fact / preference| D[Semantic Memory\nVector store / KV]
B -->|What happened| E[Episodic Memory\nSummarized log]
B -->|How to do X| F[Procedural Memory\nWorkflow store]
C --> G[LangGraph Checkpoints]
D --> H[LangGraph Store\nor Mem0]
E --> H
F --> H
Effective agent memory combines all four types:
Working memory (checkpoints) for within-session continuity
Semantic memory (vector store) for durable facts and preferences
Episodic memory (summarized logs) for learning from past interactions
Procedural memory (workflow store) for skill acquisition over time
The discipline is knowing what to store, how long to keep it, and when to retrieve it — not storing everything indefinitely.
Further Reading#
LangGraph Foundations & State Management — LangGraph state and checkpointing
Human-in-the-Loop & Persistence — Interrupting agents for human verification
Multi-Agent Collaboration — Memory sharing across agent teams
Observability: LangFuse & LangSmith — Monitoring memory retrieval quality in production