Intermediate

Memory

LLMs are stateless by default — they do not remember previous messages. LangChain's memory system adds conversation history to your chains, enabling multi-turn interactions.

Why Memory Matters

Without memory, every LLM call is independent. The model cannot reference earlier messages in the conversation. Memory solves this by injecting conversation history into each new prompt.
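The mechanism can be shown without any framework. In this sketch, the hypothetical `call_llm` function stands in for a real model call; the only thing that creates "memory" is prepending prior messages to each new request:

```python
# Stand-in for a real model call; it only sees what we pass it.
def call_llm(messages):
    return f"(model saw {len(messages)} messages)"

history = []

# Turn 1: the model sees only the new message
history.append({"role": "user", "content": "My name is Alice."})
print(call_llm(history))  # (model saw 1 messages)

history.append({"role": "assistant", "content": "Hi Alice!"})

# Turn 2: injecting the history is what lets the model "remember"
history.append({"role": "user", "content": "What is my name?"})
print(call_llm(history))  # (model saw 3 messages)
```

Every memory class below is a different strategy for deciding *which* past messages get injected.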

📚
Modern approach: In LangChain v0.3+, the recommended way to handle memory is to manage message history yourself and pass it to the prompt. The legacy ConversationBufferMemory classes still work but are being phased out in favor of explicit message management, especially with LangGraph.

ConversationBufferMemory

Stores the entire conversation history. Simple but can grow very large:

Python
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
llm = ChatOpenAI(model="gpt-4o-mini")

chain = ConversationChain(llm=llm, memory=memory)

# First message
chain.invoke({"input": "My name is Alice."})

# Second message - the model remembers!
response = chain.invoke({"input": "What is my name?"})
print(response["response"])  # "Your name is Alice!"

ConversationBufferWindowMemory

Keeps only the last k exchanges to limit token usage:

Python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)

# After the 6th exchange, the oldest one is dropped
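The windowing behavior itself is easy to picture without the library. This sketch of the same idea in plain Python uses a deque that holds at most k exchanges:

```python
from collections import deque

k = 5
window = deque(maxlen=k)  # holds at most k (input, output) pairs

for i in range(6):
    window.append((f"user message {i}", f"ai reply {i}"))

# The 6th exchange pushed out the oldest one
print(len(window))   # 5
print(window[0][0])  # user message 1
```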

ConversationSummaryMemory

Uses an LLM to summarize the conversation as it grows, keeping a compact representation:

Python
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=ChatOpenAI(model="gpt-4o-mini"))

# As conversation grows, memory maintains a running summary
# instead of storing every message
memory.save_context(
    {"input": "I'm building a RAG app"},
    {"output": "Great! RAG apps combine retrieval with generation..."}
)

print(memory.load_memory_variables({}))
# {'history': 'The human is building a RAG application...'}
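The core loop is a running fold over the conversation. In this illustrative sketch, `summarize` is a placeholder for the LLM call that condenses the old summary plus the newest exchange:

```python
# `summarize` is a stand-in for an LLM call that condenses
# the previous summary together with the newest exchange.
def summarize(old_summary, user_msg, ai_msg):
    return f"{old_summary} | user mentioned: {user_msg}"

summary = ""
turns = [
    ("I'm building a RAG app", "Great! RAG combines retrieval..."),
    ("It uses Chroma", "Chroma is a solid choice..."),
]

# Each turn folds into the running summary instead of being stored verbatim
for user_msg, ai_msg in turns:
    summary = summarize(summary, user_msg, ai_msg)

print(summary)
```

The summary's size stays roughly constant no matter how long the conversation runs, at the cost of one extra LLM call per turn.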

ConversationEntityMemory

Tracks entities (people, places, concepts) mentioned in the conversation:

Python
from langchain.memory import ConversationEntityMemory

memory = ConversationEntityMemory(llm=ChatOpenAI(model="gpt-4o-mini"))

memory.save_context(
    {"input": "Alice works at Acme Corp as a data scientist"},
    {"output": "Interesting! Data science at Acme Corp."}
)

# Memory tracks entities automatically
print(memory.entity_store.get("Alice"))
# "Alice is a data scientist who works at Acme Corp"
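Under the hood this is a key-value store of entity descriptions. A sketch of the idea in plain Python — in the real class, an LLM extracts the entity names and writes the descriptions, while here both are hardcoded:

```python
# Sketch of an entity store: entity name -> accumulated description.
entity_store = {}

def remember_entity(name, fact):
    existing = entity_store.get(name, "")
    entity_store[name] = (existing + " " + fact).strip()

remember_entity("Alice", "Alice is a data scientist.")
remember_entity("Alice", "She works at Acme Corp.")

print(entity_store["Alice"])
# Alice is a data scientist. She works at Acme Corp.
```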

VectorStoreRetrieverMemory

Stores memories in a vector database and retrieves the most relevant ones based on the current query:

Python
from langchain.memory import VectorStoreRetrieverMemory
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create a vector store for memories
vectorstore = Chroma(
    collection_name="memory",
    embedding_function=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

memory = VectorStoreRetrieverMemory(retriever=retriever)

# Stores each exchange as an embedding
# Retrieves the most relevant past conversations for new queries
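The retrieval idea can be sketched without a vector database by using simple word overlap in place of embedding similarity (the real class scores by embedding distance, not word matching):

```python
# Sketch: word-overlap scoring stands in for embedding similarity.
memories = [
    "Alice likes hiking in the mountains",
    "The project deadline is Friday",
    "Alice's favorite language is Python",
]

def retrieve(query, k=2):
    q = set(query.lower().split())
    scored = sorted(memories,
                    key=lambda m: len(q & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

# Only the k most relevant memories are injected into the prompt
print(retrieve("alice favorite language"))
```

Because only the top-k relevant memories enter the prompt, token usage stays bounded even as the memory store grows without limit.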

Memory Comparison

Memory Type  | Token Usage          | Best For              | Limitation
Buffer       | Grows linearly       | Short conversations   | Hits token limits fast
Window (k)   | Fixed (k exchanges)  | Recent context only   | Forgets old messages
Summary      | Fixed (summary size) | Long conversations    | Loses detail; extra LLM call
Entity       | Per-entity storage   | People/place tracking | Extra LLM call; entity-focused
Vector Store | Retrieval-based      | Long-term memory      | Requires vector database

Modern Approach — Manual Message History

The recommended approach in LangChain v0.3+ is to manage messages explicitly:

Python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

# Prompt with a placeholder for message history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

model = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | model

# Manage history yourself
history = []

# Turn 1
response = chain.invoke({"input": "My name is Alice", "history": history})
history.append(HumanMessage(content="My name is Alice"))
history.append(AIMessage(content=response.content))

# Turn 2 - model remembers because we pass history
response = chain.invoke({"input": "What is my name?", "history": history})
print(response.content)  # "Your name is Alice!"
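The append-after-each-turn pattern generalizes to a small reusable helper. This sketch uses a stand-in callable in place of `chain.invoke` so the control flow is visible on its own (the `Conversation` class and `chain_invoke` parameter are illustrative names, not LangChain APIs):

```python
# Wraps the manual history-management pattern from the example above.
class Conversation:
    def __init__(self, chain_invoke):
        self.chain_invoke = chain_invoke  # stand-in for chain.invoke
        self.history = []

    def send(self, text):
        reply = self.chain_invoke({"input": text, "history": self.history})
        self.history.append(("human", text))
        self.history.append(("ai", reply))
        return reply

# Echo stub in place of a real model
convo = Conversation(lambda req: f"echo: {req['input']}")
convo.send("My name is Alice")
print(convo.send("What is my name?"))  # echo: What is my name?
print(len(convo.history))             # 4
```

With a real chain, you would pass `chain.invoke` (adapted to return `response.content`) instead of the echo stub.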

Persisting Memory

Save conversation history to a file or database for persistence across sessions:

Python
import json
from langchain_core.messages import messages_to_dict, messages_from_dict

# Save messages to file
def save_history(messages, filepath):
    data = messages_to_dict(messages)
    with open(filepath, "w") as f:
        json.dump(data, f)

# Load messages from file
def load_history(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    return messages_from_dict(data)

# Usage
save_history(history, "chat_history.json")
history = load_history("chat_history.json")

For production apps: Use LangGraph's built-in checkpointing system for memory persistence. It handles serialization, storage, and thread management automatically. See the LangGraph lesson for details.

What's Next?

The next lesson covers RAG with LangChain — loading documents, creating embeddings, storing in vector databases, and building retrieval-augmented generation chains.