Memory
LLMs are stateless by default — they do not remember previous messages. LangChain's memory system adds conversation history to your chains, enabling multi-turn interactions.
Why Memory Matters
Without memory, every LLM call is independent. The model cannot reference earlier messages in the conversation. Memory solves this by injecting conversation history into each new prompt.
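The underlying pattern is framework-independent: keep a list of messages and send the whole list with every call. A minimal sketch of that idea, where the hypothetical `fake_llm` function stands in for a real chat-model call:

```python
# History-injection sketch: fake_llm is a hypothetical stand-in for a
# real chat-model call; it only reports how much context it received.
def fake_llm(messages):
    return f"(reply based on {len(messages)} messages)"

history = []  # the "memory"

def chat(user_input):
    history.append({"role": "user", "content": user_input})
    # The full history is injected into every call, so the model
    # can reference earlier turns.
    reply = fake_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply

chat("My name is Alice.")
chat("What is my name?")
print(len(history))  # 4: two user turns, two assistant replies
```

Every memory class below is a variation on this loop — they differ only in what subset or transformation of `history` gets injected.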
Note: the classic memory classes below still work, but they are being phased out in favor of explicit message management, especially with LangGraph.
ConversationBufferMemory
Stores the entire conversation history. Simple but can grow very large:
```python
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
llm = ChatOpenAI(model="gpt-4o-mini")
chain = ConversationChain(llm=llm, memory=memory)

# First message
chain.invoke({"input": "My name is Alice."})

# Second message - the model remembers!
response = chain.invoke({"input": "What is my name?"})
print(response["response"])  # "Your name is Alice!"
```
ConversationBufferWindowMemory
Keeps only the last k exchanges to limit token usage:
```python
from langchain.memory import ConversationBufferWindowMemory

# Keep only the last 5 exchanges
memory = ConversationBufferWindowMemory(k=5)

# After the 6th exchange, the oldest one is dropped
```
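The windowing behavior can be sketched with a plain `collections.deque`, which drops the oldest entry automatically once `maxlen` is reached. This is an illustration of the idea, not the library's internals:

```python
from collections import deque

k = 5
# Each exchange is one (human, ai) pair; maxlen=k keeps only the last k
window = deque(maxlen=k)

for i in range(6):
    window.append((f"human message {i}", f"ai message {i}"))

print(len(window))    # 5 - the window never exceeds k
print(window[0][0])   # "human message 1" - exchange 0 was dropped
```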
ConversationSummaryMemory
Uses an LLM to summarize the conversation as it grows, keeping a compact representation:
```python
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

memory = ConversationSummaryMemory(llm=ChatOpenAI(model="gpt-4o-mini"))

# As the conversation grows, memory maintains a running summary
# instead of storing every message
memory.save_context(
    {"input": "I'm building a RAG app"},
    {"output": "Great! RAG apps combine retrieval with generation..."},
)
print(memory.load_memory_variables({}))
# {'history': 'The human is building a RAG application...'}
```
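The core idea can be sketched without an LLM: keep one compact summary string and fold each new exchange into it. Here the hypothetical `summarize` stub stands in for the extra LLM call the real class makes:

```python
# Running-summary sketch; summarize() is a hypothetical stub for the
# LLM call ConversationSummaryMemory makes internally.
def summarize(previous_summary, human, ai):
    # A real implementation would ask an LLM to compress this text;
    # the stub just concatenates to show the data flow.
    return f"{previous_summary} Human said: {human!r}. AI replied: {ai!r}.".strip()

summary = ""
exchanges = [
    ("I'm building a RAG app", "Great! RAG combines retrieval with generation."),
    ("It uses Chroma", "Chroma is a popular vector store."),
]
for human, ai in exchanges:
    # Memory stores only the evolving summary, never the full transcript
    summary = summarize(summary, human, ai)

print(summary)
```

The trade-off shown in the comparison table below follows directly from this loop: token usage stays bounded by the summary's size, but each turn costs an extra model call and detail is lost in compression.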
ConversationEntityMemory
Tracks entities (people, places, concepts) mentioned in the conversation:
```python
from langchain.memory import ConversationEntityMemory
from langchain_openai import ChatOpenAI

memory = ConversationEntityMemory(llm=ChatOpenAI(model="gpt-4o-mini"))
memory.save_context(
    {"input": "Alice works at Acme Corp as a data scientist"},
    {"output": "Interesting! Data science at Acme Corp."},
)

# Memory tracks entities automatically
print(memory.entity_store.get("Alice"))
# "Alice is a data scientist who works at Acme Corp"
```
VectorStoreRetrieverMemory
Stores memories in a vector database and retrieves the most relevant ones based on the current query:
```python
from langchain.memory import VectorStoreRetrieverMemory
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create a vector store for memories
vectorstore = Chroma(
    collection_name="memory",
    embedding_function=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)

# Stores each exchange as an embedding
# Retrieves the most relevant past conversations for new queries
```
Memory Comparison
| Memory Type | Token Usage | Best For | Limitation |
|---|---|---|---|
| Buffer | Grows linearly | Short conversations | Hits token limits fast |
| Window (k) | Fixed (k messages) | Recent context only | Forgets old messages |
| Summary | Fixed (summary size) | Long conversations | Loses detail; extra LLM call |
| Entity | Per-entity storage | People/place tracking | Extra LLM call; entity-focused |
| Vector Store | Retrieval-based | Long-term memory | Requires vector database |
Modern Approach — Manual Message History
The recommended approach in LangChain v0.3+ is to manage messages explicitly:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

# Prompt with a placeholder for message history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

model = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | model

# Manage history yourself
history = []

# Turn 1
response = chain.invoke({"input": "My name is Alice", "history": history})
history.append(HumanMessage(content="My name is Alice"))
history.append(AIMessage(content=response.content))

# Turn 2 - the model remembers because we pass history
response = chain.invoke({"input": "What is my name?", "history": history})
print(response.content)  # "Your name is Alice!"
```
Persisting Memory
Save conversation history to a file or database for persistence across sessions:
```python
import json
from langchain_core.messages import messages_to_dict, messages_from_dict

# Save messages to file
def save_history(messages, filepath):
    data = messages_to_dict(messages)
    with open(filepath, "w") as f:
        json.dump(data, f)

# Load messages from file
def load_history(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    return messages_from_dict(data)

# Usage
save_history(history, "chat_history.json")
history = load_history("chat_history.json")
```
What's Next?
The next lesson covers RAG with LangChain — loading documents, creating embeddings, storing in vector databases, and building retrieval-augmented generation chains.
Lilly Tech Systems