Memory Systems
Memory is what transforms a generic chatbot into a personal assistant. By remembering your preferences, past conversations, and important context, your assistant becomes more useful and personalized over time.
Types of Memory
| Memory Type | Duration | Content | Storage |
|---|---|---|---|
| Conversation | Single session | Current conversation messages | In-memory array |
| Short-term | Hours to days | Recent interactions, current tasks | Cache or database |
| Long-term | Permanent | User profile, preferences, facts | Database, vector store |
| Episodic | Permanent | Past conversations, events, decisions | Vector store with metadata |
| Semantic | Permanent | Knowledge, facts, domain information | RAG system, knowledge base |
User Profile Memory
The most impactful memory for a personal assistant is a user profile that grows over time:
- Preferences: Communication style (brief vs detailed), work hours, timezone, dietary restrictions, travel preferences
- Relationships: Key contacts with context (manager's name, spouse, frequent collaborators)
- Ongoing projects: What the user is currently working on, deadlines, collaborators
- Past decisions: Choices the user has made that inform future recommendations
- Corrections: When the user corrects the assistant, store the correction to avoid repeating mistakes
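The profile categories above map naturally onto a small structured record. A minimal sketch (the class and field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """Illustrative profile record; field names are assumptions."""
    preferences: dict = field(default_factory=dict)    # e.g. {"communication_style": "brief"}
    relationships: dict = field(default_factory=dict)  # e.g. {"manager": "Sam"}
    projects: list = field(default_factory=list)       # ongoing work, deadlines
    decisions: list = field(default_factory=list)      # past choices that inform recommendations
    corrections: list = field(default_factory=list)    # mistakes the assistant should not repeat

    def apply_correction(self, note: str):
        # Store user corrections so the assistant avoids repeating mistakes
        self.corrections.append(note)

profile = UserProfile()
profile.preferences["communication_style"] = "brief"
profile.apply_correction("User's timezone is CET, not UTC")
```

A flat record like this is easy to serialize to JSON and inject into the system prompt wholesale, which is all most assistants need at first.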
Automatic memory extraction: The best approach is to have the LLM automatically identify information worth remembering during conversations. After each conversation, run a secondary pass that extracts facts, preferences, and important context into structured memory.
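That secondary pass can be a single function that hands the transcript to the model with an extraction prompt and parses the structured result. A sketch, where `llm` is a placeholder for whatever completion call you use (it takes a prompt string and returns the model's text output):

```python
import json

EXTRACTION_PROMPT = (
    "Review the conversation below and list any facts, preferences, or "
    "context worth remembering about the user, as a JSON array of objects "
    "with 'type' and 'text' keys.\n\nConversation:\n"
)

def extract_memories(transcript, llm):
    """Secondary pass: ask the LLM to pull durable memories out of a transcript."""
    raw = llm(EXTRACTION_PROMPT + transcript)
    try:
        # Expected shape: [{"type": "preference", "text": "..."}, ...]
        return json.loads(raw)
    except json.JSONDecodeError:
        # Extraction is best-effort; skip memories rather than crash the pipeline
        return []
```

Running this after each conversation and writing the results into your memory store keeps the profile growing without any explicit "remember this" commands from the user.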
Implementing Conversation Memory
For conversations that exceed the context window, you need a strategy:
- Sliding window: Keep the most recent N messages in context. Simple but loses early context.
- Summarization: Periodically summarize older messages into a condensed form. Balances context retention with token usage.
- Retrieval-augmented: Store all messages in a vector database. Retrieve the most relevant past messages for each new query.
- Hybrid: Keep recent messages in full, summarize medium-term history, and use retrieval for long-term history.
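The sliding-window and summarization strategies combine in a few lines. A sketch, assuming `summarize` is a placeholder for an LLM call that condenses a list of messages into a short string:

```python
def build_context(messages, summarize, window=10):
    """Hybrid memory: keep the last `window` messages verbatim,
    collapse everything older into a single summary message."""
    if len(messages) <= window:
        return messages
    older, recent = messages[:-window], messages[-window:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent
```

The full hybrid described above would additionally run a vector-store lookup per query and prepend the retrieved long-term memories alongside the summary.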
Vector Store for Long-Term Memory
Vector databases enable semantic search across all past interactions:
Python - Memory Storage and Retrieval
```python
# Assumes an embedding model (embed_model) and a vector database
# client (vector_db) have been initialized elsewhere.
import uuid
from datetime import datetime, timezone

# Store a memory
def save_memory(text, metadata):
    embedding = embed_model.encode(text)
    vector_db.upsert(
        id=str(uuid.uuid4()),
        vector=embedding,
        metadata={
            "text": text,
            "type": metadata["type"],  # "preference", "fact", "event"
            "timestamp": datetime.now(timezone.utc).isoformat(),
            **metadata,
        },
    )

# Retrieve relevant memories
def recall(query, top_k=5):
    embedding = embed_model.encode(query)
    results = vector_db.query(vector=embedding, top_k=top_k)
    return [r.metadata["text"] for r in results]
```
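Recalled memories are only useful if they reach the model, so the usual pattern is to inject them into the system prompt before each turn. A minimal sketch (the prompt wording and list formatting are assumptions, not a fixed convention):

```python
def build_system_prompt(base_prompt, memories):
    """Prepend retrieved memories so the model can use them as context."""
    if not memories:
        return base_prompt
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"{base_prompt}\n\nRelevant memories about the user:\n{memory_block}"
```

Calling this with the output of a retrieval step (e.g. the `recall` function above) each turn keeps long-term memory in play without the conversation history itself growing.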
Memory Management
- Memory importance scoring: Not all memories are equally valuable. Score memories by frequency of access, recency, and relevance.
- Conflict resolution: When new information contradicts old memories, update rather than duplicate. "User prefers tea" should replace "User prefers coffee."
- Privacy controls: Give users the ability to view, edit, and delete specific memories. Provide a "forget this" command.
- Memory decay: Reduce the weight of old, unaccessed memories over time to keep the most relevant information prominent.
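Importance scoring and decay can share one formula that weighs frequency of access, recency, and relevance. A sketch, where the exact weighting and the 30-day half-life are illustrative assumptions you would tune:

```python
import math
import time

def memory_score(access_count, last_access_ts, relevance, half_life_days=30.0):
    """Combine access frequency, recency decay, and relevance into one score.

    - recency: exponential decay with a configurable half-life
    - frequency: log1p so repeated access has diminishing returns
    - relevance: e.g. the vector-similarity score for the current query
    """
    age_days = (time.time() - last_access_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)
    frequency = math.log1p(access_count)
    return relevance * (1 + frequency) * recency
```

Ranking recall results by this score instead of raw similarity keeps fresh, frequently used memories prominent while stale ones fade without being deleted.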
Start simple: Begin with a basic user profile stored as a JSON file that gets appended to the system prompt. Add vector search and sophisticated memory management only when you have enough conversation history to make it valuable.
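That starting point fits in a dozen lines. A sketch, assuming the profile lives in a local `profile.json` (the path and prompt wording are illustrative):

```python
import json
from pathlib import Path

def load_profile(path="profile.json"):
    """Read the stored profile, or return an empty one if none exists yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}

def prompt_with_profile(base_prompt, profile):
    """Append the profile to the system prompt so every turn sees it."""
    if not profile:
        return base_prompt
    return base_prompt + "\n\nKnown user profile:\n" + json.dumps(profile, indent=2)
```

When the profile outgrows what fits comfortably in the prompt, that is the signal to graduate to the vector-store approach above.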
Lilly Tech Systems