Memory & Planning
Memory gives agents continuity across actions. Planning gives them strategy. Together, they transform a simple tool-calling loop into an intelligent system that can handle complex, multi-step tasks.
Memory Types
Short-Term Memory (Conversation Buffer)
The simplest form of memory: keep the full conversation history in the LLM's context window. Every message, tool call, and result stays in context.
- Implementation: Append each message to a list; send the full list with each LLM call
- Advantage: Complete context, no information loss
- Limitation: Context window eventually fills up for long tasks
- Best for: Short to medium tasks within context limits
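A minimal sketch of such a buffer, assuming a role/content message format like that of common chat APIs:

```python
class ConversationBuffer:
    """Short-term memory: keep every message and send all of it each call."""

    def __init__(self):
        self.messages = []

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def to_context(self):
        # The full, unabridged history goes into the next LLM call.
        return list(self.messages)
```

Nothing is ever dropped, which is exactly why this breaks down once the history outgrows the context window.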
Working Memory (Task Scratchpad)
A structured space where the agent tracks the current task state: what has been done, what remains, intermediate results, and current hypotheses.
```python
class WorkingMemory:
    def __init__(self):
        self.plan = []          # Current plan steps
        self.completed = []     # Completed steps
        self.findings = {}      # Key findings by topic
        self.current_step = 0   # Where we are in the plan
        self.errors = []        # Errors encountered

    def to_prompt(self):
        """Format working memory for inclusion in the prompt."""
        return f"""Current Task State:
- Plan: {self.plan}
- Completed: {self.completed}
- Step: {self.current_step}/{len(self.plan)}
- Key Findings: {self.findings}
- Errors: {self.errors}"""
```
Long-Term Memory (Persistent Storage)
Information that persists across sessions. The agent can recall past experiences, learned facts, and user preferences.
- Vector store: Store embeddings of past conversations and results. Retrieve relevant memories using semantic search.
- Entity memory: Track information about specific entities (users, projects, documents) in a structured database.
- Summary memory: Periodically summarize conversation history and store summaries for future reference.
Memory Implementations
Buffer Memory
Keep the last N messages or last N tokens. Simple but loses old context.
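A sketch of message-count trimming (token-based trimming works the same way, just measured in tokens rather than messages):

```python
class TrimmedBuffer:
    """Keep only the last `max_messages` messages; older context is lost."""

    def __init__(self, max_messages=20):
        self.max_messages = max_messages
        self.messages = []

    def add(self, message):
        self.messages.append(message)
        # Drop the oldest messages once the limit is exceeded.
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]
```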
Summary Memory
Use the LLM to periodically summarize the conversation so far, replacing old messages with a compact summary. Preserves key information while staying within context limits.
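One way to sketch this: once the buffer passes a threshold, fold the older messages into a running summary and keep only the recent tail. The `summarize` callable here is a hypothetical stand-in for an LLM summarization call.

```python
class SummaryMemory:
    """Replace old messages with a running summary once the buffer grows
    past `threshold`. `summarize(prev_summary, old_messages)` stands in
    for an LLM call that merges old context into a compact summary."""

    def __init__(self, summarize, threshold=10, keep_recent=4):
        self.summarize = summarize
        self.threshold = threshold
        self.keep_recent = keep_recent
        self.summary = ""
        self.messages = []

    def add(self, message):
        self.messages.append(message)
        if len(self.messages) > self.threshold:
            old = self.messages[:-self.keep_recent]
            self.summary = self.summarize(self.summary, old)
            self.messages = self.messages[-self.keep_recent:]

    def to_context(self):
        prefix = ([f"Summary of earlier conversation: {self.summary}"]
                  if self.summary else [])
        return prefix + self.messages
```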
Vector Store Memory
```python
from uuid import uuid4

from chromadb import Client

class VectorMemory:
    def __init__(self):
        self.client = Client()
        self.collection = self.client.create_collection("agent_memory")

    def store(self, text, metadata=None):
        """Store a memory with an auto-generated embedding."""
        self.collection.add(
            documents=[text],
            metadatas=[metadata or {}],
            ids=[f"mem_{uuid4()}"],
        )

    def recall(self, query, n_results=5):
        """Retrieve the most relevant memories for the query."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
        )
        return results["documents"][0]
```
Planning Strategies
Step-by-Step (Chain of Thought)
The agent reasons through each step sequentially, deciding the next action based on the current state. Simple and effective for straightforward tasks.
Plan-then-Execute
The agent first creates an explicit plan listing all steps, then executes them one by one. Allows for reviewing and adjusting the plan before execution.
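The control flow can be sketched in a few lines; `make_plan` and `execute_step` are hypothetical stand-ins for the LLM planning call and the tool-executing step, respectively:

```python
def plan_then_execute(task, make_plan, execute_step):
    """Plan-then-execute sketch: produce the full plan up front,
    then run each step, passing earlier results along as context."""
    plan = make_plan(task)  # e.g. a list of step descriptions
    results = []
    for step in plan:
        results.append(execute_step(step, results))
    return results
```

Because the plan exists before any step runs, it can be shown to a human (or a critic model) for review before execution begins.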
Tree of Thought
The agent explores multiple possible approaches in parallel, evaluates each path, and selects the most promising one. Better for problems with multiple valid solution paths.
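A simple breadth-first sketch of this search, assuming hypothetical `propose` (generate candidate next thoughts) and `evaluate` (score a partial path) LLM calls:

```python
def tree_of_thought(task, propose, evaluate, beam_width=2, depth=2):
    """Expand candidate reasoning paths level by level, score each,
    and keep only the best `beam_width` paths at every level."""
    frontier = [[]]  # each path is a list of intermediate thoughts
    for _ in range(depth):
        candidates = [path + [thought]
                      for path in frontier
                      for thought in propose(task, path)]
        # Keep the highest-scoring paths for the next level.
        candidates.sort(key=lambda p: evaluate(task, p), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0] if frontier else []
```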
Reflexion
After completing a task (or failing), the agent reflects on what worked and what did not. It uses these reflections to improve future attempts.
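The key mechanism is persisting those reflections so they reach the next attempt's prompt; a minimal sketch:

```python
class ReflexionMemory:
    """Sketch of Reflexion-style memory: record verbal reflections on
    past attempts and surface them before the next attempt."""

    def __init__(self):
        self.reflections = []

    def record(self, task, outcome, reflection):
        self.reflections.append(
            {"task": task, "outcome": outcome, "reflection": reflection}
        )

    def to_prompt(self):
        if not self.reflections:
            return ""
        lines = [f"- {r['reflection']}" for r in self.reflections]
        return "Lessons from previous attempts:\n" + "\n".join(lines)
```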
Self-Reflection and Self-Correction
Advanced agents can evaluate their own outputs and correct mistakes:
```python
def reflect_and_improve(self, task, initial_result):
    """Have the agent critique and improve its own work."""
    reflection_prompt = f"""
Task: {task}
Your output: {initial_result}

Critically evaluate your output:
1. Does it fully address the task?
2. Are there any errors or inaccuracies?
3. What could be improved?
4. Provide an improved version.
"""
    improved = self.llm.generate(reflection_prompt)
    return improved
```