Intermediate

Memory & Planning

Memory gives agents continuity across actions. Planning gives them strategy. Together, they transform a simple tool-calling loop into an intelligent system that can handle complex, multi-step tasks.

Memory Types

Short-Term Memory (Conversation Buffer)

The simplest form of memory: keep the full conversation history in the LLM's context window. Every message, tool call, and result stays in context.

  • Implementation: Append each message to a list; send the full list with each LLM call
  • Advantage: Complete context, no information loss
  • Limitation: Context window eventually fills up for long tasks
  • Best for: Short to medium tasks within context limits
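The buffer described above can be sketched in a few lines. This is a minimal illustration, not tied to any particular LLM SDK; the message-dict shape (`role`/`content`) mirrors common chat APIs:

Python - Conversation Buffer (sketch)
```python
class ConversationBuffer:
    """Short-term memory: keep the full conversation history."""

    def __init__(self):
        self.messages = []  # every user, assistant, and tool message

    def add(self, role, content):
        """Append one message to the history."""
        self.messages.append({"role": role, "content": content})

    def context(self):
        """Return the full history to send with the next LLM call."""
        return list(self.messages)
```

Each turn, the agent calls `add()` for the new message and passes `context()` to the model, so nothing is ever forgotten until the context window fills.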

Working Memory (Task Scratchpad)

A structured space where the agent tracks the current task state: what has been done, what remains, intermediate results, and current hypotheses.

Python - Working Memory
class WorkingMemory:
    def __init__(self):
        self.plan = []           # Current plan steps
        self.completed = []      # Completed steps
        self.findings = {}       # Key findings by topic
        self.current_step = 0    # Where we are in plan
        self.errors = []         # Errors encountered

    def to_prompt(self):
        """Format working memory for inclusion in prompt."""
        return f"""Current Task State:
- Plan: {self.plan}
- Completed: {self.completed}
- Step: {self.current_step}/{len(self.plan)}
- Key Findings: {self.findings}
- Errors: {self.errors}"""

Long-Term Memory (Persistent Storage)

Information that persists across sessions. The agent can recall past experiences, learned facts, and user preferences.

  • Vector store: Store embeddings of past conversations and results. Retrieve relevant memories using semantic search.
  • Entity memory: Track information about specific entities (users, projects, documents) in a structured database.
  • Summary memory: Periodically summarize conversation history and store summaries for future reference.
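Of these, entity memory is the easiest to sketch without extra infrastructure. A minimal in-memory version (a real system would back this with a database) might look like:

Python - Entity Memory (sketch)
```python
from collections import defaultdict

class EntityMemory:
    """Track structured facts about named entities across sessions."""

    def __init__(self):
        # entity name -> {attribute: value}
        self.entities = defaultdict(dict)

    def remember(self, entity, attribute, value):
        """Record or update one fact about an entity."""
        self.entities[entity][attribute] = value

    def recall(self, entity):
        """Return everything known about an entity (empty dict if unknown)."""
        return dict(self.entities[entity])
```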

Memory Implementations

Buffer Memory

Keep the last N messages or last N tokens. Simple but loses old context.
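Both variants reduce to a truncation step before each LLM call. A sketch of each, where `count_tokens` stands in for whatever tokenizer the caller supplies (an assumption, since tokenization is model-specific):

Python - Buffer Trimming (sketch)
```python
def trim_buffer(messages, max_messages=20):
    """Keep only the most recent max_messages entries."""
    return messages[-max_messages:]

def trim_by_tokens(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the remainder fits in max_tokens.
    count_tokens is a caller-supplied tokenizer function."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = count_tokens(msg["content"])
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```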

Summary Memory

Use the LLM to periodically summarize the conversation so far, replacing old messages with a compact summary. Preserves key information while staying within context limits.
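One way to sketch this compaction step, with `llm_summarize` standing in for a caller-supplied function that asks the LLM for a summary (an assumption, since the prompt and client vary by setup):

Python - Summary Memory (sketch)
```python
def compact_history(messages, llm_summarize, keep_recent=4):
    """Replace older messages with an LLM-written summary,
    keeping the most recent turns verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = llm_summarize(transcript)  # caller-supplied LLM call
    compacted = {"role": "system",
                 "content": f"Summary of earlier conversation: {summary}"}
    return [compacted] + recent
```

Run periodically (for example, whenever the buffer crosses a token threshold), this keeps the prompt bounded while preserving the gist of earlier turns.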

Vector Store Memory

Python - Vector Store Memory
from chromadb import Client
from uuid import uuid4

class VectorMemory:
    def __init__(self):
        self.client = Client()
        self.collection = self.client.create_collection("agent_memory")

    def store(self, text, metadata=None):
        """Store a memory with auto-generated embedding."""
        self.collection.add(
            documents=[text],
            metadatas=[metadata or {}],
            ids=[f"mem_{uuid4()}"]
        )

    def recall(self, query, n_results=5):
        """Retrieve relevant memories for the query."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        return results["documents"][0]

Planning Strategies

Step-by-Step (Chain of Thought)

The agent reasons through each step sequentially, deciding the next action based on the current state. Simple and effective for straightforward tasks.

Plan-then-Execute

The agent first creates an explicit plan listing all steps, then executes them one by one. Allows for reviewing and adjusting the plan before execution.
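The control flow can be sketched as follows, with `llm_plan` and `execute_step` as caller-supplied functions (assumptions): the first asks the LLM for a list of steps, the second carries one step out:

Python - Plan-then-Execute (sketch)
```python
def plan_then_execute(task, llm_plan, execute_step):
    """Draft a complete plan up front, then run each step in order."""
    plan = llm_plan(task)      # 1. create the explicit plan first
    results = []
    for step in plan:          # 2. execute steps one by one
        results.append(execute_step(step))
    return plan, results
```

Because the full plan exists before execution starts, it can be shown to a human (or a critic model) for review between steps 1 and 2.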

Tree of Thought

The agent explores multiple possible approaches in parallel, evaluates each path, and selects the most promising one. Better for problems with multiple valid solution paths.
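A one-level version of this search can be sketched with two caller-supplied LLM wrappers (assumptions): `propose` generates candidate approaches and `evaluate` scores each one:

Python - Tree of Thought (sketch)
```python
def tree_of_thought(task, propose, evaluate, n_candidates=3):
    """Branch into several approaches, score each, pick the best."""
    candidates = propose(task, n_candidates)               # branch
    scored = [(evaluate(task, c), c) for c in candidates]  # evaluate paths
    best_score, best = max(scored)                         # select
    return best
```

A full implementation would recurse, expanding the best candidates into further sub-steps, but the branch-evaluate-select loop above is the core of the pattern.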

Reflexion

After completing a task (or failing), the agent reflects on what worked and what did not. It uses these reflections to improve future attempts.

Self-Reflection and Self-Correction

Advanced agents can evaluate their own outputs and correct mistakes:

Python - Self-Reflection Pattern
def reflect_and_improve(self, task, initial_result):
    """Have the agent critique and improve its own work."""
    reflection_prompt = f"""
    Task: {task}
    Your output: {initial_result}

    Critically evaluate your output:
    1. Does it fully address the task?
    2. Are there any errors or inaccuracies?
    3. What could be improved?
    4. Provide an improved version.
    """
    improved = self.llm.generate(reflection_prompt)
    return improved

Memory design principle: Start with the simplest memory that works (conversation buffer). Only add complexity (vector stores, summaries) when you actually hit context limits or need cross-session persistence. Premature complexity in memory systems is a common mistake.