
Prompt Chaining Pattern

Prompt chaining is the design pattern where the output of one LLM call becomes the input to the next, creating a pipeline of specialized steps that together accomplish tasks too complex for a single prompt. It is the backbone of most production AI workflows.

What Is Prompt Chaining?

At its core, prompt chaining is simple: break a complex task into smaller, well-defined steps, and connect them so each step's output feeds into the next. Instead of asking one massive prompt to do everything — analyze a document, extract entities, classify sentiment, generate a summary, and format it as JSON — you create a chain of focused prompts, each handling one responsibility.

This mirrors good software engineering principles: single responsibility, separation of concerns, and composability. Each link in the chain is easier to test, debug, and improve independently. If your summarization step isn't good enough, you fix just that prompt without touching the rest of the pipeline.

Prompt chaining transforms unreliable, monolithic AI calls into robust, predictable pipelines. It is the difference between a prototype that sometimes works and a production system you can trust.
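To make the idea concrete, here is a minimal sketch of a two-step chain. `call_llm` is a stand-in for any real LLM API call (stubbed here so the data flow is visible); the point is that step two consumes step one's output and nothing else.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns a canned response."""
    return f"[model response to: {prompt.splitlines()[0]}]"

def summarize(document: str) -> str:
    # Step 1: one focused prompt, one responsibility.
    return call_llm(f"Summarize this document:\n{document}")

def extract_action_items(summary: str) -> str:
    # Step 2: consumes the output of step 1, nothing else.
    return call_llm(f"List the action items in this summary:\n{summary}")

def chain(document: str) -> str:
    return extract_action_items(summarize(document))
```

Swapping the stub for a real API call changes nothing about the structure: the chain is still plain function composition.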

Sequential Chains vs Branching Chains

There are two fundamental topologies for prompt chains, and understanding when to use each is critical:

Sequential Chains

The simplest form: Step A → Step B → Step C. Each step runs in order, passing its output to the next. This is ideal when every step depends on the previous one and the task follows a natural linear flow.

  • Document analysis: Extract text → Identify sections → Summarize each section → Generate final report
  • Content generation: Research topic → Create outline → Write draft → Edit and polish
  • Data processing: Parse raw data → Clean and validate → Transform to target schema → Generate insights

Branching Chains

More complex topologies where the chain splits based on intermediate results. A classification step might route the input to entirely different sub-chains depending on the category. This is powerful for handling diverse inputs with a single pipeline.

  • Customer support: Classify intent → (if billing: billing chain) | (if technical: tech support chain) | (if general: FAQ chain)
  • Content moderation: Detect language → (if English: English classifier) | (if Spanish: Spanish classifier) → merge results
  • Multi-format processing: Detect format → (if PDF: PDF extractor) | (if image: OCR) | (if audio: transcription) → unified analysis

💡 Start sequential, add branches later. Most chains begin as simple sequential pipelines. Only add branching when you have genuinely different processing paths. Over-engineered branching chains are harder to maintain and debug than a straightforward sequence.
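When you do add branches, the routing itself can stay simple. A sketch using plain dictionary dispatch for the customer-support example; the handler functions here are hypothetical stand-ins, each of which would wrap its own sub-chain:

```python
# Hypothetical sub-chain handlers; in a real pipeline each would be
# its own sequence of prompts.
def billing_chain(ticket: str) -> str:
    return f"billing response for: {ticket}"

def tech_support_chain(ticket: str) -> str:
    return f"tech response for: {ticket}"

def faq_chain(ticket: str) -> str:
    return f"faq response for: {ticket}"

ROUTES = {"billing": billing_chain, "technical": tech_support_chain}

def handle_ticket(category: str, ticket: str) -> str:
    # Unrecognized categories fall back to the general FAQ chain.
    handler = ROUTES.get(category, faq_chain)
    return handler(ticket)
```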

Chain Components

Every prompt chain is built from a small set of reusable components. Understanding these building blocks lets you design chains that are robust, testable, and maintainable:

Prompts (The Core Unit)

Each step in the chain has a prompt template with placeholders for dynamic input. Good chain prompts are highly focused — they do exactly one thing and do it well. They include explicit instructions about output format to ensure reliable parsing by the next step.

Parsers

Parsers extract structured data from LLM output. Since LLMs produce free-form text, parsers are the glue that makes chaining reliable. Common parsers include JSON parsers, regex extractors, XML parsers, and structured output parsers that validate against a schema.
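A sketch of a defensive JSON parser. The fence-stripping regex handles one common failure mode (models wrapping JSON in markdown code fences), not every way output can be malformed:

```python
import json
import re

def parse_json_output(text: str):
    """Extract JSON from LLM output, tolerating markdown code fences."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)  # keep only what's inside the fence
    return json.loads(text)  # raises json.JSONDecodeError if still invalid
```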

Validators

Validators check that a step's output meets quality and format requirements before passing it downstream. A validator might check that extracted JSON is valid, that a summary is within a word count, or that a classification is one of the allowed categories. Without validators, errors cascade through the chain and produce garbage at the end.
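A minimal validator sketch for a classification step; the field names mirror the examples in this section and are otherwise arbitrary:

```python
def validate_classification(output: dict, allowed_categories: set) -> list:
    """Return a list of validation errors; an empty list means the output passes."""
    errors = []
    if output.get("category") not in allowed_categories:
        errors.append(f"category {output.get('category')!r} not in {sorted(allowed_categories)}")
    confidence = output.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        errors.append("confidence must be a number between 0 and 1")
    return errors
```

Returning a list of errors rather than raising lets the caller decide whether to retry the step, flag the output, or fail the chain.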

Gates (Conditional Logic)

Gates control the flow of the chain based on intermediate results. A gate might check if a confidence score exceeds a threshold, whether the input matches a certain category, or if a quality check passes. Gates enable branching chains and early termination when a step fails or a shortcut is available.

Memory / Context Managers

In longer chains, you need to manage what context is passed forward. Not every step needs the full output of every previous step. Context managers select, compress, or summarize intermediate results to keep prompts focused and within token limits.
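In code, context management can be as simple as selecting fields and capping length. A sketch (character-based truncation is a crude stand-in for real token counting):

```python
def select_context(step_outputs: dict, needed: list) -> dict:
    """Pass forward only the fields the next step actually needs."""
    return {key: step_outputs[key] for key in needed if key in step_outputs}

def truncate(text: str, max_chars: int) -> str:
    """Crude length cap; production systems count tokens, not characters."""
    return text if len(text) <= max_chars else text[:max_chars] + " ..."
```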

LangChain LCEL Chains

LangChain's Expression Language (LCEL) provides a declarative way to compose chains using the pipe operator (|). LCEL chains are lazy, streamable, and support automatic parallelism when steps are independent. LangChain is the most popular framework for building prompt chains in Python.

Python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-sonnet-4-20250514")

# Step 1: Extract key facts from a document
extract_prompt = ChatPromptTemplate.from_template(
    "Extract the key facts from this document as a JSON list of strings.\n\n"
    "Document: {document}\n\n"
    "Return ONLY valid JSON, no other text."
)

# Step 2: Classify the document based on extracted facts
classify_prompt = ChatPromptTemplate.from_template(
    "Given these key facts, classify the document into one of: "
    "legal, financial, technical, marketing, hr.\n\n"
    "Facts: {facts}\n\n"
    "Return ONLY the category name, nothing else."
)

# Step 3: Generate a summary tailored to the category
summary_prompt = ChatPromptTemplate.from_template(
    "Write a {category}-focused summary of a document with these facts.\n\n"
    "Facts: {facts}\n\n"
    "Write 2-3 paragraphs focusing on what matters most for {category}."
)

# Compose with LCEL pipe operator
extract_chain = extract_prompt | model | StrOutputParser()
classify_chain = classify_prompt | model | StrOutputParser()
summary_chain = summary_prompt | model | StrOutputParser()

# Full pipeline using RunnablePassthrough for data flow
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

full_chain = (
    {"document": RunnablePassthrough()}
    | RunnableLambda(lambda x: {"facts": extract_chain.invoke(x)})
    | RunnableLambda(lambda x: {
        "facts": x["facts"],
        "category": classify_chain.invoke({"facts": x["facts"]})
      })
    | summary_chain
)

result = full_chain.invoke("Your document text here...")
print(result)

Gate and Conditional Chains

Real-world chains rarely follow a straight line. You need conditional logic to handle edge cases, route inputs to specialized handlers, and short-circuit when appropriate. Gates are the mechanism for this control flow.

Common gate patterns include:

  • Quality gates: Check if the output of a step meets a minimum quality threshold before proceeding. If not, retry with a refined prompt or escalate to a more capable model.
  • Classification gates: Route the chain to different sub-chains based on a classification result. This is how you build pipelines that handle diverse input types.
  • Confidence gates: If the model's confidence is below a threshold, take a different path (e.g., ask for human review instead of proceeding automatically).
  • Length gates: If input text exceeds a token limit, route to a chunking sub-chain; otherwise, process directly.

Python
from langchain_core.runnables import RunnableBranch

# Define specialized chains for each document type (prompts defined elsewhere)
legal_chain = legal_prompt | model | StrOutputParser()
financial_chain = financial_prompt | model | StrOutputParser()
technical_chain = technical_prompt | model | StrOutputParser()
default_chain = general_prompt | model | StrOutputParser()

# Create a conditional branch based on classification
routing_chain = RunnableBranch(
    (lambda x: x["category"] == "legal", legal_chain),
    (lambda x: x["category"] == "financial", financial_chain),
    (lambda x: x["category"] == "technical", technical_chain),
    default_chain  # Fallback for unrecognized categories
)

# Combine with the classification step; classify_step (defined elsewhere)
# must emit a dict with a "category" key for the branch conditions above
full_pipeline = (
    classify_step
    | routing_chain
)

Code Example: Document Analysis Chain

Here is a complete, production-style document analysis chain that extracts content, classifies it, generates a summary, and formats everything into a structured report:

Python
import anthropic
import json
from datetime import datetime

client = anthropic.Anthropic()

def document_analysis_chain(document: str) -> dict:
    """Full document analysis: extract -> classify -> summarize -> format."""

    # Step 1: Extract key information
    extract_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                "Extract the following from this document:\n"
                "- title: The document title or subject\n"
                "- entities: List of people, organizations, dates mentioned\n"
                "- key_points: List of main points or arguments\n"
                "- tone: Overall tone (formal, informal, technical, persuasive)\n\n"
                f"Document:\n{document}\n\n"
                "Return ONLY valid JSON with these exact keys."
            )
        }]
    )
    extracted = json.loads(extract_response.content[0].text)
    print(f"[Step 1] Extracted {len(extracted['key_points'])} key points")

    # Step 2: Classify document type and urgency
    classify_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": (
                "Based on this extracted information, classify the document.\n\n"
                f"Extracted data: {json.dumps(extracted)}\n\n"
                "Return JSON with:\n"
                '- "category": one of [legal, financial, technical, marketing, hr, general]\n'
                '- "urgency": one of [low, medium, high, critical]\n'
                '- "confidence": float between 0 and 1'
            )
        }]
    )
    classification = json.loads(classify_response.content[0].text)
    print(f"[Step 2] Classified as {classification['category']} "
          f"(confidence: {classification['confidence']})")

    # Gate: Check confidence before proceeding
    if classification["confidence"] < 0.6:
        classification["category"] = "general"
        classification["flag"] = "low_confidence_classification"

    # Step 3: Generate category-specific summary
    summary_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": (
                f"Write a {classification['category']}-focused summary.\n\n"
                f"Key points: {json.dumps(extracted['key_points'])}\n"
                f"Entities: {json.dumps(extracted['entities'])}\n"
                f"Tone: {extracted['tone']}\n\n"
                "Write 2-3 concise paragraphs highlighting what matters most "
                f"for a {classification['category']} audience."
            )
        }]
    )
    summary = summary_response.content[0].text
    print(f"[Step 3] Generated {len(summary.split())} word summary")

    # Step 4: Format final report
    report = {
        "title": extracted["title"],
        "category": classification["category"],
        "urgency": classification["urgency"],
        "confidence": classification["confidence"],
        "entities": extracted["entities"],
        "key_points": extracted["key_points"],
        "summary": summary,
        "processed_at": datetime.now().isoformat(),
        "chain_version": "1.0.0"
    }

    if "flag" in classification:
        report["flags"] = [classification["flag"]]

    print(f"[Step 4] Report formatted successfully")
    return report

# Usage
result = document_analysis_chain("Your document content here...")
print(json.dumps(result, indent=2))

Code Example: Code Review Chain

This chain analyzes code, finds potential bugs, suggests fixes, and generates a structured review report — mimicking how a senior engineer reviews a pull request:

Python
import anthropic
import json

client = anthropic.Anthropic()

def code_review_chain(code: str, language: str = "python") -> dict:
    """Code review: analyze -> find bugs -> suggest fixes -> report."""

    # Step 1: Analyze code structure and patterns
    analysis = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                f"Analyze this {language} code. Return JSON with:\n"
                '- "functions": list of function names and their purposes\n'
                '- "complexity": "low", "medium", or "high"\n'
                '- "patterns": list of design patterns or idioms used\n'
                '- "dependencies": list of external libraries used\n\n'
                f"```{language}\n{code}\n```\n\n"
                "Return ONLY valid JSON."
            )
        }]
    )
    structure = json.loads(analysis.content[0].text)

    # Step 2: Find bugs and issues
    bugs_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                f"Review this {language} code for bugs and issues.\n\n"
                f"Code structure: {json.dumps(structure)}\n\n"
                f"```{language}\n{code}\n```\n\n"
                "Find: bugs, security issues, performance problems, "
                "error handling gaps, and code smells.\n\n"
                "Return JSON array of issues, each with:\n"
                '- "severity": "critical", "warning", or "info"\n'
                '- "line": approximate line number\n'
                '- "issue": description of the problem\n'
                '- "category": "bug", "security", "performance", "style"'
            )
        }]
    )
    issues = json.loads(bugs_response.content[0].text)

    # Step 3: Generate fixes for critical and warning issues
    fixable = [i for i in issues if i["severity"] in ("critical", "warning")]
    fixes = []

    if fixable:
        fixes_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=3000,
            messages=[{
                "role": "user",
                "content": (
                    f"Suggest fixes for these {language} code issues:\n\n"
                    f"Issues: {json.dumps(fixable)}\n\n"
                    f"Original code:\n```{language}\n{code}\n```\n\n"
                    "Return JSON array with:\n"
                    '- "issue": the original issue description\n'
                    '- "fix": the corrected code snippet\n'
                    '- "explanation": why this fix works'
                )
            }]
        )
        fixes = json.loads(fixes_response.content[0].text)

    # Step 4: Generate final review report
    report_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": (
                "Write a concise code review summary.\n\n"
                f"Structure: {json.dumps(structure)}\n"
                f"Issues found: {len(issues)} "
                f"({len([i for i in issues if i['severity']=='critical'])} critical)\n"
                f"Fixes suggested: {len(fixes)}\n\n"
                "Write 2-3 paragraphs: overall assessment, key concerns, "
                "and recommended next steps. Be constructive and specific."
            )
        }]
    )

    return {
        "structure": structure,
        "issues": issues,
        "fixes": fixes,
        "summary": report_response.content[0].text,
        "stats": {
            "total_issues": len(issues),
            "critical": len([i for i in issues if i["severity"] == "critical"]),
            "warnings": len([i for i in issues if i["severity"] == "warning"]),
            "info": len([i for i in issues if i["severity"] == "info"]),
            "fixes_suggested": len(fixes)
        }
    }

Error Handling in Chains

Chains fail. LLMs produce malformed output, APIs time out, and intermediate steps produce unexpected results. Robust error handling is what separates production chains from demo code.

Retry with Fallback

When a step fails, retry with a clearer prompt or a more capable model. Always set a maximum retry count to prevent infinite loops.

Python
import time
import json
import anthropic

client = anthropic.Anthropic()

def robust_chain_step(prompt: str, max_retries: int = 3) -> dict:
    """Execute a chain step with retry logic and JSON validation."""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=2000,
                messages=[{"role": "user", "content": prompt}]
            )
            result = json.loads(response.content[0].text)
            return result  # Success - valid JSON returned

        except json.JSONDecodeError:
            if attempt < max_retries - 1:
                # Retry with stronger formatting instructions
                prompt += "\n\nIMPORTANT: Return ONLY valid JSON. No markdown, no explanation."
                time.sleep(1 * (attempt + 1))  # Linear backoff between retries
            else:
                raise ValueError(f"Failed to get valid JSON after {max_retries} attempts")

        except anthropic.RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            time.sleep(wait_time)

        except anthropic.APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2)

    # All retries consumed (e.g. on repeated rate limits) without a result
    raise RuntimeError(f"Chain step failed after {max_retries} attempts")

Graceful Degradation

When a non-critical step fails, provide a default value and continue the chain rather than failing the entire pipeline. Mark the output as degraded so downstream consumers know.
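A sketch of this pattern; `step_fn` stands in for any non-critical chain step, and the flag format is an arbitrary illustrative choice:

```python
def step_with_fallback(step_fn, step_input, default, flags: list):
    """Run a non-critical step; on failure, record a flag and return a default."""
    try:
        return step_fn(step_input)
    except Exception as exc:
        # Mark the output as degraded so downstream consumers know.
        flags.append(f"degraded:{getattr(step_fn, '__name__', 'step')}:{type(exc).__name__}")
        return default
```

The `flags` list travels with the final report, so a consumer can distinguish a real result from a degraded default.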

Checkpointing

For long chains processing large datasets, save intermediate results after each step. If the chain fails at step 5 of 8, you can resume from step 5 instead of starting over. This is especially important for chains that process batches of documents or make many API calls.
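A file-based sketch of checkpointing; the state format and filename are arbitrary choices for illustration, and the state must be JSON-serializable:

```python
import json
from pathlib import Path

def run_with_checkpoints(steps, initial, checkpoint_file="chain_state.json"):
    """Run named steps in order, saving state to disk after each one.

    If a previous run crashed partway, completed steps are skipped on resume.
    """
    path = Path(checkpoint_file)
    state = json.loads(path.read_text()) if path.exists() else {"completed": [], "data": initial}
    for name, step_fn in steps:
        if name in state["completed"]:
            continue  # already done in an earlier run
        state["data"] = step_fn(state["data"])
        state["completed"].append(name)
        path.write_text(json.dumps(state))  # checkpoint after every step
    path.unlink(missing_ok=True)  # clean up once the whole chain succeeds
    return state["data"]
```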

Chain Debugging and Tracing

When a chain produces bad output, you need to know which step went wrong and why. Effective tracing strategies include:

  • Step-level logging: Log the input, output, latency, and token usage for every step. This is non-negotiable for production chains.
  • LangSmith / LangFuse: Dedicated tracing platforms that visualize chain execution, show intermediate outputs, and help identify bottlenecks. LangSmith integrates natively with LangChain; LangFuse is an open-source alternative.
  • Cost tracking: Each step consumes tokens and costs money. Track per-step costs to identify expensive steps that could use a cheaper model or caching.
  • Assertion checks: Add programmatic assertions between steps to catch issues early. Assert that output JSON has required keys, that arrays are non-empty, that numbers are within expected ranges.
  • Golden test sets: Maintain a set of known inputs and expected outputs for your chain. Run these regularly to catch regressions when you modify prompts or change models.
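The assertion-check idea from the list above takes only a few lines. A sketch:

```python
def assert_step_output(output: dict, required_keys: set, step_name: str):
    """Fail fast, naming the offending step, instead of letting bad data flow on."""
    missing = required_keys - output.keys()
    if missing:
        raise AssertionError(f"{step_name}: output missing keys {sorted(missing)}")
```

Calling this between every pair of steps turns a vague "the chain produced garbage" into a precise "classify dropped the urgency field".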
Tracing is not optional. Without tracing, debugging a 5-step chain is like debugging a program without logs or a debugger. You will waste hours guessing where the problem is. Set up tracing before you deploy your first chain.

Anti-Patterns to Avoid

Prompt chaining is powerful, but it is easy to misuse. Watch out for these common mistakes:

Over-Chaining

Breaking a task into too many steps adds latency, cost, and failure points. If two steps always run together and the intermediate output is never used independently, merge them. A good rule: each step should have a clear, independently testable responsibility.

No Validation Between Steps

Passing raw LLM output directly to the next step without any validation is the number one cause of chain failures. Always validate format (is it valid JSON?), content (are required fields present?), and quality (does the output make sense?) before proceeding.

Accumulating Full Context

Naively passing the complete output of every previous step into every subsequent step wastes tokens and can exceed context limits. Each step should receive only the information it needs. Use parsers and selectors to extract relevant fields.

Ignoring Latency

A 5-step chain where each step takes 3 seconds means 15 seconds of total latency. For user-facing applications, this is often unacceptable. Consider which steps can run in parallel, whether streaming can show partial results, and whether simpler models can handle some steps.
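For the parallelism option, independent steps can share a thread pool: LLM calls are I/O-bound, so threads overlap the network waits. A sketch where each callable stands in for a step that wraps an API call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_independent_steps(steps: dict, shared_input):
    """Run steps that do not depend on each other concurrently.

    `steps` maps a step name to a callable; results are returned under
    the same names once every step has finished.
    """
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(fn, shared_input) for name, fn in steps.items()}
        return {name: future.result() for name, future in futures.items()}
```

With three independent 3-second steps, this brings that stretch of the chain from roughly 9 seconds down to roughly 3.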

No Fallback Strategy

If step 3 of 5 fails, what happens? Without a fallback strategy, the entire chain fails and the user gets nothing. Design chains with graceful degradation: provide partial results, use cached outputs from a similar previous run, or fall back to a simpler approach.