XML Output Intermediate

XML is an often-overlooked format for structured AI output. While JSON dominates API usage, XML excels at mixed content (text interleaved with data), streaming extraction, and scenarios where Claude in particular produces more reliable results.

When to Use XML over JSON

Scenario	JSON	XML	Recommendation
Pure data extraction	Great	Good	Use JSON
Mixed text + data	Awkward	Natural	Use XML
Multiple output sections	Possible	Excellent	Use XML
Streaming partial parsing	Hard	Easy (tag-based)	Use XML
Claude-specific prompting	Good	Excellent	Use XML

Prompting for XML Output

Python

import anthropic
import re

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": """Analyze this review and provide your analysis in XML format:

"Great product but shipping was slow. The quality exceeded my expectations."

Use this format:
<analysis>
  <sentiment>positive/negative/neutral</sentiment>
  <score>1-10</score>
  <summary>Brief summary</summary>
  <pros>
    <item>Pro point</item>
  </pros>
  <cons>
    <item>Con point</item>
  </cons>
</analysis>"""
    }]
)

text = response.content[0].text

Parsing XML Output

Python

import xml.etree.ElementTree as ET
import re

def extract_xml(text, tag):
    """Extract content of an XML tag from model output."""
    pattern = f"<{tag}>(.*?)</{tag}>"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

def extract_xml_list(text, wrapper_tag, item_tag):
    """Extract a list of items from XML."""
    wrapper = extract_xml(text, wrapper_tag)
    if not wrapper:
        return []
    return re.findall(f"<{item_tag}>(.*?)</{item_tag}>", wrapper, re.DOTALL)

# Parse the response
sentiment = extract_xml(text, "sentiment")
score = int(extract_xml(text, "score"))
summary = extract_xml(text, "summary")
pros = extract_xml_list(text, "pros", "item")
cons = extract_xml_list(text, "cons", "item")

XML for Multi-Section Responses

XML shines when you need the model to produce multiple distinct sections:

Prompt Template

Provide your response in this format:

<thinking>
Your step-by-step reasoning (not shown to user)
</thinking>

<answer>
The concise answer for the user
</answer>

<confidence>
high/medium/low
</confidence>

<sources>
  <source>Reference 1</source>
  <source>Reference 2</source>
</sources>

Claude + XML: Claude models are particularly good at following XML formatting instructions. Anthropic's own documentation uses XML tags for structuring prompts, so the model has strong training signal for XML patterns.

Caveat: Unlike JSON mode, there is no provider-level guarantee that the output will be valid XML. Always wrap your XML parsing in try/except blocks and have fallback parsing strategies.

← Pydantic Output Validation →