Advanced

Best Practices

Production-ready guidelines for prompt engineering, safety, cost optimization, and avoiding common mistakes when working with Claude.

Prompt Engineering Best Practices

These practices will consistently improve the quality of Claude's responses across all use cases.

1. Be Explicit About Output Format

Never assume Claude knows what format you want. Specify it clearly.

Example

// Good - explicit format
Analyze this text for sentiment. Return your
analysis as JSON with these exact fields:
{
  "sentiment": "positive" | "negative" | "neutral",
  "confidence": 0.0 to 1.0,
  "key_phrases": ["phrase1", "phrase2"],
  "reasoning": "one sentence explanation"
}

2. Use System Prompts for Consistency

When building applications, define Claude's behavior in the system prompt so every response follows the same pattern.

System Prompt Pattern

system = """You are a customer support assistant for Acme Corp.

Rules:
- Always be polite and professional
- If you don't know an answer, say so and offer
  to escalate to a human agent
- Never share internal company information
- Respond in the customer's language
- Keep responses under 150 words unless the
  customer asks for more detail
- Always end with: "Is there anything else I
  can help with?"
"""

3. Test with Diverse Inputs

Your prompt should handle edge cases gracefully. Test with:

Empty or minimal inputs
Very long inputs near the context limit
Inputs in different languages
Adversarial or confusing inputs
Inputs with special characters or formatting

4. Version Control Your Prompts

Treat prompts like code. Store them in version control, test changes systematically, and track which prompt version produces which results.

Python - Prompt Management

# Store prompts as named constants
CLASSIFY_PROMPT_V2 = """Classify the following support
ticket into one of these categories:
- billing
- technical
- account
- feature_request
- other

Return only the category name, nothing else.

Ticket: {ticket_text}"""

# Use with clear versioning
def classify_ticket(ticket_text: str) -> str:
    prompt = CLASSIFY_PROMPT_V2.format(
        ticket_text=ticket_text
    )
    response = client.messages.create(
        model="claude-haiku-3-5-20241022",
        max_tokens=20,
        temperature=0.0,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text.strip()

Safety and Responsible Use

Using AI responsibly is not just ethical — it protects your users, your brand, and your business.

Content Filtering

When building user-facing applications, add your own content filtering on top of Claude's built-in safety:

Validate and sanitize user inputs before sending to Claude
Review Claude's outputs before showing them to end users in sensitive contexts
Implement feedback mechanisms so users can report inappropriate responses

Data Privacy

Do not send personally identifiable information (PII) unless necessary and compliant with your privacy policy
Anonymize or pseudonymize data before sending to the API when possible
Understand and communicate Anthropic's data retention policies to your users
Consider data residency requirements for your jurisdiction

Transparency

Disclose to users when they are interacting with AI
Do not present Claude's outputs as human-written when accuracy and attribution matter
Always verify critical information — Claude can make mistakes

⚠

Important: Never use Claude for autonomous decision-making in high-stakes scenarios (medical diagnosis, legal advice, financial trading) without human oversight. Claude is a tool to assist humans, not replace human judgment in critical situations.

Cost Optimization

API costs scale with token usage. Here are proven strategies to keep costs down without sacrificing quality.

Choose the Right Model

Task Type	Recommended Model	Why
Simple classification	Haiku	60x cheaper than Opus, fast enough for real-time
Code generation	Sonnet	Best quality-to-cost ratio for coding tasks
Complex analysis	Opus (or Sonnet)	Only use Opus when Sonnet quality is insufficient
Routing / triage	Haiku	Use Haiku to classify, then route to bigger models

Reduce Token Usage

Set appropriate max_tokens: Do not default to 4096 if you only need 100 tokens back
Trim context: Only include relevant information in the prompt, not entire documents if only a section matters
Cache common responses: If you ask the same question repeatedly, cache the result
Use prompt compression: Remove unnecessary whitespace, filler words, and redundant instructions

Python - Token-Efficient Pattern

import hashlib, json

# Simple response cache
cache = {}

def call_claude_cached(prompt, model, max_tokens):
    cache_key = hashlib.md5(
        f"{model}:{prompt}".encode()
    ).hexdigest()

    if cache_key in cache:
        return cache[cache_key]

    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}]
    )
    result = response.content[0].text
    cache[cache_key] = result
    return result

Rate Limits and Quotas

The API has rate limits that vary by plan tier. Key limits to be aware of:

Requests per minute (RPM): Number of API calls you can make per minute
Tokens per minute (TPM): Total input + output tokens per minute
Tokens per day (TPD): Daily token budget

✅

Handling rate limits: Implement exponential backoff with jitter. The SDK handles this automatically, but if using raw HTTP requests, check the retry-after header in 429 responses.

Testing and Iteration

Treat prompt development like software development with a testing cycle.

Define Success Criteria

Before writing a prompt, define what a good response looks like. Create 5-10 test cases with expected outputs.
Write the Initial Prompt

Start with a clear, simple prompt. Do not over-engineer on the first try.
Test Against Your Criteria

Run all test cases. Score each response. Identify patterns in failures.
Iterate and Refine

Adjust the prompt based on failures. Add examples, clarify instructions, or add constraints.
Regression Test

After changes, re-run all test cases to ensure improvements did not break previously working cases.

Production Deployment Tips

Use Pinned Model Versions

Always use the full model name with date in production to avoid unexpected behavior changes.

Model Versioning

# Development - ok to use latest
model = "claude-sonnet-4-latest"

# Production - pin the version
model = "claude-sonnet-4-20250514"

Implement Timeouts

Set appropriate timeouts to prevent your application from hanging on slow responses.

Monitor Usage

Track token usage, response times, and error rates. Set up alerts for unusual spikes.

Graceful Degradation

Plan for API outages. Have fallback responses or alternative workflows ready.

Common Pitfalls to Avoid

⚠

Pitfall 1: Prompt Injection. If your application passes user input into prompts, users can inject instructions. Always validate and sanitize user input, and use XML tags to clearly separate user content from system instructions.

⚠

Pitfall 2: Trusting Output Blindly. Claude can generate incorrect information confidently. For critical applications, always verify outputs against trusted sources. Never use Claude output directly in safety-critical systems.

⚠

Pitfall 3: Not Setting max_tokens. Without a reasonable max_tokens value, responses can be unexpectedly long and expensive. Set this parameter based on what you actually need.

⚠

Pitfall 4: Ignoring Token Costs in Loops. If you call Claude in a loop (processing 1000 items), calculate the total cost before running. A prompt that costs $0.01 per call costs $10 for 1000 calls.

⚠

Pitfall 5: One-Size-Fits-All Prompts. Using the same prompt for vastly different inputs leads to inconsistent results. Create specialized prompts for different input types and use routing logic to select the right one.

Frequently Asked Questions

Claude pricing is based on token usage. Haiku starts at $0.25 per million input tokens, Sonnet at $3, and Opus at $15. Output tokens are more expensive. For casual use, claude.ai offers free and paid subscription plans. Check Anthropic's pricing page for current rates.

By default, Claude cannot browse the internet. Its knowledge comes from training data with a knowledge cutoff date. However, you can provide Claude with current information by pasting text, uploading documents, or using tool use / function calling to give Claude access to external data sources.

Claude's training data has a knowledge cutoff date that varies by model version. For the latest models, this is typically within the past few months. Claude will tell you when a question falls outside its knowledge window. Always provide current data directly when you need analysis of recent events.

Anthropic's API data policy states that they do not train on API customer data by default. Data sent through the API is handled according to their data retention policy and terms of service. For the consumer product (claude.ai), policies may differ. Review Anthropic's current privacy policy and terms for the most up-to-date information.

When a conversation approaches the context limit, you have several options: (1) Summarize older messages and replace them with the summary, (2) Start a new conversation with relevant context from the old one, (3) Use a sliding window approach that keeps only the most recent N messages plus the system prompt, or (4) Implement retrieval-augmented generation (RAG) to pull in relevant context on demand.

Claude can analyze and understand images (vision), but it cannot generate or create new images. For image generation, you would need to use a separate tool or service. Claude can, however, generate text descriptions, SVG code, or ASCII art.

Claude can write and analyze code in virtually all popular programming languages, including Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, SQL, HTML/CSS, and many more. It handles framework-specific code (React, Django, Spring Boot, etc.) and can work with configuration files, build scripts, and infrastructure-as-code tools.

Course Complete!

Congratulations on completing the Claude AI course! You now have a solid foundation in:

What Claude is and how it works
Choosing between Haiku, Sonnet, and Opus
Writing effective prompts from basic to advanced
Applying Claude to real-world use cases
Integrating Claude via the API
Production best practices and safety considerations

🏆

Next steps: Explore the Claude Code CLI course to learn how to use AI-powered coding assistance directly in your terminal, or check out the Claude CoWork course for team collaboration.

← Previous API Basics Next Course → Claude Code CLI

Best Practices

Prompt Engineering Best Practices

1. Be Explicit About Output Format

2. Use System Prompts for Consistency

3. Test with Diverse Inputs

4. Version Control Your Prompts

Safety and Responsible Use

Content Filtering

Data Privacy

Transparency

Cost Optimization

Choose the Right Model

Reduce Token Usage

Rate Limits and Quotas

Testing and Iteration

Define Success Criteria

Write the Initial Prompt

Test Against Your Criteria

Iterate and Refine

Regression Test

Production Deployment Tips

Use Pinned Model Versions

Implement Timeouts

Monitor Usage

Graceful Degradation

Common Pitfalls to Avoid

Frequently Asked Questions

Course Complete!