Best Practices
Production-ready guidelines for prompt engineering, safety, cost optimization, and avoiding common mistakes when working with Claude.
Prompt Engineering Best Practices
These practices will consistently improve the quality of Claude's responses across all use cases.
1. Be Explicit About Output Format
Never assume Claude knows what format you want. Specify it clearly.
// Good - explicit format Analyze this text for sentiment. Return your analysis as JSON with these exact fields: { "sentiment": "positive" | "negative" | "neutral", "confidence": 0.0 to 1.0, "key_phrases": ["phrase1", "phrase2"], "reasoning": "one sentence explanation" }
2. Use System Prompts for Consistency
When building applications, define Claude's behavior in the system prompt so every response follows the same pattern.
system = """You are a customer support assistant for Acme Corp. Rules: - Always be polite and professional - If you don't know an answer, say so and offer to escalate to a human agent - Never share internal company information - Respond in the customer's language - Keep responses under 150 words unless the customer asks for more detail - Always end with: "Is there anything else I can help with?" """
3. Test with Diverse Inputs
Your prompt should handle edge cases gracefully. Test with:
- Empty or minimal inputs
- Very long inputs near the context limit
- Inputs in different languages
- Adversarial or confusing inputs
- Inputs with special characters or formatting
4. Version Control Your Prompts
Treat prompts like code. Store them in version control, test changes systematically, and track which prompt version produces which results.
# Store prompts as named constants CLASSIFY_PROMPT_V2 = """Classify the following support ticket into one of these categories: - billing - technical - account - feature_request - other Return only the category name, nothing else. Ticket: {ticket_text}""" # Use with clear versioning def classify_ticket(ticket_text: str) -> str: prompt = CLASSIFY_PROMPT_V2.format( ticket_text=ticket_text ) response = client.messages.create( model="claude-haiku-3-5-20241022", max_tokens=20, temperature=0.0, messages=[{"role": "user", "content": prompt}] ) return response.content[0].text.strip()
Safety and Responsible Use
Using AI responsibly is not just ethical — it protects your users, your brand, and your business.
Content Filtering
When building user-facing applications, add your own content filtering on top of Claude's built-in safety:
- Validate and sanitize user inputs before sending to Claude
- Review Claude's outputs before showing them to end users in sensitive contexts
- Implement feedback mechanisms so users can report inappropriate responses
Data Privacy
- Do not send personally identifiable information (PII) unless necessary and compliant with your privacy policy
- Anonymize or pseudonymize data before sending to the API when possible
- Understand and communicate Anthropic's data retention policies to your users
- Consider data residency requirements for your jurisdiction
Transparency
- Disclose to users when they are interacting with AI
- Do not present Claude's outputs as human-written when accuracy and attribution matter
- Always verify critical information — Claude can make mistakes
Cost Optimization
API costs scale with token usage. Here are proven strategies to keep costs down without sacrificing quality.
Choose the Right Model
| Task Type | Recommended Model | Why |
|---|---|---|
| Simple classification | Haiku | 60x cheaper than Opus, fast enough for real-time |
| Code generation | Sonnet | Best quality-to-cost ratio for coding tasks |
| Complex analysis | Opus (or Sonnet) | Only use Opus when Sonnet quality is insufficient |
| Routing / triage | Haiku | Use Haiku to classify, then route to bigger models |
Reduce Token Usage
- Set appropriate max_tokens: Do not default to 4096 if you only need 100 tokens back
- Trim context: Only include relevant information in the prompt, not entire documents if only a section matters
- Cache common responses: If you ask the same question repeatedly, cache the result
- Use prompt compression: Remove unnecessary whitespace, filler words, and redundant instructions
import hashlib, json # Simple response cache cache = {} def call_claude_cached(prompt, model, max_tokens): cache_key = hashlib.md5( f"{model}:{prompt}".encode() ).hexdigest() if cache_key in cache: return cache[cache_key] response = client.messages.create( model=model, max_tokens=max_tokens, messages=[{"role": "user", "content": prompt}] ) result = response.content[0].text cache[cache_key] = result return result
Rate Limits and Quotas
The API has rate limits that vary by plan tier. Key limits to be aware of:
- Requests per minute (RPM): Number of API calls you can make per minute
- Tokens per minute (TPM): Total input + output tokens per minute
- Tokens per day (TPD): Daily token budget
retry-after header in 429 responses.Testing and Iteration
Treat prompt development like software development with a testing cycle.
-
Define Success Criteria
Before writing a prompt, define what a good response looks like. Create 5-10 test cases with expected outputs.
-
Write the Initial Prompt
Start with a clear, simple prompt. Do not over-engineer on the first try.
-
Test Against Your Criteria
Run all test cases. Score each response. Identify patterns in failures.
-
Iterate and Refine
Adjust the prompt based on failures. Add examples, clarify instructions, or add constraints.
-
Regression Test
After changes, re-run all test cases to ensure improvements did not break previously working cases.
Production Deployment Tips
Use Pinned Model Versions
Always use the full model name with date in production to avoid unexpected behavior changes.
# Development - ok to use latest model = "claude-sonnet-4-latest" # Production - pin the version model = "claude-sonnet-4-20250514"
Implement Timeouts
Set appropriate timeouts to prevent your application from hanging on slow responses.
Monitor Usage
Track token usage, response times, and error rates. Set up alerts for unusual spikes.
Graceful Degradation
Plan for API outages. Have fallback responses or alternative workflows ready.
Common Pitfalls to Avoid
Frequently Asked Questions
Claude pricing is based on token usage. Haiku starts at $0.25 per million input tokens, Sonnet at $3, and Opus at $15. Output tokens are more expensive. For casual use, claude.ai offers free and paid subscription plans. Check Anthropic's pricing page for current rates.
By default, Claude cannot browse the internet. Its knowledge comes from training data with a knowledge cutoff date. However, you can provide Claude with current information by pasting text, uploading documents, or using tool use / function calling to give Claude access to external data sources.
Claude's training data has a knowledge cutoff date that varies by model version. For the latest models, this is typically within the past few months. Claude will tell you when a question falls outside its knowledge window. Always provide current data directly when you need analysis of recent events.
Anthropic's API data policy states that they do not train on API customer data by default. Data sent through the API is handled according to their data retention policy and terms of service. For the consumer product (claude.ai), policies may differ. Review Anthropic's current privacy policy and terms for the most up-to-date information.
When a conversation approaches the context limit, you have several options: (1) Summarize older messages and replace them with the summary, (2) Start a new conversation with relevant context from the old one, (3) Use a sliding window approach that keeps only the most recent N messages plus the system prompt, or (4) Implement retrieval-augmented generation (RAG) to pull in relevant context on demand.
Claude can analyze and understand images (vision), but it cannot generate or create new images. For image generation, you would need to use a separate tool or service. Claude can, however, generate text descriptions, SVG code, or ASCII art.
Claude can write and analyze code in virtually all popular programming languages, including Python, JavaScript, TypeScript, Java, C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, SQL, HTML/CSS, and many more. It handles framework-specific code (React, Django, Spring Boot, etc.) and can work with configuration files, build scripts, and infrastructure-as-code tools.
Course Complete!
Congratulations on completing the Claude AI course! You now have a solid foundation in:
- What Claude is and how it works
- Choosing between Haiku, Sonnet, and Opus
- Writing effective prompts from basic to advanced
- Applying Claude to real-world use cases
- Integrating Claude via the API
- Production best practices and safety considerations
Lilly Tech Systems