Beginner

Introduction to AI Design Patterns

What are AI design patterns, why they exist, how they are categorized, and a complete overview of every pattern you will learn in this course — your foundation for building production AI systems.

What Are AI Design Patterns?

An AI design pattern is a reusable, proven solution to a commonly occurring problem in AI system design. Software engineering has the Gang of Four patterns (Singleton, Observer, Factory Method), and distributed systems have their own (Circuit Breaker, Saga, CQRS). In the same way, AI engineering has developed a set of patterns that address the unique challenges of building systems powered by large language models, embedding models, and other AI components.

AI design patterns are not code libraries or frameworks. They are conceptual blueprints that describe:

  • The problem: What specific challenge does this pattern address?
  • The solution: What is the structural approach to solving it?
  • The tradeoffs: What do you gain, and what do you give up?
  • When to use it: Under what conditions is this pattern the right choice?
  • When NOT to use it: When does this pattern add unnecessary complexity?
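The five questions above can be captured as a small record type, which is one way a team might keep its own pattern catalog. This is only a sketch: the `PatternCard` class and its field names are hypothetical, and the example text for the Cascade pattern is an illustrative paraphrase, not an official definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternCard:
    """One pattern description; each field mirrors one of the five questions."""
    name: str
    problem: str      # What specific challenge does this pattern address?
    solution: str     # What is the structural approach to solving it?
    tradeoffs: str    # What do you gain, and what do you give up?
    use_when: str     # Under what conditions is it the right choice?
    avoid_when: str   # When does it add unnecessary complexity?

    def summary(self) -> str:
        """Return a one-line description suitable for a quick-reference list."""
        return f"{self.name}: use when {self.use_when}; avoid when {self.avoid_when}"


# Illustrative content only (an assumption, paraphrased for this sketch).
cascade = PatternCard(
    name="Cascade / Fallback",
    problem="Using the most powerful model for every request is too expensive",
    solution="Try a cheaper model first and escalate only when needed",
    tradeoffs="Lower cost, in exchange for extra escalation logic",
    use_when="most requests are simple and volume is high",
    avoid_when="volume is low and the extra logic outweighs the savings",
)

print(cascade.summary())
```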
💡
Key insight: AI systems face challenges that traditional software does not — non-deterministic outputs, hallucinations, variable latency, token costs, context window limits, and model behavior that changes with updates. AI design patterns specifically address these AI-native challenges.

Why AI Design Patterns Matter

Without patterns, every AI project reinvents solutions to the same problems. Teams waste months discovering that their RAG pipeline needs reranking, that their agent needs loop detection, or that their multi-model system needs a router. Patterns encode this hard-won knowledge so you can apply it immediately.

Benefits of Using Patterns

  • Avoid reinventing the wheel: Thousands of teams have already solved these problems. Patterns capture the best solutions discovered through real-world production experience.
  • Proven solutions: The patterns in this course have been used and refined in production systems. They hold up because they were shaped by real failures, not by theory alone.
  • Team communication: Saying "we use a RAG pattern with a cascade fallback" instantly communicates a complex architecture to any engineer who knows the patterns. It is a shared vocabulary.
  • Faster architecture decisions: Instead of debating solutions from scratch, teams can discuss which known patterns best fit their requirements.
  • Reduced risk: Patterns come with known tradeoffs. You know what you are getting into before you build.
  • Onboarding speed: New team members who know the patterns can understand your system architecture in hours, not weeks.

The Five Pattern Categories

We organize AI design patterns into five categories based on the type of problem they solve:

| Category | What It Addresses | Patterns | Example Problem |
|---|---|---|---|
| Data Patterns | How AI systems access and use external knowledge | RAG, Cache | LLM does not know about your company's internal documents |
| Inference Patterns | How to structure and optimize LLM calls | Prompt Chaining, Cascade, Ensemble | Single LLM call cannot handle complex multi-step reasoning |
| Orchestration Patterns | How multiple AI components work together | Agent/ReAct, Router, Fan-out/Fan-in, Event-Driven | System needs to decide which model to call and coordinate results |
| Safety Patterns | How to make AI systems reliable and trustworthy | Guardrails, Human-in-the-Loop | LLM output must be validated before reaching the user |
| Optimization Patterns | How to reduce cost and latency at scale | Cache, Cascade, Fan-out/Fan-in | AI API costs are growing 10x month-over-month |
💡
Patterns overlap categories: The Cache pattern is both a Data pattern (it stores retrieved knowledge) and an Optimization pattern (it reduces API calls). The Cascade pattern is both an Inference pattern (it structures model calls) and an Optimization pattern (it reduces costs). Real systems combine patterns from multiple categories.
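To make the overlap concrete, the category table can be written as a plain mapping and inverted. The sketch below is an illustration, not part of the course materials: the `CATEGORY_PATTERNS` dictionary copies the table's contents, and the two helper functions are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

# Category -> patterns, copied from the category table in this lesson.
CATEGORY_PATTERNS: Dict[str, List[str]] = {
    "Data": ["RAG", "Cache"],
    "Inference": ["Prompt Chaining", "Cascade", "Ensemble"],
    "Orchestration": ["Agent/ReAct", "Router", "Fan-out/Fan-in", "Event-Driven"],
    "Safety": ["Guardrails", "Human-in-the-Loop"],
    "Optimization": ["Cache", "Cascade", "Fan-out/Fan-in"],
}


def categories_by_pattern(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the table so each pattern maps to the categories it appears in."""
    index: Dict[str, List[str]] = defaultdict(list)
    for category, patterns in table.items():
        for pattern in patterns:
            index[pattern].append(category)
    return dict(index)


def overlapping_patterns(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the patterns that belong to more than one category."""
    return {p: cats for p, cats in categories_by_pattern(table).items() if len(cats) > 1}


overlaps = overlapping_patterns(CATEGORY_PATTERNS)
for pattern, categories in overlaps.items():
    print(pattern, "->", ", ".join(categories))
```

Running it lists Cache, Cascade, and Fan-out/Fan-in as the patterns that sit in two categories.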

All 12 Patterns at a Glance

The following table summarizes every pattern in this course, plus the Pattern Selection Guide lesson that ties them together. Use it as a quick reference as you work through the lessons.

| Pattern | Category | Problem It Solves | When to Use |
|---|---|---|---|
| RAG | Data | LLM lacks domain-specific or up-to-date knowledge | Answering questions from your own documents, data, or knowledge base |
| Agent / ReAct | Orchestration | Tasks require dynamic tool use and multi-step reasoning | Complex tasks where the steps are not known in advance |
| Prompt Chaining | Inference | Single prompt cannot handle complex multi-part tasks | Known multi-step workflows (summarize → extract → format) |
| Router / Gateway | Orchestration | Different requests need different models or processing paths | Multi-model systems, cost optimization, latency-sensitive routing |
| Cascade / Fallback | Inference + Optimization | Using the most powerful model for every request is too expensive | High-volume systems where most requests are simple |
| Ensemble / Voting | Inference | Single model output is not reliable enough for critical decisions | High-stakes outputs (medical, legal, financial) requiring consensus |
| Human-in-the-Loop | Safety | AI output requires human judgment before action is taken | Decisions with real-world consequences (approvals, content publishing) |
| Guardrails / Safety | Safety | LLM output may contain harmful, incorrect, or off-topic content | Any user-facing AI system (always use this pattern) |
| Cache / Optimization | Data + Optimization | Identical or similar requests waste API calls and increase latency | High-volume systems with repeated or similar queries |
| Fan-out / Fan-in | Orchestration + Optimization | Sequential processing of parallel-capable tasks is too slow | Multi-document analysis, parallel tool execution, batch processing |
| Event-Driven AI | Orchestration | AI processing needs to be reactive and decoupled | Real-time pipelines, async processing, microservice AI architectures |
| Pattern Selection Guide | Meta | Choosing which pattern(s) to apply for a given problem | Starting any new AI project or refactoring an existing one |

How Patterns Combine

In production, you almost never use a single pattern in isolation. Real AI systems layer multiple patterns together. Here are the most common pattern combinations:

The Standard Production Stack

User Request
    |
    v
[Guardrails: Input Validation]    ← Safety Pattern
    |
    v
[Router: Select Processing Path]  ← Orchestration Pattern
    |
    +--> Simple queries --> [Cache Check] --> [Small LLM]  ← Optimization
    |
    +--> Knowledge queries --> [RAG Pipeline]  ← Data Pattern
    |         |
    |         +--> [Retrieve] --> [Rerank] --> [Generate]
    |
    +--> Complex tasks --> [Agent with Tools]  ← Orchestration
    |
    v
[Guardrails: Output Validation]   ← Safety Pattern
    |
    v
User Response
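The flow in the diagram can be sketched in a few dozen lines of Python. Everything here is a stand-in chosen for illustration: the keyword rules in `route`, the blocked phrase list, and the stubbed handlers are assumptions, and a real system would replace each with a classifier, a retrieval pipeline, and actual model calls.

```python
from typing import Callable, Dict

BLOCKED_PHRASES = {"ignore previous instructions"}  # hypothetical input rule
ACTION_WORDS = {"send", "book", "schedule"}         # hypothetical routing hint
_cache: Dict[str, str] = {}


def validate_input(text: str) -> str:
    """Input guardrail: reject empty or obviously unsafe requests."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("empty request")
    if any(phrase in cleaned.lower() for phrase in BLOCKED_PHRASES):
        raise ValueError("request blocked by input guardrail")
    return cleaned


def route(text: str) -> str:
    """Router: toy keyword rules standing in for a real request classifier."""
    lowered = text.lower()
    if "policy" in lowered or "document" in lowered:
        return "knowledge"
    if any(word in lowered.split() for word in ACTION_WORDS):
        return "complex"
    return "simple"


def handle_simple(text: str) -> str:
    """Cache check first, then a (stubbed) small model."""
    if text not in _cache:
        _cache[text] = f"[small-llm] {text}"
    return _cache[text]


def handle_knowledge(text: str) -> str:
    """Retrieve, rerank, generate; each step is stubbed."""
    retrieved = [f"passage about: {text}"]
    reranked = sorted(retrieved)
    return f"[rag] answer grounded in {len(reranked)} passage(s)"


def handle_complex(text: str) -> str:
    """Placeholder for an agent with tools."""
    return f"[agent] completed: {text}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "simple": handle_simple,
    "knowledge": handle_knowledge,
    "complex": handle_complex,
}


def validate_output(answer: str) -> str:
    """Output guardrail: a real check would test safety and relevance."""
    if not answer:
        raise ValueError("empty model output")
    return answer


def handle_request(text: str) -> str:
    """Input guardrail, then router, then the chosen path, then output guardrail."""
    cleaned = validate_input(text)
    return validate_output(HANDLERS[route(cleaned)](cleaned))


print(handle_request("What does the travel policy say?"))
```

The point of the sketch is the shape: guardrails wrap every path, and the router is the only place that knows which path a request takes.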

Common Pattern Pairings

  • RAG + Guardrails: Always validate RAG outputs for hallucination, relevance, and safety before returning to users.
  • Agent + Human-in-the-Loop: Agents that take real-world actions (sending emails, updating databases) should require human approval for high-stakes operations.
  • Cascade + Cache: Check the cache first (free), then try the small model (cheap), then escalate to the large model (expensive). In high-volume systems where most requests are served by the cache or the small model, this combination can reduce costs by as much as 20x.
  • Router + Ensemble: Route critical requests to an ensemble of models for consensus, while routing simple requests to a single fast model.
  • Fan-out + Prompt Chaining: Fan out to process multiple documents in parallel, then chain the results through summarization and synthesis steps.
  • Event-Driven + Fan-out: Events trigger parallel AI processing, and the results are aggregated asynchronously.
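As one example of a pairing, here is a minimal sketch of Cascade + Cache. The two model functions are stubs, and the confidence score, the 0.7 threshold, and the `CachedCascade` class are assumptions made for illustration; real systems need their own signal for deciding when the small model's answer is good enough.

```python
from typing import Dict, Tuple


def small_model(prompt: str) -> Tuple[str, float]:
    """Stub cheap model returning (answer, confidence); confident on short prompts only."""
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return f"small:{prompt}", confidence


def large_model(prompt: str) -> str:
    """Stub expensive model."""
    return f"large:{prompt}"


class CachedCascade:
    """Cache first, then the small model, then escalate to the large model."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.cache: Dict[str, str] = {}
        self.calls = {"cache": 0, "small": 0, "large": 0}

    def answer(self, prompt: str) -> str:
        # Step 1: an exact-match cache hit costs nothing.
        if prompt in self.cache:
            self.calls["cache"] += 1
            return self.cache[prompt]
        # Step 2: try the cheap model.
        self.calls["small"] += 1
        result, confidence = small_model(prompt)
        # Step 3: escalate only when the cheap model is not confident.
        if confidence < self.threshold:
            self.calls["large"] += 1
            result = large_model(prompt)
        self.cache[prompt] = result
        return result


cascade = CachedCascade()
cascade.answer("hi there")
cascade.answer("hi there")  # served from the cache
cascade.answer("please explain the full history of distributed consensus")
print(cascade.calls)
```

The `calls` counter makes the cost profile visible: the fewer requests that reach the large model, the larger the savings.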

Pattern vs Architecture vs Framework

These three terms are often confused. Here is how they differ in the AI context:

| Concept | What It Is | Scope | Example |
|---|---|---|---|
| Pattern | A reusable conceptual solution to a specific problem | Solves one problem | RAG Pattern, Cascade Pattern, Guardrails Pattern |
| Architecture | The overall structure of a system, composed of multiple patterns | Entire system | "Our system uses RAG with a cascade fallback, guardrails, and event-driven processing" |
| Framework | A code library that implements one or more patterns | Code-level | LangChain, LlamaIndex, Haystack, CrewAI, AutoGen |

Do not confuse the pattern with the framework. You can implement RAG without LangChain. You can build agents without AutoGen. Frameworks are convenient implementations of patterns, but the pattern knowledge is more valuable than framework knowledge because frameworks change rapidly while patterns remain stable.
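To illustrate that the pattern does not depend on a framework, here is a toy RAG flow using only the standard library. It is deliberately simplified: retrieval is plain word overlap rather than embeddings, `echo_llm` is a stand-in for a real model call, and all function names are hypothetical.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query; drop zero-overlap documents."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return [d for d in ranked[:top_k] if query_words & tokenize(d)]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the question with the retrieved context."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve, augment, generate: the three steps of the pattern."""
    return llm(build_prompt(query, retrieve(query, documents)))


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; it returns the prompt unchanged."""
    return prompt


docs = [
    "Refunds are processed within 14 days.",
    "The office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
print(answer("How long do refunds take?", docs, echo_llm))
```

A framework would swap in better retrieval and a real model client, but the retrieve, augment, generate structure stays the same.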

The Pattern Selection Decision Tree

Use this decision tree as a starting point when designing a new AI system. Start at the top and follow the branches based on your requirements:

START: What does the system need to do?
|
+-- Does it need external knowledge?
|   |-- YES --> RAG Pattern (Lesson 2)
|   |   +-- Is the data sensitive? --> Add Guardrails (Lesson 9)
|   |   +-- High query volume? --> Add Cache (Lesson 10)
|   +-- NO --> Continue
|
+-- Does it need to take actions / use tools?
|   |-- YES --> Agent Pattern (Lesson 3)
|   |   +-- Actions have consequences? --> Add Human-in-Loop (Lesson 8)
|   |   +-- Multiple agent types? --> Multi-Agent (Lesson 3)
|   +-- NO --> Continue
|
+-- Is the task multi-step with known steps?
|   |-- YES --> Prompt Chaining (Lesson 4)
|   |   +-- Steps can run in parallel? --> Fan-out/Fan-in (Lesson 11)
|   +-- NO --> Continue
|
+-- Do different inputs need different models?
|   |-- YES --> Router Pattern (Lesson 5)
|   |   +-- Want cost savings? --> Cascade Pattern (Lesson 6)
|   +-- NO --> Continue
|
+-- Is output correctness critical?
|   |-- YES --> Ensemble/Voting (Lesson 7)
|   +-- NO --> Single model call
|
+-- ALWAYS ADD:
    +-- Guardrails (Lesson 9) for any user-facing system
    +-- Cache (Lesson 10) for any system with > 100 req/day
    +-- Event-Driven (Lesson 12) for async/reactive systems
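The decision tree can also be read as a function from requirements to a list of patterns. The sketch below is one interpretation, not the course's official logic: the `Requirements` fields are hypothetical names, every branch is evaluated rather than stopping at the first YES, and "Single model call" is returned only when no other branch applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    """Answers to the questions in the decision tree (field names are hypothetical)."""
    needs_external_knowledge: bool = False
    sensitive_data: bool = False
    high_query_volume: bool = False
    uses_tools: bool = False
    actions_have_consequences: bool = False
    multiple_agent_types: bool = False
    known_multi_step: bool = False
    parallel_steps: bool = False
    needs_different_models: bool = False
    wants_cost_savings: bool = False
    correctness_critical: bool = False
    user_facing: bool = True
    requests_per_day: int = 0
    async_or_reactive: bool = False


def select_patterns(req: Requirements) -> List[str]:
    """Walk the tree top to bottom and collect patterns without duplicates."""
    selected: List[str] = []

    def add(name: str) -> None:
        if name not in selected:
            selected.append(name)

    if req.needs_external_knowledge:
        add("RAG")
        if req.sensitive_data:
            add("Guardrails")
        if req.high_query_volume:
            add("Cache")
    if req.uses_tools:
        add("Agent")
        if req.actions_have_consequences:
            add("Human-in-the-Loop")
        if req.multiple_agent_types:
            add("Multi-Agent")
    if req.known_multi_step:
        add("Prompt Chaining")
        if req.parallel_steps:
            add("Fan-out/Fan-in")
    if req.needs_different_models:
        add("Router")
        if req.wants_cost_savings:
            add("Cascade")
    if req.correctness_critical:
        add("Ensemble/Voting")
    if not selected:
        add("Single model call")  # assumption: only when no branch matched

    # The "ALWAYS ADD" section of the tree, with its stated conditions.
    if req.user_facing:
        add("Guardrails")
    if req.requests_per_day > 100:
        add("Cache")
    if req.async_or_reactive:
        add("Event-Driven")
    return selected


print(select_patterns(Requirements(needs_external_knowledge=True, requests_per_day=5000)))
```

Treat the output as a starting shortlist to discuss with your team, in the same spirit as the tree itself.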

What You Need Before Starting

This course assumes you have basic familiarity with:

  • Python programming (functions, classes, async/await)
  • REST APIs and HTTP requests
  • Using LLMs through APIs (OpenAI, Anthropic, or similar)
  • Basic understanding of what embeddings and vector databases are (we will review these in the RAG lesson)

You do not need to know machine learning math, model training, or any specific framework like LangChain. We will teach patterns framework-agnostically, then show framework implementations where helpful.

Course Structure

Each lesson in this course follows a consistent structure to maximize your learning:

  1. The Problem: What real-world challenge does this pattern address? With concrete examples of what goes wrong without it.
  2. The Pattern: The conceptual solution, with architecture diagrams and flow descriptions.
  3. Variations: Different versions of the pattern (naive, advanced, modular) and when each applies.
  4. Code Examples: Working Python code implementing the pattern, both from scratch and with popular frameworks.
  5. Anti-Patterns: Common mistakes and how to avoid them.
  6. When NOT to Use: Situations where this pattern adds unnecessary complexity.

What's Next

In the next lesson, we dive into the most widely used AI design pattern in production today: Retrieval-Augmented Generation (RAG). You will learn how to ground LLM responses in your own data, implement chunking and retrieval strategies, and build a complete RAG pipeline from scratch.