Introduction to AI Design Patterns
What AI design patterns are, why they exist, how they are categorized, and a complete overview of every pattern you will learn in this course — your foundation for building production AI systems.
What Are AI Design Patterns?
An AI design pattern is a reusable, proven solution to a commonly occurring problem in AI system design. Just as software engineering has the Gang of Four patterns (Singleton, Observer, Factory) and distributed systems have patterns (Circuit Breaker, Saga, CQRS), the AI engineering discipline has developed its own set of patterns that address the unique challenges of building systems powered by large language models, embedding models, and other AI components.
AI design patterns are not code libraries or frameworks. They are conceptual blueprints that describe:
- The problem: What specific challenge does this pattern address?
- The solution: What is the structural approach to solving it?
- The tradeoffs: What do you gain, and what do you give up?
- When to use it: Under what conditions is this pattern the right choice?
- When NOT to use it: When does this pattern add unnecessary complexity?
Why AI Design Patterns Matter
Without patterns, every AI project reinvents solutions to the same problems. Teams waste months discovering that their RAG pipeline needs reranking, that their agent needs loop detection, or that their multi-model system needs a router. Patterns encode this hard-won knowledge so you can apply it immediately.
Benefits of Using Patterns
- Avoid reinventing the wheel: Thousands of teams have already solved these problems. Patterns capture the best solutions discovered through real-world production experience.
- Proven solutions: Each pattern has been battle-tested in production systems handling millions of requests. They work because they have been refined through failure.
- Team communication: Saying "we use a RAG pattern with a cascade fallback" instantly communicates a complex architecture to any engineer who knows the patterns. It is a shared vocabulary.
- Faster architecture decisions: Instead of debating solutions from scratch, teams can discuss which known patterns best fit their requirements.
- Reduced risk: Patterns come with known tradeoffs. You know what you are getting into before you build.
- Onboarding speed: New team members who know the patterns can understand your system architecture in hours, not weeks.
The Five Pattern Categories
We organize AI design patterns into five categories based on the type of problem they solve:
| Category | What It Addresses | Patterns | Example Problem |
|---|---|---|---|
| Data Patterns | How AI systems access and use external knowledge | RAG, Cache | LLM does not know about your company's internal documents |
| Inference Patterns | How to structure and optimize LLM calls | Prompt Chaining, Cascade, Ensemble | Single LLM call cannot handle complex multi-step reasoning |
| Orchestration Patterns | How multiple AI components work together | Agent/ReAct, Router, Fan-out/Fan-in, Event-Driven | System needs to decide which model to call and coordinate results |
| Safety Patterns | How to make AI systems reliable and trustworthy | Guardrails, Human-in-the-Loop | LLM output must be validated before reaching the user |
| Optimization Patterns | How to reduce cost and latency at scale | Cache, Cascade, Fan-out/Fan-in | AI API costs are growing 10x month-over-month |
All 12 Patterns at a Glance
The following table provides a comprehensive overview of every pattern in this course. Use it as a quick-reference guide throughout your learning journey.
| Pattern | Category | Problem It Solves | When to Use |
|---|---|---|---|
| RAG | Data | LLM lacks domain-specific or up-to-date knowledge | Answering questions from your own documents, data, or knowledge base |
| Agent / ReAct | Orchestration | Tasks require dynamic tool use and multi-step reasoning | Complex tasks where the steps are not known in advance |
| Prompt Chaining | Inference | Single prompt cannot handle complex multi-part tasks | Known multi-step workflows (summarize → extract → format) |
| Router / Gateway | Orchestration | Different requests need different models or processing paths | Multi-model systems, cost optimization, latency-sensitive routing |
| Cascade / Fallback | Inference + Optimization | Using the most powerful model for every request is too expensive | High-volume systems where most requests are simple |
| Ensemble / Voting | Inference | Single model output is not reliable enough for critical decisions | High-stakes outputs (medical, legal, financial) requiring consensus |
| Human-in-the-Loop | Safety | AI output requires human judgment before action is taken | Decisions with real-world consequences (approvals, content publishing) |
| Guardrails / Safety | Safety | LLM output may contain harmful, incorrect, or off-topic content | Any user-facing AI system (always use this pattern) |
| Cache / Optimization | Data + Optimization | Identical or similar requests waste API calls and increase latency | High-volume systems with repeated or similar queries |
| Fan-out / Fan-in | Orchestration + Optimization | Sequential processing of parallel-capable tasks is too slow | Multi-document analysis, parallel tool execution, batch processing |
| Event-Driven AI | Orchestration | AI processing needs to be reactive and decoupled | Real-time pipelines, async processing, microservice AI architectures |
| Pattern Selection Guide | Meta | Choosing which pattern(s) to apply for a given problem | Starting any new AI project or refactoring an existing one |
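To make the idea of a "pattern" concrete before diving into the lessons, here is a minimal, framework-free sketch of the Prompt Chaining row above (summarize → extract → format). The step functions are hypothetical stubs standing in for real LLM calls; only the chaining structure matters.

```python
# Prompt Chaining sketch: each step's output feeds the next step's input.
# Every function body is a stub where a real system would call an LLM.

def summarize(text: str) -> str:
    # Stub "summarizer": take the first sentence.
    return text.split(".")[0] + "."

def extract(summary: str) -> list[str]:
    # Stub "extractor": pull out capitalized words as entities.
    return [w for w in summary.rstrip(".").split() if w.istitle()]

def format_report(names: list[str]) -> str:
    # Stub "formatter": render the extracted entities.
    return "Entities: " + ", ".join(names)

def chain(text: str) -> str:
    # The pattern itself: a fixed sequence of dependent steps.
    return format_report(extract(summarize(text)))
```

The pattern is the fixed pipeline shape, not any particular implementation — swapping the stubs for real model calls would not change the structure.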
How Patterns Combine
In production, you almost never use a single pattern in isolation. Real AI systems layer multiple patterns together. Here are the most common pattern combinations:
The Standard Production Stack
User Request
|
v
[Guardrails: Input Validation] ← Safety Pattern
|
v
[Router: Select Processing Path] ← Orchestration Pattern
|
+--> Simple queries --> [Cache Check] --> [Small LLM] ← Optimization
|
+--> Knowledge queries --> [RAG Pipeline] ← Data Pattern
| |
| +--> [Retrieve] --> [Rerank] --> [Generate]
|
+--> Complex tasks --> [Agent with Tools] ← Orchestration
|
v
[Guardrails: Output Validation] ← Safety Pattern
|
v
User Response
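The flow above can be sketched in code. Everything here is a hypothetical stand-in — the routing heuristics, the stubbed model calls, and the validation checks are illustrative only, not a real guardrails, router, RAG, or agent implementation.

```python
# Sketch of the standard production stack: guardrails -> router ->
# (cache + small LLM | RAG | agent) -> guardrails. All components stubbed.

def validate_input(request: str) -> str:
    # Input guardrail: reject empty requests (stand-in for real checks).
    if not request.strip():
        raise ValueError("empty request")
    return request

def route(request: str) -> str:
    # Router: pick a path from simple, illustrative heuristics.
    if any(w in request.lower() for w in ("policy", "document", "report")):
        return "knowledge"
    if "?" not in request and len(request) < 40:
        return "simple"
    return "complex"

def handle_simple(request: str, cache: dict) -> str:
    # Optimization path: cache check, then a cheap model call (stubbed).
    if request in cache:
        return cache[request]
    answer = f"[small-llm] {request}"
    cache[request] = answer
    return answer

def handle_knowledge(request: str) -> str:
    # Data path: retrieve -> rerank -> generate (all stubbed).
    docs = ["doc-b", "doc-a"]           # retrieve
    best = sorted(docs)[0]              # rerank (trivial stand-in)
    return f"[rag:{best}] {request}"    # generate

def handle_complex(request: str) -> str:
    # Orchestration path: agent with tools (stubbed).
    return f"[agent] {request}"

def validate_output(answer: str) -> str:
    # Output guardrail: stand-in for hallucination/safety checks.
    if not answer:
        raise ValueError("empty answer")
    return answer

def serve(request: str, cache: dict) -> str:
    request = validate_input(request)
    path = route(request)
    if path == "simple":
        answer = handle_simple(request, cache)
    elif path == "knowledge":
        answer = handle_knowledge(request)
    else:
        answer = handle_complex(request)
    return validate_output(answer)
```

Note that both guardrail steps wrap every path: no request reaches a model unvalidated, and no answer reaches the user unchecked.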
Common Pattern Pairings
- RAG + Guardrails: Always validate RAG outputs for hallucination, relevance, and safety before returning to users.
- Agent + Human-in-the-Loop: Agents that take real-world actions (sending emails, updating databases) should require human approval for high-stakes operations.
- Cascade + Cache: Check the cache first (free), then try the small model (cheap), then escalate to the large model (expensive). In high-volume systems this combination can cut costs by a factor of 20.
- Router + Ensemble: Route critical requests to an ensemble of models for consensus, while routing simple requests to a single fast model.
- Fan-out + Prompt Chaining: Fan out to process multiple documents in parallel, then chain the results through summarization and synthesis steps.
- Event-Driven + Fan-out: Events trigger parallel AI processing, results are aggregated asynchronously.
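The Cascade + Cache pairing is simple enough to sketch end to end. The model functions and the confidence heuristic below are hypothetical stubs; a real system would derive confidence from log probabilities or a self-check prompt.

```python
# Cascade + Cache sketch: cache (free) -> small model (cheap) ->
# large model (expensive), escalating only on low confidence.

def small_model(prompt: str) -> tuple[str, float]:
    # Stub: short questions get high confidence, everything else low.
    if prompt.endswith("?") and len(prompt) < 30:
        return f"small: {prompt}", 0.9
    return f"small: {prompt}", 0.4

def large_model(prompt: str) -> str:
    # Stub for the expensive, most capable model.
    return f"large: {prompt}"

def answer(prompt: str, cache: dict, threshold: float = 0.7) -> str:
    if prompt in cache:                       # tier 0: cache hit, free
        return cache[prompt]
    result, confidence = small_model(prompt)  # tier 1: cheap model
    if confidence < threshold:
        result = large_model(prompt)          # tier 2: escalate
    cache[prompt] = result                    # remember for next time
    return result
```

The threshold is the key tuning knob: raise it and more traffic escalates (higher quality, higher cost); lower it and more traffic stays on the small model.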
Pattern vs Architecture vs Framework
These three terms are often confused. Here is how they differ in the AI context:
| Concept | What It Is | Scope | Example |
|---|---|---|---|
| Pattern | A reusable conceptual solution to a specific problem | Solves one problem | RAG Pattern, Cascade Pattern, Guardrails Pattern |
| Architecture | The overall structure of a system, composed of multiple patterns | Entire system | "Our system uses RAG with a cascade fallback, guardrails, and event-driven processing" |
| Framework | A code library that implements one or more patterns | Code-level | LangChain, LlamaIndex, Haystack, CrewAI, AutoGen |
The Pattern Selection Decision Tree
Use this decision tree as a starting point when designing a new AI system. Start at the top and follow the branches based on your requirements:
START: What does the system need to do?
|
+-- Does it need external knowledge?
| |-- YES --> RAG Pattern (Lesson 2)
| | +-- Is the data sensitive? --> Add Guardrails (Lesson 9)
| | +-- High query volume? --> Add Cache (Lesson 10)
| +-- NO --> Continue
|
+-- Does it need to take actions / use tools?
| |-- YES --> Agent Pattern (Lesson 3)
| | +-- Actions have consequences? --> Add Human-in-Loop (Lesson 8)
| | +-- Multiple agent types? --> Multi-Agent (Lesson 3)
| +-- NO --> Continue
|
+-- Is the task multi-step with known steps?
| |-- YES --> Prompt Chaining (Lesson 4)
| | +-- Steps can run in parallel? --> Fan-out/Fan-in (Lesson 11)
| +-- NO --> Continue
|
+-- Do different inputs need different models?
| |-- YES --> Router Pattern (Lesson 5)
| | +-- Want cost savings? --> Cascade Pattern (Lesson 6)
| +-- NO --> Continue
|
+-- Is output correctness critical?
| |-- YES --> Ensemble/Voting (Lesson 7)
| +-- NO --> Single model call
|
+-- ALWAYS ADD:
+-- Guardrails (Lesson 9) for any user-facing system
+-- Cache (Lesson 10) for any system with > 100 req/day
+-- Event-Driven (Lesson 12) for async/reactive systems
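The decision tree above can also be encoded as a simple requirements checklist. This is a hypothetical, simplified encoding — the field names are illustrative, not a real API — but it shows how the branches compose into a pattern list.

```python
# The decision tree as code: each requirement flag maps to a pattern,
# with the "always add" branch applied at the end.

from dataclasses import dataclass

@dataclass
class Requirements:
    needs_external_knowledge: bool = False  # -> RAG
    needs_tools: bool = False               # -> Agent
    known_multi_step: bool = False          # -> Prompt Chaining
    multi_model: bool = False               # -> Router
    correctness_critical: bool = False      # -> Ensemble
    user_facing: bool = False               # -> Guardrails (always add)
    high_volume: bool = False               # -> Cache (always add)

def select_patterns(req: Requirements) -> list[str]:
    patterns = []
    if req.needs_external_knowledge:
        patterns.append("RAG")
    if req.needs_tools:
        patterns.append("Agent")
    if req.known_multi_step:
        patterns.append("Prompt Chaining")
    if req.multi_model:
        patterns.append("Router")
    if req.correctness_critical:
        patterns.append("Ensemble")
    if req.user_facing:
        patterns.append("Guardrails")
    if req.high_volume:
        patterns.append("Cache")
    return patterns
```

For example, a high-volume, user-facing document Q&A system would come back as RAG + Guardrails + Cache — exactly the first branch of the tree with both "always add" items attached.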
What You Need Before Starting
This course assumes you have basic familiarity with:
- Python programming (functions, classes, async/await)
- REST APIs and HTTP requests
- Using LLMs through APIs (OpenAI, Anthropic, or similar)
- Basic understanding of what embeddings and vector databases are (we will review these in the RAG lesson)
You do not need to know machine learning math, model training, or any specific framework like LangChain. We will teach patterns framework-agnostically, then show framework implementations where helpful.
Course Structure
Each lesson in this course follows a consistent structure to maximize your learning:
- The Problem: What real-world challenge does this pattern address? With concrete examples of what goes wrong without it.
- The Pattern: The conceptual solution, with architecture diagrams and flow descriptions.
- Variations: Different versions of the pattern (naive, advanced, modular) and when each applies.
- Code Examples: Working Python code implementing the pattern, both from scratch and with popular frameworks.
- Anti-Patterns: Common mistakes and how to avoid them.
- When NOT to Use: Situations where this pattern adds unnecessary complexity.
What's Next
In the next lesson, we dive into the most widely used AI design pattern in production today: Retrieval-Augmented Generation (RAG). You will learn how to ground LLM responses in your own data, implement chunking and retrieval strategies, and build a complete RAG pipeline from scratch.
Lilly Tech Systems