Knowledge Base for AI Assistants

A knowledge base gives your assistant domain-specific expertise. Using RAG (Retrieval-Augmented Generation), your assistant can answer questions from your own documents, policies, and data.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with LLM generation. Instead of relying solely on the LLM's training data, RAG fetches relevant documents and includes them in the prompt so the LLM can generate answers grounded in your specific data.

  1. Index

    Split your documents into chunks, generate embeddings (numerical vectors), and store them in a vector database.

  2. Retrieve

    When a user asks a question, convert it to an embedding and find the most similar document chunks in the vector database.

  3. Generate

    Pass the retrieved chunks to the LLM along with the user's question. The LLM generates an answer based on the retrieved context.

Python - Simple RAG Implementation
from chromadb import Client
import anthropic

# 1. Index documents
chroma = Client()
collection = chroma.create_collection("knowledge_base")

documents = [
    "Our return policy allows returns within 30 days...",
    "Free shipping on orders over $50...",
    "Premium members get 20% discount...",
]
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# 2. Retrieve relevant context
def get_context(query, n_results=3):
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return "\n\n".join(results["documents"][0])

# 3. Generate answer with context
def ask(question):
    context = get_context(question)
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="""Answer based on the provided context.
If the context doesn't contain the answer,
say you don't have that information.""",
        messages=[{
            "role": "user",
            "content": f"""Context:\n{context}\n\nQuestion: {question}"""
        }]
    )
    return response.content[0].text

Document Processing

Getting documents into your knowledge base requires processing:

Supported Formats

  • PDFs: Use PyPDF2, pdfplumber, or unstructured.io for extraction
  • Word docs: python-docx for .docx files
  • HTML/Websites: Beautiful Soup or Trafilatura for web scraping and content extraction
  • Markdown: Direct text extraction
  • Spreadsheets: Convert rows to natural language descriptions
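The spreadsheet case deserves a quick sketch: raw rows embed poorly, but turning each row into a sentence gives the embedding model natural language to work with. The helper below is illustrative (`rows_to_descriptions` and its template are not from any library), using only the standard `csv` module.

```python
import csv
import io

def rows_to_descriptions(csv_text, template):
    """Convert each spreadsheet row into a natural-language sentence.

    `template` is a format string whose placeholders match the CSV headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [template.format(**row) for row in reader]

csv_text = "product,price,stock\nWidget,9.99,42\nGadget,19.99,0\n"
docs = rows_to_descriptions(
    csv_text,
    "The {product} costs ${price} and has {stock} units in stock."
)
# Each sentence is now a self-contained chunk ready for embedding.
```

Each resulting sentence can be passed straight to `collection.add()` like the documents in the earlier example.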

Chunking Strategies

  • Fixed size: Split every N characters/tokens. Simple but may split mid-sentence.
  • Paragraph-based: Split on paragraph boundaries. Preserves semantic units.
  • Recursive: Try to split on paragraphs, then sentences, then words. LangChain's default.
  • Semantic: Use embeddings to find natural topic boundaries. Most accurate but slowest.
  • Chunk size: 500-1000 tokens per chunk is a good starting point. Include overlap (100-200 tokens) between chunks.
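The simplest strategy above, fixed size with overlap, can be sketched in a few lines. This version counts characters to stay dependency-free; a tokenizer such as tiktoken would give more precise chunk sizes. The function name is illustrative.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk so context isn't lost at
    chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks

chunks = chunk_text("a" * 1200, chunk_size=500, overlap=100)
# Covers 0-500, 400-900, and 800-1200: three chunks,
# each sharing 100 characters with its neighbor.
```

The overlap is what prevents a sentence straddling a boundary from being lost to both chunks during retrieval.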

Vector Databases

Database | Type                     | Best For                         | Free Tier
ChromaDB | Embedded / Client-Server | Prototyping, small-medium scale  | Open source
Pinecone | Managed cloud            | Production, serverless           | Yes
Weaviate | Self-hosted / Cloud      | Hybrid search (vector + keyword) | Open source
Qdrant   | Self-hosted / Cloud      | High performance, filtering      | Open source
pgvector | PostgreSQL extension     | Already using PostgreSQL         | Open source

Source Citation

Good assistants cite their sources so users can verify information:

  • Include document name, section, or page number with each retrieved chunk
  • Instruct the LLM to reference sources in its answers: "According to the Return Policy document..."
  • Provide links to original documents when available
  • Store metadata (source, date, author) alongside each chunk for rich citations
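One way to wire these points together is to tag each retrieved chunk with its stored metadata before building the prompt context. The helper below is a sketch; its input shape (dicts with `text` and `metadata` keys) mirrors what vector databases like ChromaDB return alongside documents, but the function itself is hypothetical.

```python
def build_cited_context(chunks):
    """Prefix each chunk with a source tag the LLM can echo in answers."""
    parts = []
    for chunk in chunks:
        meta = chunk["metadata"]
        if "page" in meta:
            tag = f'[{meta["source"]}, p. {meta["page"]}]'
        else:
            tag = f'[{meta["source"]}]'
        parts.append(f"{tag} {chunk['text']}")
    return "\n\n".join(parts)

context = build_cited_context([
    {"text": "Returns accepted within 30 days.",
     "metadata": {"source": "Return Policy", "page": 2}},
    {"text": "Free shipping over $50.",
     "metadata": {"source": "Shipping FAQ"}},
])
```

Pair this with a system-prompt instruction like "cite the bracketed source for each claim" so answers come back in the "According to the Return Policy document..." style described above.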

Handling "I Don't Know"

Critical design principle: An assistant that says "I don't have that information" when it genuinely does not know is far better than one that hallucinates an answer. Instruct your LLM explicitly: "If the provided context does not contain the answer, say so. Do not make up information."

Keeping Knowledge Current

  • Scheduled re-indexing: Re-process documents on a regular schedule (daily, weekly)
  • Incremental updates: Add new documents without re-indexing everything
  • Version tracking: Track which version of each document is indexed
  • Stale content detection: Flag documents that have not been updated within a defined time window
  • Feedback loop: When users report incorrect answers, update the knowledge base
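Incremental updates and version tracking are often combined by hashing document content at index time. The sketch below uses a plain dict as a stand-in for hashes you would store as metadata in the vector database; the function name is illustrative.

```python
import hashlib

def docs_to_reindex(documents, indexed_hashes):
    """Return ids of documents that are new or changed since last indexing.

    `documents` maps doc id -> current text; `indexed_hashes` maps
    doc id -> content hash recorded when the doc was last indexed.
    Changed docs have their hash updated in place.
    """
    stale = []
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if indexed_hashes.get(doc_id) != digest:
            stale.append(doc_id)
            indexed_hashes[doc_id] = digest
    return stale

indexed = {}
first = docs_to_reindex(
    {"policy": "30-day returns", "faq": "Free shipping"}, indexed)
second = docs_to_reindex(
    {"policy": "60-day returns", "faq": "Free shipping"}, indexed)
# First pass flags both docs as new; the second flags only
# the changed "policy" document for re-embedding.
```

Only the flagged documents need to be re-chunked and re-embedded, which keeps a daily or weekly re-indexing job cheap.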

Custom Knowledge vs. General Knowledge

  • Custom knowledge (RAG): Your specific documents, policies, product info. Always up-to-date. Cite-able.
  • General knowledge (LLM): The model's training data. Broad but may be outdated. Cannot cite.
  • Best approach: Use RAG for domain-specific questions. Allow the LLM to use general knowledge for common-sense reasoning and general topics. Make it clear in the system prompt which takes priority.