Knowledge Base for AI Assistants

A knowledge base gives your assistant domain-specific expertise. Using RAG (Retrieval-Augmented Generation), your assistant can answer questions from your own documents, policies, and data.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with LLM generation. Instead of relying solely on the LLM's training data, RAG fetches relevant documents and includes them in the prompt so the LLM can generate answers grounded in your specific data.

  1. Index

    Split your documents into chunks, generate embeddings (numerical vectors), and store them in a vector database.

  2. Retrieve

    When a user asks a question, convert it to an embedding and find the most similar document chunks in the vector database.

  3. Generate

    Pass the retrieved chunks to the LLM along with the user's question. The LLM generates an answer based on the retrieved context.

Python - Simple RAG Implementation
from chromadb import Client
import anthropic

# 1. Index documents
chroma = Client()
collection = chroma.create_collection("knowledge_base")

documents = [
    "Our return policy allows returns within 30 days...",
    "Free shipping on orders over $50...",
    "Premium members get 20% discount...",
]
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# 2. Retrieve relevant context
def get_context(query, n_results=3):
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return "\n\n".join(results["documents"][0])

# 3. Generate answer with context
def ask(question):
    context = get_context(question)
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="""Answer based on the provided context.
If the context doesn't contain the answer,
say you don't have that information.""",
        messages=[{
            "role": "user",
            "content": f"""Context:\n{context}\n\nQuestion: {question}"""
        }]
    )
    return response.content[0].text

Document Processing

Getting documents into your knowledge base requires processing:

Supported Formats

  • PDFs: Use PyPDF2, pdfplumber, or unstructured.io for extraction
  • Word docs: python-docx for .docx files
  • HTML/Websites: Beautiful Soup or Trafilatura for web scraping and content extraction
  • Markdown: Direct text extraction
  • Spreadsheets: Convert rows to natural language descriptions
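The spreadsheet case deserves a quick sketch: raw rows embed poorly, but turning each row into a sentence gives the embedding model natural language to work with. The helper below is illustrative (`rows_to_descriptions` and its template are not from any library), using only the standard `csv` module.

```python
import csv
import io

def rows_to_descriptions(csv_text, template):
    """Convert each spreadsheet row into a natural-language sentence.

    `template` is a format string whose placeholders match the CSV headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [template.format(**row) for row in reader]

csv_text = "product,price,stock\nWidget,9.99,42\nGadget,19.99,0\n"
docs = rows_to_descriptions(
    csv_text,
    "The {product} costs ${price} and has {stock} units in stock."
)
# Each sentence is now a self-contained chunk ready for embedding.
```

Each resulting sentence can be passed straight to `collection.add()` like the documents in the earlier example.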

Chunking Strategies

  • Fixed size: Split every N characters/tokens. Simple but may split mid-sentence.
  • Paragraph-based: Split on paragraph boundaries. Preserves semantic units.
  • Recursive: Try to split on paragraphs, then sentences, then words. LangChain's default.
  • Semantic: Use embeddings to find natural topic boundaries. Most accurate but slowest.
  • Chunk size: 500-1000 tokens per chunk is a good starting point. Include overlap (100-200 tokens) between chunks.
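The simplest strategy above, fixed size with overlap, can be sketched in a few lines. This version counts characters to stay dependency-free; a tokenizer such as tiktoken would give more precise chunk sizes. The function name is illustrative.

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk so context isn't lost at
    chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks

chunks = chunk_text("a" * 1200, chunk_size=500, overlap=100)
# Covers 0-500, 400-900, and 800-1200: three chunks,
# each sharing 100 characters with its neighbor.
```

The overlap is what prevents a sentence straddling a boundary from being lost to both chunks during retrieval.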

Vector Databases

Database | Type                     | Best For                         | Free Tier
ChromaDB | Embedded / Client-Server | Prototyping, small-medium scale  | Open source
Pinecone | Managed cloud            | Production, serverless           | Yes
Weaviate | Self-hosted / Cloud      | Hybrid search (vector + keyword) | Open source
Qdrant   | Self-hosted / Cloud      | High performance, filtering      | Open source
pgvector | PostgreSQL extension     | Already using PostgreSQL         | Open source

Source Citation

Good assistants cite their sources so users can verify information:

  • Include document name, section, or page number with each retrieved chunk
  • Instruct the LLM to reference sources in its answers: "According to the Return Policy document..."
  • Provide links to original documents when available
  • Store metadata (source, date, author) alongside each chunk for rich citations
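One way to wire these points together is to tag each retrieved chunk with its stored metadata before building the prompt context. The helper below is a sketch; its input shape (dicts with `text` and `metadata` keys) mirrors what vector databases like ChromaDB return alongside documents, but the function itself is hypothetical.

```python
def build_cited_context(chunks):
    """Prefix each chunk with a source tag the LLM can echo in answers."""
    parts = []
    for chunk in chunks:
        meta = chunk["metadata"]
        if "page" in meta:
            tag = f'[{meta["source"]}, p. {meta["page"]}]'
        else:
            tag = f'[{meta["source"]}]'
        parts.append(f"{tag} {chunk['text']}")
    return "\n\n".join(parts)

context = build_cited_context([
    {"text": "Returns accepted within 30 days.",
     "metadata": {"source": "Return Policy", "page": 2}},
    {"text": "Free shipping over $50.",
     "metadata": {"source": "Shipping FAQ"}},
])
```

Pair this with a system-prompt instruction like "cite the bracketed source for each claim" so answers come back in the "According to the Return Policy document..." style described above.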

Handling "I Don't Know"

Critical design principle: An assistant that says "I don't have that information" when it genuinely does not know is far better than one that hallucinates an answer. Instruct your LLM explicitly: "If the provided context does not contain the answer, say so. Do not make up information."

Keeping Knowledge Current

  • Scheduled re-indexing: Re-process documents on a regular schedule (daily, weekly)
  • Incremental updates: Add new documents without re-indexing everything
  • Version tracking: Track which version of each document is indexed
  • Stale content detection: Flag documents that have not been updated within a defined time window
  • Feedback loop: When users report incorrect answers, update the knowledge base
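Incremental updates and version tracking are often combined by hashing document content at index time. The sketch below uses a plain dict as a stand-in for hashes you would store as metadata in the vector database; the function name is illustrative.

```python
import hashlib

def docs_to_reindex(documents, indexed_hashes):
    """Return ids of documents that are new or changed since last indexing.

    `documents` maps doc id -> current text; `indexed_hashes` maps
    doc id -> content hash recorded when the doc was last indexed.
    Changed docs have their hash updated in place.
    """
    stale = []
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if indexed_hashes.get(doc_id) != digest:
            stale.append(doc_id)
            indexed_hashes[doc_id] = digest
    return stale

indexed = {}
first = docs_to_reindex(
    {"policy": "30-day returns", "faq": "Free shipping"}, indexed)
second = docs_to_reindex(
    {"policy": "60-day returns", "faq": "Free shipping"}, indexed)
# First pass flags both docs as new; the second flags only
# the changed "policy" document for re-embedding.
```

Only the flagged documents need to be re-chunked and re-embedded, which keeps a daily or weekly re-indexing job cheap.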

Custom Knowledge vs. General Knowledge

  • Custom knowledge (RAG): Your specific documents, policies, product info. Always up-to-date. Cite-able.
  • General knowledge (LLM): The model's training data. Broad but may be outdated. Cannot cite.
  • Best approach: Use RAG for domain-specific questions. Allow the LLM to use general knowledge for common-sense reasoning and general topics. Make it clear in the system prompt which takes priority.