Knowledge Base for AI Assistants
A knowledge base gives your assistant domain-specific expertise. Using RAG (Retrieval-Augmented Generation), your assistant can answer questions from your own documents, policies, and data.
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with LLM generation. Instead of relying solely on the LLM's training data, RAG fetches relevant documents and includes them in the prompt so the LLM can generate answers grounded in your specific data.
Index
Split your documents into chunks, generate embeddings (numerical vectors), and store them in a vector database.
Retrieve
When a user asks a question, convert it to an embedding and find the most similar document chunks in the vector database.
Generate
Pass the retrieved chunks to the LLM along with the user's question. The LLM generates an answer based on the retrieved context.
```python
from chromadb import Client
import anthropic

# 1. Index documents
chroma = Client()
collection = chroma.create_collection("knowledge_base")

documents = [
    "Our return policy allows returns within 30 days...",
    "Free shipping on orders over $50...",
    "Premium members get 20% discount...",
]
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# 2. Retrieve relevant context
def get_context(query, n_results=3):
    results = collection.query(
        query_texts=[query],
        n_results=n_results
    )
    return "\n\n".join(results["documents"][0])

# 3. Generate answer with context
def ask(question):
    context = get_context(question)
    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="""Answer based on the provided context. If the context doesn't contain the answer, say you don't have that information.""",
        messages=[{
            "role": "user",
            "content": f"""Context:\n{context}\n\nQuestion: {question}"""
        }]
    )
    return response.content[0].text
```
Document Processing
Getting documents into your knowledge base requires processing:
Supported Formats
- PDFs: Use PyPDF2, pdfplumber, or unstructured.io for extraction
- Word docs: python-docx for .docx files
- HTML/Websites: Beautiful Soup or Trafilatura for web scraping and content extraction
- Markdown: Direct text extraction
- Spreadsheets: Convert rows to natural language descriptions
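The spreadsheet case deserves a quick illustration: rows embed poorly as raw cell values, so they are usually flattened into sentences first. A minimal sketch, where the column names and product data are hypothetical:

```python
# Sketch: turning spreadsheet rows into natural-language chunks
# suitable for embedding. Row fields here are hypothetical.

def row_to_text(row: dict) -> str:
    """Describe one product row as a sentence."""
    return (
        f"{row['name']} costs ${row['price']:.2f} "
        f"and is in the {row['category']} category."
    )

rows = [
    {"name": "Trail Backpack", "price": 89.5, "category": "Outdoor"},
    {"name": "Steel Bottle", "price": 24.0, "category": "Kitchen"},
]
chunks = [row_to_text(r) for r in rows]
```

Each resulting sentence can then be embedded and indexed like any other chunk.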
Chunking Strategies
- Fixed size: Split every N characters/tokens. Simple but may split mid-sentence.
- Paragraph-based: Split on paragraph boundaries. Preserves semantic units.
- Recursive: Try to split on paragraphs, then sentences, then words. LangChain's default.
- Semantic: Use embeddings to find natural topic boundaries. Most accurate but slowest.
- Chunk size: 500-1000 tokens per chunk is a good starting point. Include overlap (100-200 tokens) between chunks so context isn't lost at chunk boundaries.
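The fixed-size strategy with overlap can be sketched in a few lines. This version counts characters rather than tokens for simplicity, and the sizes are illustrative only:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks with overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the text
    return chunks

# 300 characters -> four chunks of at most 100 chars, overlapping by 20
parts = chunk_text("abcdefghij" * 30, chunk_size=100, overlap=20)
```

The overlap means the tail of each chunk repeats as the head of the next, so a sentence cut at a boundary still appears whole in one of the two chunks.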
Vector Databases
| Database | Type | Best For | Free Tier |
|---|---|---|---|
| ChromaDB | Embedded / Client-Server | Prototyping, small-medium scale | Open source |
| Pinecone | Managed cloud | Production, serverless | Yes |
| Weaviate | Self-hosted / Cloud | Hybrid search (vector + keyword) | Open source |
| Qdrant | Self-hosted / Cloud | High performance, filtering | Open source |
| pgvector | PostgreSQL extension | Already using PostgreSQL | Open source |
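Under the hood, every database in the table answers the same question: which stored vectors are most similar to the query vector? A brute-force sketch with toy 3-dimensional embeddings (real embeddings have hundreds or thousands of dimensions, and production databases use approximate indexes instead of a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "index" of document embeddings; values are illustrative.
index = {
    "returns": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.1],
    "discounts": [0.0, 0.2, 0.9],
}

def nearest(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k document ids most similar to the query vector."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]
```

A query vector close to the "returns" embedding ranks that document first; this linear scan is exactly what the managed databases above replace with faster approximate-nearest-neighbor structures.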
Source Citation
Good assistants cite their sources so users can verify information:
- Include document name, section, or page number with each retrieved chunk
- Instruct the LLM to reference sources in its answers: "According to the Return Policy document..."
- Provide links to original documents when available
- Store metadata (source, date, author) alongside each chunk for rich citations
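One way to make citations possible is to carry metadata through retrieval and prefix each chunk with it when assembling the prompt. A minimal sketch, with hypothetical document names and sections:

```python
# Sketch: attach source metadata to each chunk so answers can cite it.
# Document names and sections here are hypothetical.
retrieved = [
    {"text": "Returns are accepted within 30 days.",
     "meta": {"source": "Return Policy", "section": "Eligibility"}},
    {"text": "Orders over $50 ship free.",
     "meta": {"source": "Shipping FAQ", "section": "Costs"}},
]

def build_context(chunks: list[dict]) -> str:
    """Prefix each chunk with its citation so the LLM can reference it."""
    return "\n\n".join(
        f"[{c['meta']['source']}, {c['meta']['section']}]\n{c['text']}"
        for c in chunks
    )
```

With context formatted this way, the system prompt can simply instruct the model to quote the bracketed source when it uses a chunk.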
Handling "I Don't Know"
A RAG assistant should admit when the retrieved context doesn't answer the question rather than guess:
- Instruct the model in the system prompt to say it lacks the information when the context doesn't contain the answer
- Apply a similarity threshold: if no retrieved chunk is relevant enough, respond that nothing matching was found
- Offer an escalation path (e.g., contact support) instead of a fabricated answer
Keeping Knowledge Current
- Scheduled re-indexing: Re-process documents on a regular schedule (daily, weekly)
- Incremental updates: Add new documents without re-indexing everything
- Version tracking: Track which version of each document is indexed
- Stale content detection: Flag documents that have not been updated within a set time window
- Feedback loop: When users report incorrect answers, update the knowledge base
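Incremental updates need a cheap way to tell whether a document changed since it was last indexed; content hashing is one common approach. A sketch, with hypothetical document IDs:

```python
import hashlib

def doc_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Maps doc id -> hash of the version currently in the index.
indexed_hashes = {"doc_0": doc_hash("old return policy text")}

def needs_reindex(doc_id: str, current_text: str) -> bool:
    """Re-embed only documents that are new or whose content changed."""
    return indexed_hashes.get(doc_id) != doc_hash(current_text)
```

On each scheduled run, only documents for which `needs_reindex` returns `True` are re-chunked and re-embedded, which keeps re-indexing cost proportional to what actually changed.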
Custom Knowledge vs. General Knowledge
- Custom knowledge (RAG): Your specific documents, policies, product info. Stays current as long as you re-index. Cite-able.
- General knowledge (LLM): The model's training data. Broad but may be outdated. Cannot cite.
- Best approach: Use RAG for domain-specific questions. Allow the LLM to use general knowledge for common-sense reasoning and general topics. Make it clear in the system prompt which takes priority.
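A hypothetical system prompt that makes this priority explicit (the wording is illustrative, not from any official guide):

```python
# Illustrative system prompt: retrieved context outranks general knowledge.
SYSTEM_PROMPT = (
    "Answer company-specific questions using only the provided context, "
    "and cite the source document. For general questions the context does "
    "not cover, you may use your general knowledge, but say you are doing "
    "so. If the context contradicts your general knowledge, the context wins."
)
```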
Lilly Tech Systems