Enhancements & Best Practices
You have built a fully functional hybrid search engine. This final lesson covers advanced enhancements you can add: personalization, A/B testing for relevance tuning, query understanding with intent classification, and spell correction. It also includes a comprehensive FAQ covering common questions from search engine builders.
1. Search Personalization
Personalization re-ranks results based on user behavior — their search history, clicked documents, and category preferences. A simple approach uses a user profile vector:
```python
# app/search/personalization.py
"""User-based search personalization."""
import json

from app.cache import get_redis


def get_user_profile(user_id: str) -> dict:
    """Load a user's search profile from Redis.

    The profile tracks category preferences and recent clicks.
    """
    try:
        r = get_redis()
        profile = r.get(f"user_profile:{user_id}")
        if profile:
            return json.loads(profile)
    except Exception:
        pass
    return {
        "category_weights": {},
        "recent_clicks": [],
        "search_count": 0,
    }


def update_user_profile(user_id: str, category: str, document_id: str):
    """Update the user profile after a click.

    Increments the category weight and adds to recent clicks.
    """
    try:
        r = get_redis()
        profile = get_user_profile(user_id)
        # Increment category preference
        if category:
            current = profile["category_weights"].get(category, 0)
            profile["category_weights"][category] = current + 1
        # Add to recent clicks (keep last 50)
        profile["recent_clicks"].insert(0, document_id)
        profile["recent_clicks"] = profile["recent_clicks"][:50]
        profile["search_count"] += 1
        r.setex(f"user_profile:{user_id}", 86400 * 30, json.dumps(profile))
    except Exception:
        pass


def personalize_results(results: list[dict], user_id: str) -> list[dict]:
    """Re-score results based on user preferences.

    Documents in preferred categories get a boost proportional to the
    user's preference for that category.
    """
    profile = get_user_profile(user_id)
    if not profile["category_weights"]:
        return results  # No personalization data yet

    max_weight = max(profile["category_weights"].values())
    for result in results:
        category = result["source"].get("category", "")
        category_weight = profile["category_weights"].get(category, 0)
        # Normalize to the 0-0.1 range and add as a boost
        if max_weight > 0:
            boost = 0.1 * (category_weight / max_weight)
            result["score"] = result["score"] + boost
            result["personalized"] = True

    # Re-sort by updated scores
    results.sort(key=lambda x: x["score"], reverse=True)
    return results
```
2. A/B Testing for Relevance Tuning
A/B testing lets you compare different search configurations with real users. Route a percentage of traffic to a variant and measure click-through rates:
```python
# app/search/ab_testing.py
"""Simple A/B testing for search relevance tuning."""
import hashlib

from app.cache import get_redis

# Define experiments
EXPERIMENTS = {
    "rrf_constant": {
        "control": {"rrf_k": 60},
        "variant_a": {"rrf_k": 30},   # Less dampening of top ranks
        "variant_b": {"rrf_k": 100},  # More dampening
    },
    "rerank_candidates": {
        "control": {"reranker_top_k": 20},
        "variant_a": {"reranker_top_k": 10},  # Faster, less accurate
        "variant_b": {"reranker_top_k": 50},  # Slower, more accurate
    },
}


def get_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to a variant.

    Uses consistent hashing so the same user always sees the same variant.
    """
    hash_input = f"{user_id}:{experiment}"
    hash_value = int(hashlib.md5(hash_input.encode()).hexdigest(), 16)
    bucket = hash_value % 100
    if bucket < 33:
        return "control"
    elif bucket < 66:
        return "variant_a"
    else:
        return "variant_b"


def get_experiment_config(user_id: str, experiment: str) -> dict:
    """Get the configuration for a user's assigned variant."""
    if experiment not in EXPERIMENTS:
        return {}
    variant = get_variant(user_id, experiment)
    return EXPERIMENTS[experiment].get(variant, {})


def log_experiment_result(
    user_id: str, experiment: str, query: str,
    clicked: bool, position: int | None = None
):
    """Log an A/B test result for later analysis."""
    try:
        r = get_redis()
        variant = get_variant(user_id, experiment)
        # Track click-through rate per variant
        key = f"ab:{experiment}:{variant}"
        r.hincrby(key, "impressions", 1)
        if clicked:
            r.hincrby(key, "clicks", 1)
            if position:
                r.hincrby(key, f"clicks_pos_{position}", 1)
    except Exception:
        pass
```
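`log_experiment_result` writes per-variant counters, but the analysis side is not shown. Below is a minimal sketch of reading those hashes back and computing click-through rate per variant. `summarize_experiment` is a hypothetical helper; it assumes the Redis client was created with `decode_responses=True`, so hash values come back as strings:

```python
def summarize_experiment(
    r, experiment: str,
    variants=("control", "variant_a", "variant_b"),
) -> dict:
    """Compute CTR per variant from the "ab:<experiment>:<variant>"
    hashes written by log_experiment_result."""
    summary = {}
    for variant in variants:
        stats = r.hgetall(f"ab:{experiment}:{variant}")
        impressions = int(stats.get("impressions", 0))
        clicks = int(stats.get("clicks", 0))
        summary[variant] = {
            "impressions": impressions,
            "clicks": clicks,
            "ctr": clicks / impressions if impressions else 0.0,
        }
    return summary
```

Before declaring a winner, make sure each variant has enough impressions for the CTR difference to be statistically meaningful.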
3. Query Understanding
Query understanding analyzes the user's intent before searching. This enables automatic mode selection, query expansion, and intent-specific ranking:
```python
# app/search/query_understanding.py
"""Query analysis and intent classification."""
import re
from dataclasses import dataclass


@dataclass
class QueryAnalysis:
    """Result of analyzing a search query."""
    original: str
    cleaned: str
    intent: str  # "navigational", "informational", "exact_match", "exploratory"
    suggested_mode: str  # "keyword", "semantic", "hybrid"
    has_quotes: bool
    is_question: bool
    tokens: list[str]
    expansions: list[str]


def analyze_query(query: str) -> QueryAnalysis:
    """Analyze a query to determine intent and the optimal search strategy.

    Rules:
    1. Quoted phrases -> keyword search (exact-match intent)
    2. Questions (who, what, how...) -> semantic search (informational)
    3. Short specific terms -> keyword search (navigational)
    4. Long natural language -> semantic search (informational)
    5. Everything else -> hybrid
    """
    cleaned = query.strip()
    tokens = cleaned.lower().split()
    has_quotes = '"' in query or "'" in query
    is_question = bool(re.match(
        r'^(what|how|why|when|where|who|which|can|does|is|are|do)\b',
        cleaned.lower()
    ))

    # Determine intent and mode
    if has_quotes:
        intent = "exact_match"
        suggested_mode = "keyword"
    elif is_question:
        intent = "informational"
        suggested_mode = "semantic"
    elif len(tokens) <= 2 and all(len(t) > 2 for t in tokens):
        intent = "navigational"
        suggested_mode = "keyword"
    elif len(tokens) >= 5:
        intent = "informational"
        suggested_mode = "semantic"
    else:
        intent = "exploratory"
        suggested_mode = "hybrid"

    # Simple query expansion using common synonyms
    expansions = expand_query(tokens)

    return QueryAnalysis(
        original=query,
        cleaned=cleaned,
        intent=intent,
        suggested_mode=suggested_mode,
        has_quotes=has_quotes,
        is_question=is_question,
        tokens=tokens,
        expansions=expansions,
    )


# Simple synonym dictionary for query expansion
SYNONYMS = {
    "ml": ["machine learning"],
    "ai": ["artificial intelligence"],
    "nlp": ["natural language processing"],
    "db": ["database"],
    "api": ["application programming interface", "REST API"],
    "js": ["javascript"],
    "py": ["python"],
    "k8s": ["kubernetes"],
}


def expand_query(tokens: list[str]) -> list[str]:
    """Expand query tokens with synonyms."""
    expansions = []
    for token in tokens:
        if token.lower() in SYNONYMS:
            expansions.extend(SYNONYMS[token.lower()])
    return expansions
```
4. Spell Correction
Elasticsearch has built-in spell correction using the suggest API. Add a "Did you mean?" feature:
```python
# app/search/spell_check.py
"""Spell correction using Elasticsearch suggestions."""
from app.elasticsearch.client import SearchClient


def get_spelling_suggestions(query: str) -> list[str]:
    """Get spelling suggestions from Elasticsearch.

    Uses the phrase suggester for context-aware corrections:
    "machne lerning" -> "machine learning"
    """
    client = SearchClient()
    suggest_body = {
        "suggest": {
            "text": query,
            "title_suggestion": {
                "phrase": {
                    "field": "title",
                    "size": 3,
                    "gram_size": 3,
                    "direct_generator": [{
                        "field": "title",
                        "suggest_mode": "popular"
                    }],
                    "collate": {
                        "query": {
                            "source": {
                                "match": {
                                    "title": "{{suggestion}}"
                                }
                            }
                        },
                        "prune": True
                    }
                }
            },
            "body_suggestion": {
                "phrase": {
                    "field": "body",
                    "size": 3,
                    "gram_size": 3,
                    "direct_generator": [{
                        "field": "body",
                        "suggest_mode": "popular"
                    }]
                }
            }
        },
        "size": 0
    }
    response = client.es.search(
        index=client.index_name,
        body=suggest_body
    )
    suggestions = set()
    for suggester in ["title_suggestion", "body_suggestion"]:
        for entry in response.get("suggest", {}).get(suggester, []):
            for option in entry.get("options", []):
                if option["text"].lower() != query.lower():
                    suggestions.add(option["text"])
    return list(suggestions)[:3]
```
Add the spell check to the search API:
```python
# Add to the search endpoint in app/main.py
from app.search.spell_check import get_spelling_suggestions

# Inside the search function, after getting results:
if result["total"] < 3:  # covers zero and near-zero result counts
    suggestions = get_spelling_suggestions(q)
    if suggestions:
        result["spelling_suggestions"] = suggestions
```
5. Performance Best Practices
Indexing
- Use bulk API for indexing — never index documents one at a time.
- Set `refresh_interval: 30s` during bulk indexing, then reset it to `1s`.
- Generate embeddings in batches of 64-128 for optimal GPU/CPU utilization.
- Use deterministic document IDs so re-indexing updates existing documents.
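The deterministic-ID and `refresh_interval` advice above can be combined into one small helper. This is a sketch, not the project's actual pipeline code: `make_actions` and `bulk_index` are hypothetical names, documents are assumed to carry a `url` field, and the client is assumed to be elasticsearch-py 8.x.

```python
import hashlib


def make_actions(index: str, docs: list[dict]):
    """Yield bulk actions whose _id is derived from the document URL,
    so re-running the pipeline updates documents instead of duplicating them."""
    for doc in docs:
        yield {
            "_index": index,
            "_id": hashlib.sha1(doc["url"].encode()).hexdigest(),
            "_source": doc,
        }


def bulk_index(es, index: str, docs: list[dict]):
    """Bulk-load docs with a relaxed refresh interval, then restore it."""
    from elasticsearch import helpers  # assumes the elasticsearch package

    es.indices.put_settings(index=index, settings={"refresh_interval": "30s"})
    try:
        helpers.bulk(es, make_actions(index, docs))
    finally:
        es.indices.put_settings(index=index, settings={"refresh_interval": "1s"})
```

The `try/finally` matters: if the bulk load fails partway, the index still returns to near-real-time refresh.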
Search
- Cache frequent queries in Redis with a 5-minute TTL.
- Limit re-ranking to the top 20 candidates for acceptable latency.
- Use `_source` filtering to return only the fields you need.
- Set `terminate_after` to prevent slow queries from scanning all shards.
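The caching bullet above can be sketched as a thin wrapper around any search function. `cache_key` and `cached_search` are hypothetical helpers; the Redis client is assumed to use `decode_responses=True`:

```python
import hashlib
import json


def cache_key(query: str, mode: str, page: int) -> str:
    """Stable cache key: hash of the normalized query plus search params."""
    raw = f"{query.strip().lower()}:{mode}:{page}"
    return "search_cache:" + hashlib.sha256(raw.encode()).hexdigest()


def cached_search(r, query: str, mode: str, page: int, search_fn):
    """Return a cached result if present; otherwise run search_fn
    and cache the result for 5 minutes."""
    key = cache_key(query, mode, page)
    hit = r.get(key)
    if hit:
        return json.loads(hit)
    result = search_fn(query, mode, page)
    r.setex(key, 300, json.dumps(result))
    return result
```

Normalizing the query before hashing means "Python" and " python " share one cache entry, which raises the hit ratio for free.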
Infrastructure
- Give Elasticsearch no more than 50% of available RAM for the JVM heap (and keep the heap below ~32 GB so compressed object pointers stay enabled); the rest goes to the filesystem cache.
- Use SSD storage for Elasticsearch data volumes.
- Set `bootstrap.memory_lock: true` to prevent swapping.
- Monitor cluster health with the `_cluster/health` endpoint.
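A minimal sketch of acting on the `_cluster/health` response in code. `assert_cluster_healthy` is a hypothetical helper that interprets the dict returned by `es.cluster.health()` in the Python client:

```python
def assert_cluster_healthy(health: dict) -> str:
    """Interpret a _cluster/health response: 'green' is fully healthy,
    'yellow' means replica shards are unassigned (tolerable on a single
    node), and 'red' means primary shards are missing -> fail loudly."""
    status = health.get("status")
    if status == "red":
        raise RuntimeError(
            f"cluster red: {health.get('unassigned_shards')} unassigned shards"
        )
    return status
```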
Monitoring
- Track P50, P95, and P99 search latency.
- Alert on zero-result query spikes (indicates content gaps).
- Monitor cache hit ratio — aim for 60%+ for common workloads.
- Track click-through rate by position (a healthy position 1 typically sees roughly 30% CTR).
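Percentile latency tracking needs no special tooling. Here is a nearest-rank sketch over recorded per-query latencies; `percentile` and `latency_report` are illustrative helpers, not part of the project code:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over recorded latencies (milliseconds)."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]


def latency_report(samples: list[float]) -> dict:
    """Summarize a window of search latencies as P50/P95/P99."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }
```

In production you would compute this over a sliding window (for example, the last 10,000 queries) rather than all history.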
Frequently Asked Questions
How many documents can this handle?
On a single Elasticsearch node with 4 GB RAM, the engine handles 500K-1M documents comfortably. The HNSW index for vector search uses approximately 4 bytes per dimension per document, so 1M documents with 384-dimensional vectors uses about 1.5 GB of RAM for the vector index alone. For larger datasets, use multiple Elasticsearch nodes.
Can I use a different embedding model?
Yes. Change the EMBEDDING_MODEL and EMBEDDING_DIMENSION in your .env file. Popular alternatives include:
- `all-mpnet-base-v2` (768 dims, higher quality, ~2x slower)
- `all-MiniLM-L12-v2` (384 dims, better than L6, slightly slower)
- `e5-small-v2` (384 dims, excellent for search tasks)
- `bge-small-en-v1.5` (384 dims, strong performance on MTEB benchmarks)
After changing the model, you must re-index all documents to regenerate embeddings.
Can I use OpenAI embeddings instead of sentence-transformers?
Yes. Replace the encode_texts and encode_query functions in app/embeddings/encoder.py with OpenAI API calls. Use text-embedding-3-small (1536 dims, $0.02/1M tokens) and update EMBEDDING_DIMENSION to 1536. The tradeoff is cost and latency vs local processing.
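A sketch of what the replacement might look like, assuming the `openai` Python package's v1 client. `encode_texts` mirrors the project's function name; `batched` is an illustrative helper, and error handling is omitted:

```python
def batched(items: list, size: int):
    """Split texts into request-sized batches to stay under the
    embeddings endpoint's per-request input limit."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def encode_texts(texts: list[str]) -> list[list[float]]:
    """Drop-in replacement for app/embeddings/encoder.py using the
    OpenAI embeddings API instead of a local model."""
    from openai import OpenAI  # assumes the openai package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    vectors = []
    for batch in batched(texts, 512):
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=batch
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```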
How does this compare to Algolia or Typesense?
Managed services like Algolia provide instant setup, dashboard analytics, and built-in typo tolerance. This project gives you full control, zero ongoing cost, and semantic search capabilities that most managed services do not offer. Choose managed services for quick time-to-market; build your own for custom AI features and cost control.
Can I add multi-language support?
Yes. Use a multilingual embedding model like paraphrase-multilingual-MiniLM-L12-v2 which supports 50+ languages. For keyword search, configure language-specific analyzers in the Elasticsearch mapping (e.g., french, german, chinese).
What if I need real-time indexing?
The current pipeline processes documents in batches. For real-time indexing, add a message queue (Redis Streams or RabbitMQ) between the ingestion API and the indexing pipeline. New documents are queued immediately and indexed within seconds by a background worker.
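The queue-based flow might be sketched like this with Redis Streams. The stream name `index_stream` and both helpers are hypothetical, and the Redis client is assumed to use `decode_responses=True`:

```python
import json


def enqueue_document(r, doc: dict) -> str:
    """Producer side: push a new document onto a Redis Stream
    for near-real-time indexing by a background worker."""
    return r.xadd("index_stream", {"doc": json.dumps(doc)})


def run_worker(r, index_fn, last_id: str = "$"):
    """Consumer side: block for new stream entries and index each
    document as it arrives (index_fn wraps the existing pipeline)."""
    while True:
        entries = r.xread({"index_stream": last_id}, block=5000)
        for _stream, messages in entries:
            for message_id, fields in messages:
                index_fn(json.loads(fields["doc"]))
                last_id = message_id
```

For at-least-once delivery across multiple workers, consumer groups (`XGROUP`/`XREADGROUP`/`XACK`) are the next step beyond this sketch.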
How do I evaluate search quality?
Use these metrics to measure and improve relevance:
- NDCG@10: Normalized Discounted Cumulative Gain at position 10. Measures ranking quality with graded relevance.
- MRR: Mean Reciprocal Rank. The average of 1/(rank of the first relevant result) across queries.
- Click-Through Rate: What percentage of search results get clicked. Track by position.
- Zero-Result Rate: What percentage of queries return no results. Keep below 5%.
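The ranking metrics above are straightforward to compute offline from judged result lists. A minimal sketch (`mrr` and `ndcg_at_k` are illustrative implementations of the standard formulas):

```python
import math


def mrr(ranked_relevance: list[list[int]]) -> float:
    """Mean Reciprocal Rank over queries; each inner list marks results
    as relevant (1) or not (0), in ranked order."""
    total = 0.0
    for results in ranked_relevance:
        for rank, rel in enumerate(results, start=1):
            if rel:
                total += 1 / rank
                break
    return total / len(ranked_relevance)


def ndcg_at_k(relevance: list[float], k: int = 10) -> float:
    """NDCG@k for a single query with graded relevance scores."""
    def dcg(scores):
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores[:k]))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal else 0.0
```

Run these against a small set of hand-judged queries before and after each relevance change to catch regressions.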
Project Summary
Congratulations! You have built a complete AI-powered search engine from scratch. Here is what you accomplished across all 8 lessons:
- Lesson 1: Project architecture with FastAPI, Elasticsearch, and sentence-transformers.
- Lesson 2: Data indexing pipeline with text processing and embedding generation.
- Lesson 3: BM25 keyword search with multi-field matching and highlighting.
- Lesson 4: Semantic vector search with kNN and cosine similarity.
- Lesson 5: Hybrid search with Reciprocal Rank Fusion and cross-encoder re-ranking.
- Lesson 6: Search UI with autocomplete, facets, and pagination.
- Lesson 7: Docker deployment with Redis caching and query analytics.
- Lesson 8: Personalization, A/B testing, query understanding, and best practices.
The entire codebase is production-ready and can be extended for any search use case — documentation search, e-commerce product search, knowledge base search, or internal enterprise search.
Lilly Tech Systems