Step 4: Hybrid Search & Re-ranking
This is the core of the AI search engine. You will combine BM25 keyword scores with semantic vector scores using Reciprocal Rank Fusion (RRF), then apply a cross-encoder model to re-rank the top candidates for maximum precision. The result is a search system that outperforms either approach alone.
Why Hybrid Search?
We saw in previous lessons that keyword and semantic search have complementary strengths:
- BM25 excels at exact matches: "FastAPI", error codes, product names.
- Semantic excels at meaning: "how to build web apps" finds "FastAPI tutorial."
- Hybrid gets both: exact matches rank high AND conceptually similar documents appear.
Research consistently shows that hybrid search outperforms either approach individually. The question is how to combine the scores.
Reciprocal Rank Fusion (RRF)
RRF is a simple, effective algorithm for combining ranked lists from different scoring systems. It does not require score normalization because it only uses rank positions:
RRF Score = sum( 1 / (k + rank_i) ) for each ranking system i
Where:
- k = 60 (constant that prevents top-ranked items from dominating)
- rank_i = position in ranking system i (1-based)
Example for document D:
BM25 rank: 3 -> 1/(60+3) = 0.01587
Semantic rank: 1 -> 1/(60+1) = 0.01639
RRF Score: 0.01587 + 0.01639 = 0.03226
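The arithmetic can be checked in a few lines of Python (a standalone sketch, independent of the app code):

```python
# RRF contribution for document D: 1 / (k + rank), with the standard k = 60
k = 60
bm25_contrib = 1 / (k + 3)      # ranked 3rd by BM25
semantic_contrib = 1 / (k + 1)  # ranked 1st by semantic search

print(round(bm25_contrib, 5))      # 0.01587
print(round(semantic_contrib, 5))  # 0.01639
# Sum of the rounded contributions, matching the figures above
print(round(round(bm25_contrib, 5) + round(semantic_contrib, 5), 5))  # 0.03226
```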
The Hybrid Search Module
Create the hybrid search module with RRF fusion:
# app/search/hybrid.py
"""Hybrid search combining BM25 and semantic search with RRF fusion."""
from app.search.keyword import keyword_search
from app.search.semantic import semantic_search
from app.config import get_settings
import logging
logger = logging.getLogger(__name__)
settings = get_settings()
def reciprocal_rank_fusion(
    keyword_results: list[dict],
    semantic_results: list[dict],
    k: int = 60
) -> list[dict]:
    """Combine two ranked lists using Reciprocal Rank Fusion.

    Args:
        keyword_results: Results from BM25 keyword search.
        semantic_results: Results from semantic vector search.
        k: RRF constant (default 60, from the original paper).

    Returns:
        Merged and re-ranked list of results with RRF scores.
    """
    # Build a map of document_id -> result data + scores
    doc_map = {}

    # Process keyword results
    for rank, result in enumerate(keyword_results, start=1):
        doc_id = result["id"]
        rrf_score = 1.0 / (k + rank)
        if doc_id not in doc_map:
            doc_map[doc_id] = {
                "id": doc_id,
                "source": result["source"],
                "highlights": result.get("highlights", {}),
                "rrf_score": 0.0,
                "keyword_rank": None,
                "keyword_score": None,
                "semantic_rank": None,
                "semantic_score": None
            }
        doc_map[doc_id]["rrf_score"] += rrf_score
        doc_map[doc_id]["keyword_rank"] = rank
        doc_map[doc_id]["keyword_score"] = result["score"]

    # Process semantic results
    for rank, result in enumerate(semantic_results, start=1):
        doc_id = result["id"]
        rrf_score = 1.0 / (k + rank)
        if doc_id not in doc_map:
            doc_map[doc_id] = {
                "id": doc_id,
                "source": result["source"],
                "highlights": result.get("highlights", {}),
                "rrf_score": 0.0,
                "keyword_rank": None,
                "keyword_score": None,
                "semantic_rank": None,
                "semantic_score": None
            }
        doc_map[doc_id]["rrf_score"] += rrf_score
        doc_map[doc_id]["semantic_rank"] = rank
        doc_map[doc_id]["semantic_score"] = result["score"]

    # Sort by RRF score descending
    fused = sorted(doc_map.values(), key=lambda x: x["rrf_score"], reverse=True)
    return fused
def hybrid_search(
    query: str,
    top_k: int | None = None,
    category: str | None = None,
    tags: list[str] | None = None,
    rrf_k: int = 60,
    candidate_multiplier: int = 3
) -> dict:
    """Run hybrid search: BM25 + semantic + RRF fusion.

    Args:
        query: The user's search query.
        top_k: Final number of results to return.
        category: Optional category filter.
        tags: Optional tag filters.
        rrf_k: RRF constant (default 60).
        candidate_multiplier: Fetch N * top_k candidates from each system.

    Returns:
        Dict with fused 'results' list and metadata.
    """
    if top_k is None:
        top_k = settings.search_top_k

    # Fetch more candidates than needed for better fusion
    fetch_k = top_k * candidate_multiplier

    # Run both searches sequentially (an async version could run them
    # concurrently with asyncio.gather)
    kw_results = keyword_search(
        query=query, top_k=fetch_k, category=category, tags=tags
    )
    sem_results = semantic_search(
        query=query, top_k=fetch_k, category=category, tags=tags
    )

    # Fuse with RRF
    fused = reciprocal_rank_fusion(
        kw_results["results"],
        sem_results["results"],
        k=rrf_k
    )

    # Take top_k results
    results = fused[:top_k]

    # Convert to standard result format
    formatted = []
    for item in results:
        formatted.append({
            "id": item["id"],
            "score": item["rrf_score"],
            "source": item["source"],
            "highlights": item["highlights"],
            "ranking_details": {
                "keyword_rank": item["keyword_rank"],
                "keyword_score": item["keyword_score"],
                "semantic_rank": item["semantic_rank"],
                "semantic_score": item["semantic_score"]
            }
        })

    total = max(kw_results["total"], sem_results["total"])
    logger.info(
        f"Hybrid search for '{query}': "
        f"keyword={len(kw_results['results'])}, "
        f"semantic={len(sem_results['results'])}, "
        f"fused={len(formatted)}"
    )

    return {
        "results": formatted,
        "total": total,
        "query": query,
        "mode": "hybrid"
    }
Cross-Encoder Re-ranking
RRF gives us a good initial ranking, but we can improve precision further with a cross-encoder. Unlike bi-encoders (sentence-transformers) that encode query and document separately, a cross-encoder processes the query-document pair together, enabling deeper interaction between them:
Bi-encoder (fast, used for retrieval):
encode("search query") -> [0.12, -0.34, ...]
encode("document text") -> [0.45, 0.23, ...]
score = cosine(query_vec, doc_vec)
Cross-encoder (slow, used for re-ranking):
score = model("search query", "document text") -> 8.73
# Processes both texts together through all transformer layers
# Much more accurate, but too slow for initial retrieval
Create the re-ranking module:
# app/search/reranker.py
"""Cross-encoder re-ranking for search results."""
from sentence_transformers import CrossEncoder
from app.config import get_settings
import logging
logger = logging.getLogger(__name__)
settings = get_settings()
# Load model once at module level
_reranker = None
def get_reranker() -> CrossEncoder:
    """Lazy-load the cross-encoder re-ranking model."""
    global _reranker
    if _reranker is None:
        logger.info(f"Loading re-ranker model: {settings.reranker_model}")
        _reranker = CrossEncoder(settings.reranker_model)
        logger.info("Re-ranker model loaded")
    return _reranker
def rerank_results(
    query: str,
    results: list[dict],
    top_k: int | None = None
) -> list[dict]:
    """Re-rank search results using a cross-encoder model.

    The cross-encoder scores each (query, document) pair for relevance.
    This is more accurate than bi-encoder similarity but slower,
    so we only apply it to the top candidates from initial retrieval.

    Args:
        query: The original search query.
        results: List of result dicts from hybrid/keyword/semantic search.
        top_k: Number of results to return after re-ranking.

    Returns:
        Re-ranked list of results with updated scores.
    """
    if not results:
        return results
    if top_k is None:
        top_k = settings.search_top_k

    reranker = get_reranker()

    # Build query-document pairs for the cross-encoder
    pairs = []
    for result in results:
        title = result["source"].get("title", "")
        body = result["source"].get("body", "")
        # Truncate to ~2000 characters as a rough proxy for the
        # cross-encoder's 512-token input limit
        doc_text = f"{title}. {body}"[:2000]
        pairs.append((query, doc_text))

    # Score all pairs
    scores = reranker.predict(pairs)

    # Attach scores and sort
    for result, score in zip(results, scores):
        result["rerank_score"] = float(score)
        result["original_score"] = result["score"]
        result["score"] = float(score)  # Replace score with rerank score

    # Sort by cross-encoder score descending
    reranked = sorted(results, key=lambda x: x["score"], reverse=True)

    logger.info(
        f"Re-ranked {len(results)} results. "
        f"Top score: {reranked[0]['score']:.4f}, "
        f"Bottom score: {reranked[-1]['score']:.4f}"
    )
    return reranked[:top_k]
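The score bookkeeping (replace `score`, preserve `original_score`) can be seen in isolation by substituting a stub for the model; the scores below are made up for illustration:

```python
results = [
    {"id": "doc_a", "score": 0.0325},  # RRF scores from fusion
    {"id": "doc_b", "score": 0.0323},
]
stub_scores = [2.1, 7.8]  # pretend cross-encoder relevance scores

# Same attach-and-sort logic as rerank_results, minus the model call
for result, score in zip(results, stub_scores):
    result["rerank_score"] = float(score)
    result["original_score"] = result["score"]
    result["score"] = float(score)  # cross-encoder score becomes the ranking key

reranked = sorted(results, key=lambda x: x["score"], reverse=True)
print([r["id"] for r in reranked])    # ['doc_b', 'doc_a']
print(reranked[0]["original_score"])  # 0.0323
```

Keeping the original RRF score alongside the re-rank score makes it easy to see when the cross-encoder disagrees with the first-stage ranking.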
Complete Hybrid + Re-ranking Pipeline
Update the search API to use the full pipeline:
# Update app/main.py - complete search endpoint
from fastapi import HTTPException

from app.search.keyword import keyword_search
from app.search.semantic import semantic_search
from app.search.hybrid import hybrid_search
from app.search.reranker import rerank_results

@app.get("/api/search")
async def search(
    q: str,
    mode: str | None = None,
    top_k: int = 10,
    category: str | None = None,
    tags: str | None = None,
    page: int = 1,
    rerank: bool = True
):
    """Search documents with keyword, semantic, or hybrid matching.

    Query params:
        q: Search query string
        mode: 'keyword', 'semantic', or 'hybrid' (default from settings)
        top_k: Number of results per page
        category: Filter by category
        tags: Comma-separated tag filters
        page: Page number (1-based)
        rerank: Apply cross-encoder re-ranking (default True for hybrid)
    """
    if mode is None:
        mode = settings.search_default_mode

    tag_list = tags.split(",") if tags else None
    from_offset = (page - 1) * top_k

    if mode == "keyword":
        result = keyword_search(
            query=q, top_k=top_k, category=category,
            tags=tag_list, from_offset=from_offset
        )
    elif mode == "semantic":
        result = semantic_search(
            query=q, top_k=top_k, category=category, tags=tag_list
        )
    elif mode == "hybrid":
        # Fetch more candidates for re-ranking
        fetch_k = settings.reranker_top_k if rerank else top_k
        result = hybrid_search(
            query=q, top_k=fetch_k, category=category, tags=tag_list
        )
        # Apply cross-encoder re-ranking
        if rerank and result["results"]:
            result["results"] = rerank_results(
                query=q, results=result["results"], top_k=top_k
            )
            result["reranked"] = True
    else:
        raise HTTPException(status_code=400, detail=f"Unknown search mode: {mode}")

    return result
Performance Characteristics
Search Mode Performance (approximate, 100K documents):
| Mode | Latency | Precision | Recall |
|-------------------|-----------|-----------|--------|
| Keyword (BM25) | 5-15 ms | High | Medium |
| Semantic (kNN) | 10-30 ms | Medium | High |
| Hybrid (RRF) | 20-50 ms | High | High |
| Hybrid + Rerank | 50-200 ms | Highest | High |
The cross-encoder adds 30-150 ms depending on the number of candidates, so only re-rank the top ~20 candidates to keep latency acceptable. Use the rerank=false parameter for autocomplete and instant results, but enable re-ranking for the main search results page, where users expect higher quality.

Test the Complete Pipeline
# Hybrid search with re-ranking (default)
curl "http://localhost:8000/api/search?q=how+to+build+web+applications&mode=hybrid"
# Hybrid without re-ranking (faster)
curl "http://localhost:8000/api/search?q=how+to+build+web+applications&mode=hybrid&rerank=false"
# Compare all three modes for the same query
curl "http://localhost:8000/api/search?q=vector+database&mode=keyword"
curl "http://localhost:8000/api/search?q=vector+database&mode=semantic"
curl "http://localhost:8000/api/search?q=vector+database&mode=hybrid"
# The hybrid results should include the best from both approaches
Key Takeaways
- Reciprocal Rank Fusion combines ranked lists without requiring score normalization. It is simple, robust, and dataset-agnostic.
- The RRF constant k=60 balances the influence of top-ranked vs lower-ranked items.
- Cross-encoder re-ranking processes query-document pairs together for the highest accuracy, but is too slow for initial retrieval.
- The two-stage pipeline (retrieve then re-rank) gives us both speed and precision.
- Ranking details (keyword rank, semantic rank, re-rank score) are returned for debugging and transparency.
What Is Next
The search backend is complete. In the next lesson, you will build the search interface — a production-quality UI with autocomplete, faceted filters, highlighted snippets, and paginated results.
Lilly Tech Systems