Vector Database Best Practices

Production-ready tips for choosing, configuring, operating, and optimizing vector databases in real-world AI applications.

Choosing the Right Vector Database

  1. Start with Your Constraints

    Before comparing features, identify your constraints: budget, team expertise, existing infrastructure, data volume, query latency requirements, and compliance needs. These narrow the field quickly.

  2. Prototype with ChromaDB, Scale with Others

    Build your proof-of-concept with ChromaDB (fastest to set up). Once you validate the approach, migrate to a production database that matches your scale and operational requirements.

  3. Consider Operational Complexity

    A managed service (Pinecone, Weaviate Cloud) costs more but saves engineering time. Self-hosted (Qdrant, Milvus) is cheaper but requires infrastructure expertise.

Index Configuration

  • Use HNSW for most workloads. It offers the best balance of recall, speed, and simplicity. Start with default parameters and tune from there.
  • Increase ef_search for higher recall. If search quality matters more than latency, raise ef_search (HNSW) or nprobe (IVF).
  • Match dimensions to your model. The index dimension must exactly match your embedding model's output dimension.
  • Use the same distance metric as your model. Most text embedding models are trained with cosine similarity. Using a different metric will give poor results.
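To see why the metric must match, here is a small pure-Python sketch: two vectors pointing in the same direction score identically under cosine similarity but very differently under Euclidean distance, so ranking by the wrong metric rewards magnitude rather than direction:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 0.0]
short = [0.9, 0.1]
long = [9.0, 1.0]  # same direction as `short`, 10x the magnitude

# Cosine ignores magnitude: both documents score identically
assert abs(cosine(query, short) - cosine(query, long)) < 1e-9
# Euclidean does not: the longer vector looks "farther" despite identical direction
assert euclidean(query, long) > euclidean(query, short)
```

This is why a model trained with cosine similarity gives poor rankings when the index is configured for Euclidean distance.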

Batch Operations

Python - Efficient Batch Upsert
from concurrent.futures import ThreadPoolExecutor

def batch_upsert(index, vectors, batch_size=100):
    """Upsert vectors in batches for efficiency.

    `index` is an existing client handle (e.g. a Pinecone index object).
    """
    for i in range(0, len(vectors), batch_size):
        batch = vectors[i:i + batch_size]
        index.upsert(vectors=batch)
        print(f"Upserted batch {i // batch_size + 1}")

# For very large datasets, run batches in parallel threads
def parallel_upsert(index, vectors, batch_size=100, max_workers=4):
    """Upsert in parallel for maximum throughput."""
    batches = [vectors[i:i + batch_size]
               for i in range(0, len(vectors), batch_size)]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(index.upsert, vectors=batch)
            for batch in batches
        ]
        for future in futures:
            future.result()  # Wait for completion and surface any errors

Monitoring and Maintenance

  • Track query latency percentiles (p50, p95, p99), not just averages. A p99 under 100 ms is a common target for a healthy deployment.
  • Monitor recall quality by periodically comparing ANN results against brute-force exact search on a sample.
  • Watch index size and memory usage. HNSW indexes grow with data. Plan capacity accordingly.
  • Set up alerts for query latency spikes, error rates, and storage thresholds.
  • Log query patterns to identify popular queries, cache opportunities, and optimization targets.
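The recall check in the second bullet can be sketched in a few lines. Here the "exact" list comes from a brute-force scan over a small in-memory sample, and the ANN result is an illustrative list of IDs standing in for your database's output:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_top_k(query, vectors, k):
    """Brute-force exact search: score every vector, keep the best k IDs."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]

def recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the true top-k that the ANN search returned."""
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k

# Toy sample: the pretend ANN result misses one true neighbour
sample = {"d1": [1.0, 0.0], "d2": [0.9, 0.4], "d3": [0.0, 1.0]}
truth = exact_top_k([1.0, 0.0], sample, k=2)   # ['d1', 'd2']
ann_result = ["d1", "d3"]
print(recall_at_k(ann_result, truth, k=2))     # 0.5
```

Running this comparison periodically on a held-out sample catches silent recall regressions after index or parameter changes.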

Cost Optimization

Each strategy below is listed with its typical impact and its trade-off:
  • Use smaller embeddings: 2–4x storage savings (slightly lower recall)
  • Enable quantization: 4–8x memory savings (lower recall accuracy)
  • Use serverless: pay only for usage (possible cold starts)
  • Cache frequent queries: eliminates redundant searches (stale results possible)
  • Reduce metadata: lower storage costs (less flexible filtering)
  • Partition by time: archive old data (more complex queries)
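A minimal sketch of the query-caching strategy, assuming results can be keyed by the query text: a small TTL cache that accepts possibly stale results in exchange for skipping redundant searches, matching the trade-off listed above:

```python
import time

class QueryCache:
    """Tiny TTL cache for frequent vector-search queries."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, results)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]          # fresh hit
        self._store.pop(key, None)   # expired or missing
        return None

    def put(self, key, results):
        self._store[key] = (time.monotonic() + self.ttl, results)

cache = QueryCache(ttl_seconds=60)
cache.put("what is hnsw", [("doc-3", 0.91), ("doc-7", 0.88)])
assert cache.get("what is hnsw") is not None  # hit within TTL
assert cache.get("unseen query") is None      # miss falls through to the database
```

In practice you would check the cache before issuing a search and populate it afterwards; the TTL bounds how stale a cached result can get.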

Security

  • Never embed API keys in code. Use environment variables or secret managers (AWS Secrets Manager, Vault).
  • Enable authentication on self-hosted deployments. Many vector databases ship with auth disabled by default.
  • Use TLS/SSL for all connections, especially in production.
  • Implement access control. Use namespaces, multi-tenancy, or row-level security to isolate data.
  • Audit access logs to track who queries what data.

Backup Strategies

  • For managed services: Most providers handle backups automatically. Verify the backup frequency and retention policy.
  • For self-hosted: Schedule regular snapshots of the data directory. Test restoration periodically.
  • Keep a copy of raw embeddings separately (e.g., in object storage). If you lose the index, you can rebuild it from the raw vectors.
  • Store the embedding model version. If you need to re-embed data, you must use the same model version for consistency.

Scaling Patterns

  1. Vertical Scaling

    Add more RAM and faster storage. HNSW indexes are memory-bound, so more RAM directly improves capacity. Effective up to ~10M vectors on a single node.

  2. Horizontal Scaling (Sharding)

    Distribute data across multiple nodes. Each shard holds a subset of vectors. Queries are fanned out to all shards and results are merged. Supported by Milvus, Weaviate, and Qdrant.

  3. Read Replicas

    Add read-only replicas to handle more query traffic. Writes go to the primary, reads are distributed across replicas.

  4. Tiered Storage

    Keep frequently accessed vectors in memory and older vectors on disk. Some databases (Milvus, Qdrant) support this natively.
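The fan-out-and-merge step in sharded search can be sketched as follows. `query_fn` is a hypothetical per-shard search returning (id, score) pairs sorted by descending score; real systems fan out concurrently, but the merge logic is the same:

```python
import heapq

def search_sharded(query_fn, shards, k=10):
    """Fan a query out to every shard and merge the partial top-k lists.

    Each partial list is assumed sorted by descending score, so a k-way
    merge on negated score yields the global top-k.
    """
    partials = [query_fn(shard, k) for shard in shards]  # fan-out (serial here)
    merged = heapq.merge(*partials, key=lambda hit: -hit[1])
    return list(merged)[:k]

# Toy shards with pre-scored hits standing in for per-shard ANN results
shard_a = [("a1", 0.95), ("a2", 0.70)]
shard_b = [("b1", 0.90), ("b2", 0.80)]
top = search_sharded(lambda shard, k: shard, [shard_a, shard_b], k=3)
print(top)  # [('a1', 0.95), ('b1', 0.9), ('b2', 0.8)]
```

Note that each shard must return its own top-k (not top-k/n) for the merged result to be correct, which is why fan-out cost grows with shard count.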

Common Mistakes

Avoid these common pitfalls:
  • Mismatched dimensions: The index dimension must exactly match your embedding model output. A mismatch causes errors or garbage results.
  • Wrong distance metric: Using Euclidean distance with a model trained for cosine similarity gives poor rankings.
  • Not normalizing vectors: If your model does not output normalized vectors, normalize them before inserting if you use dot product.
  • Mixing embedding models: All vectors in a collection must come from the same model. Mixing models makes similarity meaningless.
  • Storing too much metadata: Large metadata payloads slow down queries. Store only what you need for filtering; keep full documents elsewhere.
  • Ignoring index warm-up: HNSW indexes need to be loaded into RAM. First queries after restart may be slow.
  • No evaluation pipeline: Not measuring search quality means you cannot tell if changes improve or degrade results.
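For the normalization pitfall above, a minimal helper (plain Python for clarity; production code would typically use numpy). After normalization, dot product and cosine similarity agree:

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

v = normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
assert abs(sum(x * x for x in v) - 1.0) < 1e-9  # unit length
```

Apply this once at insert time (and to every query vector) if your model does not already emit unit-length embeddings.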

Frequently Asked Questions

How many vectors can a single machine handle?

It depends on the dimensions and index type. As a rough guide: with HNSW and 1536-dimension vectors, a machine with 32GB RAM can handle about 2–5 million vectors. With quantization, this can increase to 10–20 million.
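That estimate is easy to sanity-check with back-of-envelope arithmetic. The 1.5x graph overhead below is a coarse assumption: real overhead depends on HNSW's M parameter and the implementation:

```python
def hnsw_memory_gb(n_vectors, dim, bytes_per_float=4, graph_overhead=1.5):
    """Rough RAM estimate for an HNSW index: raw float32 vectors plus graph links."""
    raw_bytes = n_vectors * dim * bytes_per_float
    return raw_bytes * graph_overhead / 1024**3

# 5M vectors at 1536 dims comes to roughly 43 GB, which is why a 32 GB
# machine tops out around 2-5M vectors without quantization
print(round(hnsw_memory_gb(5_000_000, 1536), 1))  # 42.9
```

Quantization shrinks the per-float cost (e.g. 1 byte with int8), which is where the 10–20 million figure comes from.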

Do I still need an embedding model if I have a vector database?

Yes. A vector database stores and searches vectors, but it does not create them. You need an embedding model (OpenAI, Sentence Transformers, etc.) to convert your data into vectors first. Some databases like Weaviate have built-in vectorizers that handle this for you.

Should I use pgvector or a dedicated vector database?

If you already use PostgreSQL and have fewer than 5 million vectors, pgvector is often the simplest choice. For larger scale, higher query throughput, or features like hybrid search, a dedicated vector database is better suited.

How do I measure search quality?

Create a test set with known relevant results for sample queries. Compute metrics like recall@k (what fraction of the true top-k results does the ANN search return?) and MRR (mean reciprocal rank). Aim for recall@10 above 95%.
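A minimal MRR implementation over a hypothetical test set, where each query has a set of known-relevant document IDs:

```python
def mrr(ranked_results, relevant_sets):
    """Mean reciprocal rank: average of 1/rank of the first relevant hit per query."""
    total = 0.0
    for results, relevant in zip(ranked_results, relevant_sets):
        for rank, doc_id in enumerate(results, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_results)

# Query 1 finds its relevant doc at rank 1; query 2 at rank 3
runs = [["d1", "d2"], ["d5", "d6", "d7"]]
truth = [{"d1"}, {"d7"}]
print(mrr(runs, truth))  # (1/1 + 1/3) / 2 = 0.666...
```

MRR rewards putting a relevant result near the top, complementing recall@k, which only checks set membership.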

What happens if I switch embedding models?

You must re-embed all your data with the new model and create a new index. Vectors from different models live in different vector spaces and cannot be compared. Plan for this by keeping your raw text data accessible.