Lesson 3 of 13

Embedding Models

Transform text, images, and code into numerical vectors that capture semantic meaning

What Are Embeddings?

Embeddings are numerical vector representations of data — text, images, audio, or code — that capture the semantic meaning of that data in a high-dimensional space. Instead of treating words as arbitrary symbols, embedding models convert them into dense arrays of floating-point numbers where similar concepts are close together and dissimilar concepts are far apart.

Think of it this way: the words "king" and "queen" would have vectors that are near each other in embedding space, while "king" and "refrigerator" would be far apart. This mathematical representation allows machines to understand and compare meaning in ways that raw text cannot support.

Key Insight: Embeddings are the bridge between human language and mathematical computation. They let you perform operations like "find documents similar to this query" or "cluster these articles by topic" using pure vector math.

How Embedding Models Work

Embedding models use deep neural networks — typically transformer architectures — to encode input data into dense, fixed-length vectors. Here is the general process:

  1. Tokenization: Input text is split into tokens (words, subwords, or characters).
  2. Encoding: Tokens pass through multiple transformer layers that learn contextual relationships between them.
  3. Pooling: The model aggregates token-level representations into a single vector (using mean pooling, CLS token pooling, or other strategies).
  4. Normalization: The final vector is often L2-normalized so that cosine similarity can be computed efficiently.

The resulting vector — typically between 256 and 3072 dimensions — encodes the semantic meaning of the entire input. Two pieces of text with similar meaning will produce vectors with a high cosine similarity score (close to 1.0), even if they use completely different words.
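The pooling and normalization steps can be sketched in a few lines of NumPy. This is an illustration only: the token vectors below are random stand-ins for real transformer outputs, and the dimensions are kept tiny for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for transformer output: 6 tokens, each an 8-dimensional hidden state
# (real models use hundreds or thousands of dimensions per token).
token_vectors = rng.normal(size=(6, 8))

# Mean pooling: average the token-level vectors into one sentence vector.
sentence_vector = token_vectors.mean(axis=0)

# L2 normalization: scale to unit length, so the dot product of two
# normalized vectors equals their cosine similarity.
sentence_vector /= np.linalg.norm(sentence_vector)

print(sentence_vector.shape)            # (8,)
print(np.linalg.norm(sentence_vector))  # 1.0 (up to floating-point error)
```

After normalization, comparing two embeddings is just a dot product, which is why most vector databases default to cosine or inner-product distance.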

Important: Embeddings are not the same as LLM outputs. An LLM generates text; an embedding model generates a fixed-size numerical vector. They serve fundamentally different purposes.

Types of Embeddings

Text Embeddings

The most common type. These models convert sentences, paragraphs, or entire documents into vectors. They are the foundation of semantic search, retrieval-augmented generation (RAG), and document clustering.

Image Embeddings

Vision models like CLIP produce vectors for images, enabling you to search for images using text queries ("a cat sitting on a laptop") or find visually similar images. Image embeddings power reverse image search and visual recommendation systems.

Code Embeddings

Specialized models encode source code into vectors, capturing programming logic and structure. These are used for code search, duplicate detection, and vulnerability scanning. Models like CodeBERT and GraphCodeBERT produce code-aware embeddings.

Multimodal Embeddings

Models like CLIP and ImageBind produce embeddings that place text, images, and sometimes audio into the same vector space, enabling cross-modal search and retrieval.

Key Embedding Models

The embedding model landscape has evolved rapidly. Here are the most important models to know:

| Model | Provider | Dimensions | Max Tokens | Best For |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | High-accuracy retrieval, RAG |
| text-embedding-3-small | OpenAI | 1536 | 8191 | Cost-effective general purpose |
| Cohere Embed v3 | Cohere | 1024 | 512 | Multilingual, search & classification |
| BGE-large-en-v1.5 | BAAI | 1024 | 512 | Open-source, self-hosted |
| E5-large-v2 | Microsoft | 1024 | 512 | Open-source, strong MTEB scores |
| GTE-large | Alibaba | 1024 | 512 | Open-source, multilingual |
| Sentence-BERT | UKP Lab | 768 | 512 | Lightweight, fast inference |
| CLIP (ViT-L/14) | OpenAI | 768 | 77 | Text-image cross-modal search |

Choosing dimensions: Higher dimensions generally capture more nuance but require more storage and compute. OpenAI's text-embedding-3 models support dimension reduction via the Matryoshka technique — you can request 256, 512, or 1024 dimensions instead of the full size while retaining most quality.
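Conceptually, Matryoshka-style reduction amounts to truncating the vector and re-normalizing it. With the OpenAI API you would instead pass the `dimensions` parameter to `embeddings.create`; the sketch below applies the same truncate-and-renormalize step locally, using a random vector as a stand-in for a real 3072-dimensional embedding.

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length."""
    truncated = np.asarray(vec)[:dims]
    return truncated / np.linalg.norm(truncated)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)           # stand-in for a full-size embedding
short = truncate_embedding(full, 256)  # compact 256-dimensional version

print(short.shape)           # (256,)
print(np.linalg.norm(short))  # 1.0
```

The truncated vector trades some accuracy for a 12x reduction in storage, which is often a good deal at scale.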

Use Cases for Embedding Models

Semantic Search

Traditional keyword search fails when users phrase queries differently from the source documents. Embedding-based search compares meaning, not words. A query like "how to fix a broken pipe" will match documents about plumbing repairs even if those documents never use the exact phrase.

Retrieval-Augmented Generation (RAG)

RAG is the most popular enterprise AI pattern today. You embed your documents into a vector database, then at query time, retrieve the most relevant chunks and pass them as context to an LLM. Embedding quality directly determines RAG quality.
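Assuming the document chunks have already been embedded (random vectors stand in for them here), the retrieval step of a RAG pipeline reduces to a top-k cosine search:

```python
import numpy as np

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per chunk
    return np.argsort(scores)[::-1][:k]  # highest-scoring indices first

rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(100, 64))  # stand-ins for 100 embedded chunks

# Simulate a query that is semantically close to chunk 42.
query_vec = chunk_vecs[42] + rng.normal(scale=0.1, size=64)

indices = top_k(query_vec, chunk_vecs)
print(indices)  # chunk 42 ranks first, since the query is a noisy copy of it
```

In a real pipeline, the text of these top chunks would then be concatenated into the LLM prompt as context.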

Clustering and Topic Modeling

Embed a corpus of documents, then apply clustering algorithms (K-means, HDBSCAN) to group them by topic. This works without any predefined categories — the clusters emerge from the semantic structure of the data itself.
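As a minimal sketch, here is K-means implemented directly in NumPy, run on synthetic "embeddings" drawn around two different means (standing in for two real topics). In practice you would use a library implementation such as scikit-learn's `KMeans` or HDBSCAN.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: returns a cluster label for each point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Two synthetic "topics": vectors clustered around different means.
rng = np.random.default_rng(1)
topic_a = rng.normal(loc=+2.0, size=(20, 8))
topic_b = rng.normal(loc=-2.0, size=(20, 8))
labels = kmeans(np.vstack([topic_a, topic_b]), k=2)
print(labels)
```

With real embeddings the clusters are far less separable than this toy example, so the choice of k (or a density-based algorithm that infers it) matters a great deal.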

Deduplication

Find near-duplicate documents, support tickets, or records by computing embeddings and identifying pairs with very high cosine similarity (e.g., > 0.95). This catches duplicates that differ in phrasing but express the same content.
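The pairwise check can be sketched with NumPy. Random vectors stand in for real embeddings here, with one planted near-duplicate:

```python
import numpy as np

def find_duplicates(vectors, threshold=0.95):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ v.T                       # all-pairs cosine similarity matrix
    i, j = np.triu_indices(len(v), k=1)  # upper triangle: each pair once
    mask = sims[i, j] > threshold
    return list(zip(i[mask].tolist(), j[mask].tolist()))

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 64))
vectors[3] = vectors[0] + rng.normal(scale=0.01, size=64)  # near-duplicate of row 0

print(find_duplicates(vectors))  # [(0, 3)]
```

Note that the all-pairs matrix is O(n²); at scale you would use an ANN index rather than materializing it.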

Recommendation Systems

Embed product descriptions, articles, or user profiles. Then recommend items whose embeddings are closest to the user's interest vector. This enables content-based recommendations without collaborative filtering data.
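Assuming item embeddings are available (random stand-ins below), a content-based recommender can average the user's liked-item vectors into an interest vector and rank the catalog against it:

```python
import numpy as np

def recommend(liked_vecs, catalog_vecs, n=3):
    """Rank catalog items by similarity to the mean of the liked items."""
    interest = liked_vecs.mean(axis=0)
    interest /= np.linalg.norm(interest)
    catalog = catalog_vecs / np.linalg.norm(catalog_vecs, axis=1, keepdims=True)
    scores = catalog @ interest
    return np.argsort(scores)[::-1][:n]

rng = np.random.default_rng(0)
catalog_vecs = rng.normal(size=(50, 32))  # stand-ins for item embeddings

# Simulate a user whose liked items all resemble catalog item 7.
liked_vecs = catalog_vecs[7] + rng.normal(scale=0.2, size=(3, 32))

recs = recommend(liked_vecs, catalog_vecs)
print(recs)  # item 7 ranks first
```

A production system would also filter out items the user has already seen before ranking.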

Vector Databases

Embeddings need to be stored and searched efficiently. Vector databases are purpose-built for this, supporting approximate nearest neighbor (ANN) search over millions or billions of vectors.

| Database | Type | Key Features |
|---|---|---|
| Pinecone | Managed cloud | Serverless, auto-scaling, metadata filtering |
| Weaviate | Open source / cloud | Hybrid search, GraphQL API, modules |
| ChromaDB | Open source | Lightweight, Python-native, great for prototyping |
| Qdrant | Open source / cloud | Rust-based, fast, advanced filtering |
| pgvector | PostgreSQL extension | Use existing Postgres infrastructure, SQL interface |
| Milvus | Open source / cloud | Highly scalable, GPU-accelerated, enterprise-ready |

Code Example: Generating Embeddings with OpenAI

Here is how to generate text embeddings using the OpenAI API in Python:

Python
from openai import OpenAI

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    """Generate an embedding vector for the given text."""
    text = text.replace("\n", " ")
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding

# Generate embeddings for sample texts
texts = [
    "The cat sat on the mat",
    "A kitten was resting on the rug",
    "Python is a programming language",
]

embeddings = [get_embedding(t) for t in texts]

print(f"Vector dimensions: {len(embeddings[0])}")
print(f"First 5 values: {embeddings[0][:5]}")
# Output: Vector dimensions: 1536
# Output: First 5 values: [0.0023, -0.0091, 0.0152, ...]

Code Example: Cosine Similarity Calculation

Once you have embeddings, you can compute how similar two texts are using cosine similarity:

Python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(vec_a, vec_b)
    norm_a = np.linalg.norm(vec_a)
    norm_b = np.linalg.norm(vec_b)
    return dot_product / (norm_a * norm_b)

# Compare our three texts
sim_cat_kitten = cosine_similarity(embeddings[0], embeddings[1])
sim_cat_python = cosine_similarity(embeddings[0], embeddings[2])

print(f"'cat on mat' vs 'kitten on rug': {sim_cat_kitten:.4f}")
print(f"'cat on mat' vs 'Python language': {sim_cat_python:.4f}")
# Output: 'cat on mat' vs 'kitten on rug': 0.8742
# Output: 'cat on mat' vs 'Python language': 0.1523

# Find the most similar text to a query
query = "Where is the feline?"
query_embedding = get_embedding(query)

similarities = [
    (text, cosine_similarity(query_embedding, emb))
    for text, emb in zip(texts, embeddings)
]

# Sort by similarity (highest first)
similarities.sort(key=lambda x: x[1], reverse=True)
for text, score in similarities:
    print(f"  {score:.4f} — {text}")

Performance tip: For production workloads, avoid computing cosine similarity manually. Use a vector database with ANN indexes (HNSW, IVF) that can search millions of vectors in milliseconds.

When to Use Embedding Models

Embedding models are the right choice when you need to:

  • Search by meaning rather than keywords — semantic search and question answering over documents
  • Build a RAG pipeline — embedding your knowledge base is the first step in retrieval-augmented generation
  • Group or classify text without labeled training data — clustering and zero-shot classification via embeddings
  • Detect duplicates or find similar items across large datasets
  • Build recommendations based on content similarity rather than user behavior

Embedding models are not the right choice when you need to generate text, translate languages, or perform complex reasoning. For those tasks, use an LLM instead. Embeddings capture what something means; LLMs determine what to say about it.

Summary

  • Embeddings convert data into dense numerical vectors that capture semantic meaning.
  • Modern embedding models use transformer architectures and produce vectors with 256 to 3072 dimensions.
  • Key models include OpenAI text-embedding-3, Cohere Embed v3, BGE, E5, GTE, and CLIP for multimodal.
  • Primary use cases are semantic search, RAG, clustering, deduplication, and recommendations.
  • Vector databases like Pinecone, Weaviate, ChromaDB, and Qdrant are essential for production embedding workloads.
  • Cosine similarity is the standard metric for comparing embedding vectors.