Embedding Models
Transform text, images, and code into numerical vectors that capture semantic meaning
What Are Embeddings?
Embeddings are numerical vector representations of data — text, images, audio, or code — that capture the semantic meaning of that data in a high-dimensional space. Instead of treating words as arbitrary symbols, embedding models convert them into dense arrays of floating-point numbers where similar concepts are close together and dissimilar concepts are far apart.
Think of it this way: the words "king" and "queen" would have vectors that are near each other in embedding space, while "king" and "refrigerator" would be far apart. This mathematical representation allows machines to understand and compare meaning in ways that raw text cannot support.
How Embedding Models Work
Embedding models use deep neural networks — typically transformer architectures — to encode input data into dense, fixed-length vectors. Here is the general process:
- Tokenization: Input text is split into tokens (words, subwords, or characters).
- Encoding: Tokens pass through multiple transformer layers that learn contextual relationships between them.
- Pooling: The model aggregates token-level representations into a single vector (using mean pooling, CLS token pooling, or other strategies).
- Normalization: The final vector is often L2-normalized to unit length, so that cosine similarity reduces to a simple dot product.
The resulting vector — typically between 256 and 3072 dimensions — encodes the semantic meaning of the entire input. Two pieces of text with similar meaning will produce vectors with a high cosine similarity score (close to 1.0), even if they use completely different words.
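The pooling and normalization steps above can be sketched in NumPy. The token vectors here are random stand-ins for real transformer outputs:

```python
import numpy as np

# Hypothetical token-level outputs from a transformer: 4 tokens x 8 dimensions
# (random stand-ins; a real model would produce these from the input text)
token_vectors = np.random.default_rng(0).normal(size=(4, 8))

# Mean pooling: average the token vectors into one sentence vector
sentence_vec = token_vectors.mean(axis=0)

# L2 normalization: scale to unit length, so cosine similarity between two
# such vectors is just their dot product
sentence_vec = sentence_vec / np.linalg.norm(sentence_vec)
```

After this step, comparing two sentences is a single dot product, which is why most embedding APIs return pre-normalized vectors.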
Types of Embeddings
Text Embeddings
The most common type. These models convert sentences, paragraphs, or entire documents into vectors. They are the foundation of semantic search, retrieval-augmented generation (RAG), and document clustering.
Image Embeddings
Vision models like CLIP produce vectors for images, enabling you to search for images using text queries ("a cat sitting on a laptop") or find visually similar images. Image embeddings power reverse image search and visual recommendation systems.
Code Embeddings
Specialized models encode source code into vectors, capturing programming logic and structure. These are used for code search, duplicate detection, and vulnerability scanning. Encoder models like CodeBERT and UniXcoder produce code-aware embeddings.
Multimodal Embeddings
Models like CLIP and ImageBind produce embeddings that place text, images, and sometimes audio into the same vector space, enabling cross-modal search and retrieval.
Key Embedding Models
The embedding model landscape has evolved rapidly. Here are the most important models to know:
| Model | Provider | Dimensions | Max Tokens | Best For |
|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | High-accuracy retrieval, RAG |
| text-embedding-3-small | OpenAI | 1536 | 8191 | Cost-effective general purpose |
| Cohere Embed v3 | Cohere | 1024 | 512 | Multilingual, search & classification |
| BGE-large-en-v1.5 | BAAI | 1024 | 512 | Open-source, self-hosted |
| E5-large-v2 | Microsoft | 1024 | 512 | Open-source, strong MTEB scores |
| GTE-large | Alibaba | 1024 | 512 | Open-source, multilingual |
| Sentence-BERT | UKP Lab | 768 | 512 | Lightweight, fast inference |
| CLIP (ViT-L/14) | OpenAI | 768 | 77 | Text-image cross-modal search |
Use Cases for Embedding Models
Semantic Search
Traditional keyword search fails when users phrase queries differently from the source documents. Embedding-based search compares meaning, not words. A query like "how to fix a broken pipe" will match documents about plumbing repairs even if those documents never use the exact phrase.
Retrieval-Augmented Generation (RAG)
RAG is one of the most widely adopted enterprise AI patterns. You embed your documents into a vector database, then at query time, retrieve the most relevant chunks and pass them as context to an LLM. Embedding quality directly determines RAG quality.
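The retrieval step can be sketched in NumPy, assuming the chunk and query embeddings have already been computed. Toy 3-dimensional vectors stand in for real embeddings here:

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    C = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = C @ q                          # cosine similarity per chunk
    return np.argsort(scores)[::-1][:k]     # highest-scoring indices first

# Toy "embeddings" for three document chunks and one query
chunks = np.array([[1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])

retrieved = top_k_chunks(query, chunks)   # chunks 0 and 1 are most similar
```

In a real pipeline the retrieved chunk texts, not the vectors, are what get concatenated into the LLM prompt.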
Clustering and Topic Modeling
Embed a corpus of documents, then apply clustering algorithms (K-means, HDBSCAN) to group them by topic. This works without any predefined categories — the clusters emerge from the semantic structure of the data itself.
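A minimal sketch of this workflow, using a hand-rolled K-means over synthetic vectors that stand in for document embeddings (in practice you would run scikit-learn's KMeans or HDBSCAN over real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for document embeddings: two well-separated "topics",
# 5 documents each, in 8 dimensions
docs = np.vstack([rng.normal(loc=+1.0, scale=0.1, size=(5, 8)),
                  rng.normal(loc=-1.0, scale=0.1, size=(5, 8))])

def kmeans(X, k, iters=25, seed=0):
    """Minimal K-means: return a cluster label for each row of X."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        # Move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(docs, k=2)   # the two topics emerge as two clusters
```

No category names were supplied anywhere; the grouping falls out of the geometry of the vectors.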
Deduplication
Find near-duplicate documents, support tickets, or records by computing embeddings and identifying pairs with very high cosine similarity (e.g., > 0.95). This catches duplicates that differ in phrasing but express the same content.
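A sketch of the pairwise check, using toy vectors in place of real embeddings:

```python
import numpy as np

def find_duplicates(embeddings, threshold=0.95):
    """Return (i, j, similarity) for pairs above the similarity threshold."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # L2-normalize rows
    sims = X @ X.T                                     # pairwise cosine matrix
    pairs = []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] > threshold:
                pairs.append((i, j, float(sims[i, j])))
    return pairs

# Toy vectors: the first two are near-duplicates, the third is unrelated
vecs = [[1.0, 0.0, 0.0],
        [0.99, 0.01, 0.0],
        [0.0, 1.0, 0.0]]
pairs = find_duplicates(vecs)   # only the (0, 1) pair exceeds 0.95
```

The full pairwise matrix is O(n²); at large scale you would use a vector database's ANN search to find candidate pairs instead.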
Recommendation Systems
Embed product descriptions, articles, or user profiles. Then recommend items whose embeddings are closest to the user's interest vector. This enables content-based recommendations without collaborative filtering data.
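A sketch of content-based ranking, with a hypothetical item catalog and an interest vector built from the user's liked items:

```python
import numpy as np

# Hypothetical item embeddings (rows) and the items the user liked
items = np.array([
    [0.9, 0.1, 0.0],   # item 0
    [0.8, 0.2, 0.1],   # item 1
    [0.0, 0.1, 0.9],   # item 2
    [0.1, 0.0, 0.8],   # item 3
])
liked = [0]

# Interest vector = mean of the liked items' embeddings
interest = items[liked].mean(axis=0)

# Rank the remaining items by cosine similarity to the interest vector
def unit(v):
    return v / np.linalg.norm(v)

candidates = [i for i in range(len(items)) if i not in liked]
ranked = sorted(candidates,
                key=lambda i: float(unit(items[i]) @ unit(interest)),
                reverse=True)
# ranked == [1, 3, 2]: item 1 is closest to what the user already liked
```

No interaction history from other users is needed, which is why this approach avoids the cold-start problem of collaborative filtering.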
Vector Databases
Embeddings need to be stored and searched efficiently. Vector databases are purpose-built for this, supporting approximate nearest neighbor (ANN) search over millions or billions of vectors.
| Database | Type | Key Features |
|---|---|---|
| Pinecone | Managed cloud | Serverless, auto-scaling, metadata filtering |
| Weaviate | Open source / cloud | Hybrid search, GraphQL API, modules |
| ChromaDB | Open source | Lightweight, Python-native, great for prototyping |
| Qdrant | Open source / cloud | Rust-based, fast, advanced filtering |
| pgvector | PostgreSQL extension | Use existing Postgres infrastructure, SQL interface |
| Milvus | Open source / cloud | Highly scalable, GPU-accelerated, enterprise-ready |
Code Example: Generating Embeddings with OpenAI
Here is how to generate text embeddings using the OpenAI API in Python:
```python
from openai import OpenAI

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    """Generate an embedding vector for the given text."""
    text = text.replace("\n", " ")
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding

# Generate embeddings for sample texts
texts = [
    "The cat sat on the mat",
    "A kitten was resting on the rug",
    "Python is a programming language",
]
embeddings = [get_embedding(t) for t in texts]

print(f"Vector dimensions: {len(embeddings[0])}")
print(f"First 5 values: {embeddings[0][:5]}")
# Output: Vector dimensions: 1536
# Output: First 5 values: [0.0023, -0.0091, 0.0152, ...]
```
Code Example: Cosine Similarity Calculation
Once you have embeddings, you can compute how similar two texts are using cosine similarity:
```python
import numpy as np

def cosine_similarity(vec_a, vec_b):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(vec_a, vec_b)
    norm_a = np.linalg.norm(vec_a)
    norm_b = np.linalg.norm(vec_b)
    return dot_product / (norm_a * norm_b)

# Compare our three texts
sim_cat_kitten = cosine_similarity(embeddings[0], embeddings[1])
sim_cat_python = cosine_similarity(embeddings[0], embeddings[2])

print(f"'cat on mat' vs 'kitten on rug': {sim_cat_kitten:.4f}")
print(f"'cat on mat' vs 'Python language': {sim_cat_python:.4f}")
# Output: 'cat on mat' vs 'kitten on rug': 0.8742
# Output: 'cat on mat' vs 'Python language': 0.1523

# Find the most similar text to a query
query = "Where is the feline?"
query_embedding = get_embedding(query)

similarities = [
    (text, cosine_similarity(query_embedding, emb))
    for text, emb in zip(texts, embeddings)
]

# Sort by similarity (highest first)
similarities.sort(key=lambda x: x[1], reverse=True)
for text, score in similarities:
    print(f"  {score:.4f} — {text}")
```
When to Use Embedding Models
Embedding models are the right choice when you need to:
- Search by meaning rather than keywords — semantic search and question answering over documents
- Build a RAG pipeline — embedding your knowledge base is the first step in retrieval-augmented generation
- Group or classify text without labeled training data — clustering and zero-shot classification via embeddings
- Detect duplicates or find similar items across large datasets
- Build recommendations based on content similarity rather than user behavior
Embedding models are not the right choice when you need to generate text, translate languages, or perform complex reasoning. For those tasks, use an LLM instead. Embeddings capture what something means; LLMs determine what to say about it.
Summary
- Embeddings convert data into dense numerical vectors that capture semantic meaning.
- Modern embedding models use transformer architectures and produce vectors with 256 to 3072 dimensions.
- Key models include OpenAI text-embedding-3, Cohere Embed v3, BGE, E5, GTE, and CLIP for multimodal.
- Primary use cases are semantic search, RAG, clustering, deduplication, and recommendations.
- Vector databases like Pinecone, Weaviate, ChromaDB, and Qdrant are essential for production embedding workloads.
- Cosine similarity is the standard metric for comparing embedding vectors.
Lilly Tech Systems