Generating Embeddings
Embeddings are numerical representations of text that capture its semantic meaning. Texts with similar meanings produce similar vectors, which is the foundation of semantic search and retrieval-augmented generation (RAG). In this lesson, you will generate embeddings with OpenAI and store them in Supabase.
What Are Embeddings?
An embedding model converts text into a high-dimensional vector (an array of numbers). For example, text-embedding-3-small turns the sentence "The cat sat on the mat" into a vector of 1536 numbers. Semantically similar sentences produce vectors that lie close together in this space, where closeness is usually measured with cosine similarity or a related distance.
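A common way to measure how "close" two embeddings are is cosine similarity: the dot product of the two vectors divided by the product of their lengths. The sketch below computes it in plain TypeScript on made-up three-dimensional vectors; the vector values are illustrative assumptions, not real model output.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for zero vectors');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten).toFixed(3)); // 0.984: close together
console.log(cosineSimilarity(cat, invoice).toFixed(3)); // 0.012: far apart
```

In a real pipeline this comparison typically runs inside the database rather than in application code; this version is only meant to make the arithmetic concrete.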
Generating Embeddings with OpenAI
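```typescript
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

// The OpenAI client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Assumes SUPABASE_URL and SUPABASE_KEY are set as environment variables.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!,
);

async function generateAndStore(title: string, content: string): Promise<void> {
  // 1. Generate the embedding for the document text
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content,
  });
  const embedding = response.data[0].embedding; // number[] with 1536 entries

  // 2. Store the text and its vector in the `documents` table
  const { error } = await supabase
    .from('documents')
    .insert({ title, content, embedding });
  if (error) throw error;
}
```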
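The insert above assumes a `documents` table whose `embedding` column is a pgvector `vector` column sized for this model (1536 dimensions for text-embedding-3-small); that schema is not shown in this lesson, so treat it as an assumption. pgvector rejects vectors whose length differs from the column's declared dimension, so a small guard like the hypothetical `toDocumentRow` below can surface the problem with a clear message before the database call.

```typescript
// Hypothetical guard: EXPECTED_DIMENSIONS must match how the `embedding`
// column was declared (assumed here to be vector(1536)).
const EXPECTED_DIMENSIONS = 1536;

interface DocumentRow {
  title: string;
  content: string;
  embedding: number[];
}

// Builds the row passed to supabase.from('documents').insert(...),
// failing fast instead of waiting for a database error.
function toDocumentRow(
  title: string,
  content: string,
  embedding: number[],
  expectedDimensions: number = EXPECTED_DIMENSIONS,
): DocumentRow {
  if (embedding.length !== expectedDimensions) {
    throw new Error(
      `Expected ${expectedDimensions} dimensions, got ${embedding.length}`,
    );
  }
  if (!embedding.every(Number.isFinite)) {
    throw new Error('Embedding contains non-finite values');
  }
  return { title, content, embedding };
}
```

Inside `generateAndStore`, you would then insert `toDocumentRow(title, content, embedding)` instead of the raw object.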
Batch Embedding
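For large datasets, embed documents in batches. The embeddings endpoint accepts an array of inputs, so one request can embed many documents at once, which cuts the number of API calls and helps you stay within rate limits: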
```typescript
async function batchEmbed(
  documents: { title: string; content: string }[],
): Promise<void> {
  const batchSize = 100;

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const inputs = batch.map((d) => d.content);

    // One API call embeds the whole batch: `input` accepts an array of strings
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: inputs,
    });

    // Each result carries the `index` of its input; sort before pairing
    const vectors = [...response.data].sort((a, b) => a.index - b.index);
    const rows = batch.map((doc, idx) => ({
      title: doc.title,
      content: doc.content,
      embedding: vectors[idx].embedding,
    }));

    // Surface insert failures instead of silently dropping a batch
    const { error } = await supabase.from('documents').insert(rows);
    if (error) throw error;

    console.log(`Embedded ${i + batch.length} / ${documents.length}`);
  }
}
```
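Batching reduces the number of requests, but a large import can still hit a rate limit (HTTP 429). One common pattern is to retry with exponential backoff. The helper below is a hypothetical sketch: the delays, the retry limit, and the assumption that errors expose a numeric `status` property are choices made for illustration.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Treat HTTP 429 (rate limited) and 5xx as retryable. This duck-types any error
// object carrying a numeric `status`, which is an assumption about the client.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === 'number' && (status === 429 || status >= 500);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `fn`, retrying retryable failures up to `maxRetries` times.
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

In `batchEmbed`, you could wrap the API call as `await withRetry(() => openai.embeddings.create({ model: 'text-embedding-3-small', input: inputs }))`. The official `openai` Node client already retries some failures on its own (configurable through its `maxRetries` option), so an outer wrapper mainly helps when you want longer or more persistent backoff.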
Embedding Models Comparison
| Model | Dimensions | Cost | Quality |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 / 1M tokens | Good |
| text-embedding-3-large | 3072 | $0.13 / 1M tokens | Best |
| Cohere embed-v3 | 1024 | $0.10 / 1M tokens | Great |
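To turn the table into rough numbers for your own corpus, multiply tokens by price and dimensions by bytes per value. The sketch below uses the prices and dimensions from the table, plus pgvector's documented storage size of 4 bytes per dimension plus 8 bytes per vector; the 50-million-token, 100,000-document corpus is a hypothetical example.

```typescript
// Dimensions and prices (USD per 1M tokens) copied from the table above.
const MODELS = {
  'text-embedding-3-small': { dimensions: 1536, usdPerMillionTokens: 0.02 },
  'text-embedding-3-large': { dimensions: 3072, usdPerMillionTokens: 0.13 },
  'Cohere embed-v3': { dimensions: 1024, usdPerMillionTokens: 0.1 },
} as const;

type ModelName = keyof typeof MODELS;

// One-time cost to embed a corpus of `totalTokens` tokens.
function embeddingCostUsd(model: ModelName, totalTokens: number): number {
  return (totalTokens / 1_000_000) * MODELS[model].usdPerMillionTokens;
}

// Raw pgvector storage: 4 bytes per dimension + 8 bytes per vector
// (excludes indexes and other row overhead).
function vectorStorageBytes(model: ModelName, rowCount: number): number {
  return rowCount * (4 * MODELS[model].dimensions + 8);
}

// Hypothetical corpus: 100,000 documents of roughly 500 tokens each.
const rowCount = 100_000;
const tokens = 50_000_000;

for (const model of Object.keys(MODELS) as ModelName[]) {
  const cost = embeddingCostUsd(model, tokens).toFixed(2);
  const mb = (vectorStorageBytes(model, rowCount) / 1024 ** 2).toFixed(0);
  console.log(`${model}: ~$${cost} to embed, ~${mb} MB of raw vector data`);
}
```

Larger vectors cost more to store and compare, so the extra quality of a bigger model is a trade-off against both price and storage.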
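Consistency matters: always use the same embedding model for both storing and querying. If you generated your document embeddings with text-embedding-3-small, your search queries must also be embedded with text-embedding-3-small. Each model maps text into its own vector space (often with a different number of dimensions), so similarity scores between vectors from different models are meaningless.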
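One way to enforce this rule in code is to keep the model name in a single constant, store it next to every vector, and refuse to compare vectors tagged with different models. The `TaggedEmbedding` shape and `assertComparable` function below are hypothetical; in a database this would correspond to an extra column (for example `model text`) that the lesson's `documents` table does not include.

```typescript
// Hypothetical convention: one source of truth for the model name.
const EMBEDDING_MODEL = 'text-embedding-3-small';

interface TaggedEmbedding {
  model: string;
  vector: number[];
}

// Attach the model name to a vector at creation time.
function tag(vector: number[], model: string = EMBEDDING_MODEL): TaggedEmbedding {
  return { model, vector };
}

// Throws instead of letting a meaningless similarity score through.
function assertComparable(query: TaggedEmbedding, doc: TaggedEmbedding): void {
  if (query.model !== doc.model) {
    throw new Error(`Model mismatch: query=${query.model}, document=${doc.model}`);
  }
  if (query.vector.length !== doc.vector.length) {
    throw new Error(
      `Dimension mismatch: ${query.vector.length} vs ${doc.vector.length}`,
    );
  }
}
```

Checking the dimension alone is not enough: text-embedding-ada-002 and text-embedding-3-small both produce 1536-dimensional vectors, yet their vectors are not comparable. The model tag catches that case.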
Embeddings Stored!
Your documents are now vectorized and stored in Supabase. In the next lesson, you will build semantic search queries.