Generating Embeddings (Intermediate)

Embeddings are numerical representations of text that capture semantic meaning. Similar texts produce similar vectors, which is the foundation of semantic search and retrieval-augmented generation (RAG). In this lesson, you will generate embeddings and store them in Supabase.

What Are Embeddings?

An embedding model converts text into a high-dimensional vector (array of numbers). For example, the sentence "The cat sat on the mat" might become a vector of 1536 numbers. Semantically similar sentences produce vectors that are close together in this high-dimensional space.
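One common way to measure how "close together" two vectors are is cosine similarity. The sketch below is illustrative only: the three-dimensional vectors are made up by hand to stand in for real embeddings, and `cosineSimilarity` is a hypothetical helper name, not part of any library used in this lesson.

```typescript
// Cosine similarity: close to 1 when two vectors point the same way,
// close to 0 when they are unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for illustration. Real embeddings have
// hundreds or thousands of dimensions.
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

With real embeddings the same comparison applies: texts with related meaning should score higher against each other than against unrelated text.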

Generating Embeddings with OpenAI

TypeScript
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

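// The client reads OPENAI_API_KEY from the environment by default
const openai = new OpenAI();
// Assumes SUPABASE_URL and SUPABASE_KEY are set as environment variables
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);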

async function generateAndStore(title: string, content: string) {
  // 1. Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content,
  });

  const embedding = response.data[0].embedding;

  // 2. Store in Supabase
  const { error } = await supabase
    .from('documents')
    .insert({ title, content, embedding });

  if (error) throw error;
}

Batch Embedding

For large datasets, generate embeddings in batches. One API call can embed many inputs, which reduces the number of requests and helps you stay within API rate limits:

TypeScript
async function batchEmbed(documents: { title: string; content: string }[]) {
  const batchSize = 100;

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const inputs = batch.map(d => d.content);

    // OpenAI supports batch embedding in one call
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: inputs,
    });

    const rows = batch.map((doc, idx) => ({
      title: doc.title,
      content: doc.content,
      embedding: response.data[idx].embedding,
    }));

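    // Stop on insert failure instead of silently skipping the batch
    const { error } = await supabase.from('documents').insert(rows);
    if (error) throw error;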
    console.log(`Embedded ${i + batch.length} / ${documents.length}`);
  }
}
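Batching reduces the number of requests, but a large job can still hit a rate limit. A minimal sketch of retrying with exponential backoff follows. `backoffDelayMs` and `withRetry` are hypothetical helper names, and the default attempt count and delays are assumptions to tune for your own quota. The OpenAI client also has its own `maxRetries` option, so treat this as a generic pattern rather than a requirement.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs = 500, maxDelayMs = 8000): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// Runs fn, retrying on any thrown error up to maxAttempts times.
// Assumption: every error is treated as retryable; a real job may want
// to retry only rate-limit or network errors.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt
      if (attempt === maxAttempts - 1) break;
      const delay = backoffDelayMs(attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Inside the batch loop, the embedding call could then be wrapped as `await withRetry(() => openai.embeddings.create({ model: 'text-embedding-3-small', input: inputs }))`.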

Embedding Models Comparison

Model                    Dimensions   Cost                Quality
text-embedding-3-small   1536         $0.02 / 1M tokens   Good
text-embedding-3-large   3072         $0.13 / 1M tokens   Best
Cohere embed-v3          1024         $0.10 / 1M tokens   Great

Consistency Matters: Always use the same embedding model for both storing and querying. If you generated embeddings with text-embedding-3-small, your search queries must also use text-embedding-3-small. Mixing models will produce meaningless results.
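One cheap safeguard against accidentally mixing models is to check the vector length before inserting or querying. This is a sketch under stated assumptions: `EXPECTED_DIMENSIONS` and `checkEmbeddingDimensions` are hypothetical names, and the sizes are the default dimensions from the table above. A length check cannot tell apart two models that happen to share a dimension, so it complements, rather than replaces, pinning the model name in one place.

```typescript
// Default output sizes for the OpenAI models listed in the table above.
const EXPECTED_DIMENSIONS: Record<string, number> = {
  'text-embedding-3-small': 1536,
  'text-embedding-3-large': 3072,
};

// Throws if the vector length does not match what the model should produce.
function checkEmbeddingDimensions(model: string, embedding: number[]): void {
  const expected = EXPECTED_DIMENSIONS[model];
  if (expected === undefined) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  if (embedding.length !== expected) {
    throw new Error(
      `Expected ${expected} dimensions for ${model}, got ${embedding.length}`,
    );
  }
}
```

Calling `checkEmbeddingDimensions('text-embedding-3-small', embedding)` just before the insert or the search query turns a silent mismatch into an immediate error.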

Embeddings Stored!

Your documents are now vectorized and stored in Supabase. In the next lesson, you will build semantic search queries.

Next: Vector Search →