Generating Embeddings (Intermediate)

Embeddings are numerical representations of text that capture semantic meaning. Similar texts produce similar vectors, which is the foundation of semantic search and retrieval-augmented generation (RAG). In this lesson, you will generate embeddings and store them in Supabase.

What Are Embeddings?

An embedding model converts text into a high-dimensional vector (an array of numbers). For example, the sentence "The cat sat on the mat" might become a vector of 1536 numbers. Semantically similar sentences produce vectors that are close together in this high-dimensional space.
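To make "close together" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The function name `cosineSimilarity` and the three-dimensional toy vectors are made up for illustration; real embeddings come from the model and have far more dimensions.

```typescript
// Cosine similarity: 1 means the vectors point the same way, 0 means they are orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (assumed values, not real model output).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

// The two related "texts" score higher than the unrelated pair.
console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

In practice the database performs this comparison for you; the sketch only shows what the comparison measures.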

Generating Embeddings with OpenAI

TypeScript

import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

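// The OpenAI client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Supabase credentials come from the environment; the variable names
// SUPABASE_URL and SUPABASE_KEY are the ones assumed in this lesson.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!,
);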

async function generateAndStore(title: string, content: string) {
  // 1. Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: content,
  });

  const embedding = response.data[0].embedding;

  // 2. Store in Supabase
  const { error } = await supabase
    .from('documents')
    .insert({ title, content, embedding });

  if (error) throw error;
}

Batch Embedding

For large datasets, generate embeddings in batches. Sending many inputs per request reduces the number of API calls, which helps you stay within rate limits:

TypeScript

async function batchEmbed(documents: { title: string; content: string }[]) {
  const batchSize = 100;

  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
    const inputs = batch.map(d => d.content);

    // OpenAI supports batch embedding in one call
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: inputs,
    });

    const rows = batch.map((doc, idx) => ({
      title: doc.title,
      content: doc.content,
      embedding: response.data[idx].embedding,
    }));

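    // Insert the whole batch at once and stop if Supabase reports an error
    const { error } = await supabase.from('documents').insert(rows);
    if (error) throw error;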
    console.log(`Embedded ${i + batch.length} / ${documents.length}`);
  }
}
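Even with batching, a large job can still hit a rate limit. One common mitigation is to retry with exponential backoff. The sketch below is not part of the lesson's code: `backoffDelayMs`, `sleep`, and `withRetry` are hypothetical helpers, and it assumes a rate-limited request throws an error carrying a numeric `status` of 429.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time and capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Hypothetical wrapper: runs `fn`, retrying only when the thrown error
// has status 429 (assumed to mean "rate limited").
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null)?.status;
      // Give up on other errors, or once the retry budget is spent.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Inside `batchEmbed`, the embedding call could then be wrapped as `withRetry(() => openai.embeddings.create({ ... }))`, so a rate-limited batch is retried rather than aborting the whole run.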

Embedding Models Comparison

Model	Dimensions	Cost	Quality
text-embedding-3-small	1536	$0.02 / 1M tokens	Good
text-embedding-3-large	3072	$0.13 / 1M tokens	Best
Cohere embed-v3	1024	$0.10 / 1M tokens	Great
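To turn the per-token prices into a budget, multiply your token count by the price per million tokens. The sketch below uses the prices from the table above; `estimateCostUsd` is a hypothetical helper and the 50M-token corpus is a made-up figure.

```typescript
// Prices in USD per 1M tokens, copied from the comparison table.
const PRICE_PER_MILLION_TOKENS: Record<string, number> = {
  'text-embedding-3-small': 0.02,
  'text-embedding-3-large': 0.13,
  'Cohere embed-v3': 0.10,
};

// Hypothetical helper: estimated cost in USD of embedding `tokens` tokens.
function estimateCostUsd(model: string, tokens: number): number {
  const price = PRICE_PER_MILLION_TOKENS[model];
  if (price === undefined) throw new Error(`Unknown model: ${model}`);
  return (tokens / 1e6) * price;
}

// Example: embedding an assumed corpus of 50M tokens.
console.log(estimateCostUsd('text-embedding-3-small', 50e6).toFixed(2)); // "1.00"
console.log(estimateCostUsd('text-embedding-3-large', 50e6).toFixed(2)); // "6.50"
```

Check the providers' current pricing before relying on these numbers, since listed prices can change.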

Consistency Matters: Always use the same embedding model for both storing and querying. If you generated embeddings with text-embedding-3-small, your search queries must also use text-embedding-3-small. Vectors from different models live in unrelated vector spaces and often have different dimensions, so comparing them produces meaningless results.
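One way to enforce this is to define the model once and check vector length before storing or querying. This is a sketch under assumptions: the constant names and `assertEmbeddingShape` are hypothetical, and the check only catches models with different dimensions (for example, 1536 vs 3072), not two models that share a dimension.

```typescript
// Single source of truth for the model, shared by indexing and search code.
const EMBEDDING_MODEL = 'text-embedding-3-small';
const EMBEDDING_DIMENSIONS = 1536; // dimension listed for this model in the table

// Hypothetical guard: returns the vector unchanged, or throws if its
// length does not match the configured model.
function assertEmbeddingShape(embedding: number[]): number[] {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `Expected ${EMBEDDING_DIMENSIONS} dimensions for ${EMBEDDING_MODEL}, ` +
        `got ${embedding.length}`,
    );
  }
  return embedding;
}
```

Passing `EMBEDDING_MODEL` to every `openai.embeddings.create` call, and running each returned vector through the guard, keeps the storing and querying paths from drifting apart.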

Embeddings Stored!

Your documents are now vectorized and stored in Supabase. In the next lesson, you will build semantic search queries.
