
ONNX Runtime Web

Run ONNX models in the browser or Node.js using ONNX Runtime — a cross-platform inference engine with WebAssembly and WebGPU backends.

What is ONNX?

ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It allows you to train a model in PyTorch, TensorFlow, or scikit-learn, export it to ONNX format, and run it anywhere — including the browser.

Setup

Terminal
npm install onnxruntime-web    # Browser
npm install onnxruntime-node   # Node.js

Running a Model

JavaScript
import * as ort from 'onnxruntime-web';

// Load model
const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu', 'wasm']  // fallback chain
});

// Prepare input tensor
const inputData = new Float32Array([1.0, 2.0, 3.0, 4.0]);
const inputTensor = new ort.Tensor('float32', inputData, [1, 4]);

// Run inference (feed keys must match the model's input names)
const feeds = { input: inputTensor };
const results = await session.run(feeds);

// Read output ('output' must match the model's output name)
const output = results.output.data;
console.log('Predictions:', output);
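Classification models usually return raw logits rather than probabilities. A minimal post-processing sketch in plain JavaScript (no ONNX Runtime calls involved; the sample logits are made up) that converts the output `Float32Array` into probabilities and picks the top class:

```javascript
// Numerically stable softmax: subtract the max before exponentiating
// so large logits don't overflow to Infinity.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = Array.from(logits, v => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / sum);
}

// Index of the highest-probability class
function argmax(values) {
  return values.reduce((best, v, i, arr) => (v > arr[best] ? i : best), 0);
}

// Example: pretend these are the logits from session.run()
const logits = new Float32Array([2.0, 1.0, 0.1]);
const probs = softmax(logits);
console.log('Top class:', argmax(probs));  // → 0
```

Run this on `results.output.data` from the snippet above to turn model scores into something you can display.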

Exporting Models to ONNX

Python - Export from PyTorch
import torch
import torch.onnx

# Your trained PyTorch model
model = MyModel()
model.eval()

# Export to ONNX
dummy_input = torch.randn(1, 3, 224, 224)  # matches the model's input shape
torch.onnx.export(model, dummy_input, "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
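At inference time the exported model expects input shaped like `dummy_input`. A hedged NumPy sketch of the usual image preprocessing — the mean/std constants here are the common ImageNet values, an assumption; use whatever your model was trained with:

```python
import numpy as np

# Common ImageNet normalization constants (assumption -- match your training setup)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_input_tensor(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an HWC uint8 RGB image to the NCHW float32 tensor the export expects."""
    x = image_hwc.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - MEAN) / STD                       # per-channel normalization
    x = x.transpose(2, 0, 1)                   # HWC -> CHW
    return x[np.newaxis, ...]                  # add batch dim -> NCHW

# Example with a dummy 224x224 RGB image:
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
tensor = to_input_tensor(dummy)
print(tensor.shape, tensor.dtype)  # (1, 3, 224, 224) float32
```

The resulting array's flat bytes are what you would hand to `ort.Tensor('float32', data, [1, 3, 224, 224])` on the JavaScript side.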

Execution Providers

Provider | Environment | Performance   | Compatibility
WebGPU   | Browser     | Fastest (GPU) | Chrome 113+, Edge
WebGL    | Browser     | Fast (GPU)    | All modern browsers
WASM     | Browser     | Good (CPU)    | Universal
CUDA     | Node.js     | Fastest       | NVIDIA GPUs
CPU      | Node.js     | Baseline      | Universal
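The fallback chain passed to `InferenceSession.create` can be built with simple feature detection. A sketch — the `pickProviders` helper is my own, not part of the ONNX Runtime API:

```javascript
// Sketch: build an execution-provider fallback chain by feature detection.
// In a browser, prefer WebGPU when `navigator.gpu` exists, with WASM as the
// universal fallback; in Node.js (no `window` global), use the CPU provider.
function pickProviders() {
  if (typeof window === 'undefined') return ['cpu'];  // Node.js
  const chain = [];
  if (navigator.gpu) chain.push('webgpu');  // standard WebGPU feature test
  chain.push('wasm');                       // runs in every modern browser
  return chain;
}

// Usage (assumes `ort` is imported as in the earlier example):
// const session = await ort.InferenceSession.create('model.onnx', {
//   executionProviders: pickProviders()
// });
```

ONNX Runtime tries the listed providers in order, so putting the fastest available one first costs nothing on machines that lack it.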

Model Optimization

Python - Quantize for Web
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization stores weights as 8-bit integers,
# shrinking a float32 model roughly 4x
quantize_dynamic(
    "model.onnx",
    "model_quantized.onnx",
    weight_type=QuantType.QUInt8
)

When to use ONNX Runtime: choose it when you have an existing model trained in Python and want broad compatibility and strong performance in the browser. It supports more model architectures than TensorFlow.js conversion does.