Best Practices
Optimize AI-powered JavaScript applications with Web Workers, smart loading strategies, privacy-first design, and responsive AI UX patterns.
Web Workers for Inference
Run ML inference off the main thread to keep the UI responsive:
```javascript
// worker.js
import { pipeline } from '@huggingface/transformers';

let classifier = null;

self.onmessage = async (event) => {
  if (event.data.type === 'init') {
    classifier = await pipeline('sentiment-analysis');
    self.postMessage({ type: 'ready' });
  }
  if (event.data.type === 'predict') {
    const result = await classifier(event.data.text);
    self.postMessage({ type: 'result', data: result });
  }
};
```
```javascript
// main.js
const worker = new Worker('worker.js', { type: 'module' });
worker.postMessage({ type: 'init' });

worker.onmessage = (event) => {
  if (event.data.type === 'ready') {
    worker.postMessage({ type: 'predict', text: 'Great product!' });
  }
  if (event.data.type === 'result') {
    console.log(event.data.data); // UI stays responsive
  }
};
```
Performance Optimization
Use Quantized Models
INT8 quantized models are 4x smaller and 2-3x faster than FP32 with minimal accuracy loss. Always prefer quantized variants for browser deployment.
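As a concrete sketch, assuming the Transformers.js v3 API where the third argument to `pipeline()` accepts a `dtype` option (v2 instead used `{ quantized: true }`, which was the default there):

```javascript
// Pipeline options requesting INT8 weights (assumption: Transformers.js v3).
const quantizedOptions = {
  dtype: 'q8', // INT8 ONNX weights: roughly 4x smaller than 'fp32'
};

// Usage (the model id shown is the library's default sentiment model):
// const classifier = await pipeline(
//   'sentiment-analysis',
//   'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
//   quantizedOptions,
// );
```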
Lazy Load Models
Don't load ML models on page load. Load them when the user first interacts with the AI feature. Show a loading indicator.
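One way to sketch this: memoize the model load behind a function so the download starts only on first use. The `loadPipeline` callback stands in for e.g. `() => pipeline('sentiment-analysis')` from Transformers.js, and the UI wiring in the comment uses hypothetical element names.

```javascript
let modelPromise = null;

// Memoize the in-flight promise so repeated or concurrent calls share a
// single download instead of fetching the model again.
function getModel(loadPipeline) {
  modelPromise ??= loadPipeline();
  return modelPromise;
}

// Hypothetical wiring: start the download on first interaction and show
// a loading indicator while it runs.
// analyzeButton.addEventListener('click', async () => {
//   spinner.show();
//   const classifier = await getModel(() => pipeline('sentiment-analysis'));
//   render(await classifier(textInput.value));
//   spinner.hide();
// });
```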
Cache Models
Use the Cache API or IndexedDB to store downloaded models. Subsequent page loads skip the download entirely.
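A cache-first download using the Cache API might look like this (a sketch: the cache name and URL handling are illustrative, and Transformers.js already performs similar caching automatically in browsers):

```javascript
// Return the model bytes, hitting the network only on a cache miss.
async function fetchModelCached(url) {
  const cache = await caches.open('ml-models-v1');

  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer(); // served from disk, no download

  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Model download failed: ${response.status}`);
  }
  await cache.put(url, response.clone()); // store for the next page load
  return response.arrayBuffer();
}
```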
Batch Predictions
Group multiple inputs into a single inference call instead of running one-at-a-time for better GPU utilization.
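For example, Transformers.js pipelines accept an array of inputs, so texts can be grouped into fixed-size batches with a small helper (the batch size of 32 is an arbitrary example):

```javascript
// Split an array of inputs into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical usage with a Transformers.js classifier:
// const results = [];
// for (const batch of chunk(reviews, 32)) {
//   results.push(...(await classifier(batch))); // one call per batch
// }
```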
Privacy-First AI
AI UX Patterns
| Pattern | Description | Example |
|---|---|---|
| Progressive Loading | Show basic UI, enhance with AI when model loads | Search bar works without AI, adds suggestions when ready |
| Streaming Responses | Show LLM output token by token | ChatGPT-style typing effect |
| Confidence Indicators | Show prediction confidence to users | "92% confident this is a cat" |
| Fallback Strategies | Gracefully handle model failures | Fall back to server API if browser inference fails |
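The fallback pattern from the table can be sketched as follows (`/api/sentiment` is a hypothetical server endpoint, and `localClassifier` stands in for a browser-side pipeline that may have failed to load):

```javascript
// Try in-browser inference first; fall back to a server API on failure.
async function classify(text, localClassifier) {
  try {
    if (!localClassifier) throw new Error('no local model available');
    return await localClassifier(text);
  } catch (err) {
    const response = await fetch('/api/sentiment', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!response.ok) {
      throw new Error(`Server inference failed: ${response.status}`);
    }
    return response.json();
  }
}
```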
Common Mistakes
- Memory leaks: Always dispose TensorFlow.js tensors. Use `tf.tidy()` to auto-clean up intermediate tensors.
- Blocking the main thread: Model loading and inference should always run in Web Workers.
- Huge model downloads: A 500MB model on first page load kills UX. Use quantized models and lazy loading.
- No error handling: WebGPU/WebGL may not be available. Always provide WASM fallback.
- Exposing API keys: Never put API keys in client-side JavaScript. Use server-side API routes.
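The tensor-disposal point can be illustrated with TensorFlow.js (a sketch: `tf` is passed as a parameter here only to keep the example self-contained, and would normally come from `import * as tf from '@tensorflow/tfjs'`):

```javascript
// tf.tidy() disposes every intermediate tensor created inside the
// callback; only the value you return survives.
function sumOfDoubles(tf, values) {
  const result = tf.tidy(() => {
    const x = tf.tensor1d(values); // intermediate
    const doubled = x.mul(2);      // intermediate
    return doubled.sum();          // only the returned tensor survives
  });
  // tf.tidy() has already disposed x and doubled; the caller disposes
  // `result` with result.dispose() when finished.
  return result;
}
```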
Frequently Asked Questions
When should I use TensorFlow.js vs. Transformers.js?
Use TensorFlow.js for custom model training in the browser, computer vision, and when you need the TensorFlow ecosystem. Use Transformers.js for NLP tasks, for running pre-trained Hugging Face models, and when you want the simplest possible API.
Can I run large language models (LLMs) in the browser?
Yes, but with limitations. Small models (1-3B parameters) can run via WebGPU. Larger models require API calls to server-side inference. WebLLM, from the MLC project, is an emerging tool for browser-based LLM inference.
Is JavaScript fast enough for machine learning?
For inference, yes. WebGL and WebGPU provide hardware-accelerated matrix operations that are comparable to native performance for many models. For training, JavaScript is suitable for small models and fine-tuning, but heavy training is better done in Python.