Hugging Face Hub Beginner
Hugging Face Hub is the world's largest platform for sharing AI models, datasets, and demos. With over 900,000 models, it is the go-to destination for finding and using pretrained models across every AI domain.
Overview
The Hub hosts models from individual researchers, companies, and organizations. Every model has a model card with documentation, usage examples, performance benchmarks, and license information.
Browsing Models
You can filter models on the Hub by:
- Task: text-generation, image-classification, object-detection, translation, etc.
- Library: transformers, diffusers, timm, spaCy, sentence-transformers
- Language: English, Chinese, multilingual, etc.
- License: Apache 2.0, MIT, Llama, research-only
- Size: Filter by parameter count
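The same filters are also available programmatically through the huggingface_hub client. A minimal sketch, assuming the huggingface_hub package is installed and the Hub is reachable (the specific filter values are illustrative):

```python
from huggingface_hub import HfApi

api = HfApi()

# Find text-generation models usable with the transformers library,
# sorted by download count (most popular first)
models = list(api.list_models(
    task="text-generation",
    library="transformers",
    sort="downloads",
    limit=5,
))

for m in models:
    print(m.id, m.downloads)
```

This is handy for scripting model discovery instead of clicking through the website's filter sidebar.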
Installing the transformers Library
```bash
# Install transformers and required dependencies
$ pip install transformers torch

# Or with conda
$ conda install -c conda-forge transformers pytorch
```
Pipeline API (Easiest Way)
The Pipeline API is the simplest way to use a pretrained model. It handles tokenization, model loading, and post-processing automatically:
```python
from transformers import pipeline

# Sentiment analysis
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="gpt2")
text = generator("The future of AI is", max_length=50)

# Image classification
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
result = classifier("photo.jpg")

# Speech recognition
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-base")
result = transcriber("audio.mp3")
```
AutoModel and AutoTokenizer
For more control, use AutoModel and AutoTokenizer to load models and process inputs manually:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load tokenizer and model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize input
inputs = tokenizer("This movie was fantastic!", return_tensors="pt")

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
print(predictions)  # tensor([[0.0002, 0.9998]]) -> POSITIVE
```
Downloading Models
To work with model files outside of transformers, the huggingface_hub library can download an entire repository:

```python
from huggingface_hub import snapshot_download

# Download entire model repository
snapshot_download(repo_id="bert-base-uncased")

# Download to a specific directory
snapshot_download(
    repo_id="bert-base-uncased",
    local_dir="./models/bert",
)
```
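If you only need a single file rather than the whole repository, hf_hub_download fetches one file and caches it locally. A small sketch (the returned path depends on your local cache location):

```python
from huggingface_hub import hf_hub_download

# Download just the config file from the bert-base-uncased repo
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(config_path)
```

Both functions cache downloads, so repeated calls for the same file do not re-download it.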
Model Cards
Every model on the Hub should have a model card that includes:
- Model description: What the model does and how it was trained
- Intended uses: What tasks the model is designed for
- Training data: What data was used
- Evaluation results: Benchmark scores and metrics
- Limitations: Known weaknesses and biases
- License: How you can use the model
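On the Hub, the model card is the repository's README.md, and structured fields like license and language live in a YAML metadata block at the top of that file. A sketch with illustrative values (the exact tags and datasets vary by model):

```yaml
---
license: apache-2.0
language:
- en
tags:
- text-classification
datasets:
- sst2
---
```

The Hub reads this metadata to power the filters described above, so filling it in accurately makes a model easier to discover.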
Trending and Popular Models
The Hub features trending models on its homepage and lets you sort by downloads, likes, and recency. Some of the most popular model families include:
| Model Family | Task | Downloads (monthly) |
|---|---|---|
| Meta Llama | Text generation | 50M+ |
| OpenAI Whisper | Speech recognition | 30M+ |
| BERT / DistilBERT | Text classification, NER | 40M+ |
| Stable Diffusion | Image generation | 10M+ |
| sentence-transformers | Embeddings | 25M+ |
Next Up
Explore pretrained models for computer vision — from image classification to object detection and image generation.
Next: Vision Models →