Transformers Library
The Transformers library is the core of the Hugging Face ecosystem. This lesson covers the Pipeline API for quick inference, tokenizers for text processing, model loading with AutoClasses, and how to read model outputs. Practice questions are included at the end.
Pipeline API
The Pipeline API is the simplest way to use pre-trained models. It handles tokenization, model inference, and post-processing in a single call. You must know the available pipeline tasks and how to configure them.
# Pipeline API - The simplest way to use HF models
from transformers import pipeline
# Text classification (sentiment analysis)
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("Hugging Face is amazing!")
# [{'label': 'POSITIVE', 'score': 0.9998}]
# Named Entity Recognition
ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english",
               aggregation_strategy="simple")
entities = ner("Hugging Face is based in New York City.")
# [{'entity_group': 'ORG', 'word': 'Hugging Face', ...},
#  {'entity_group': 'LOC', 'word': 'New York City', ...}]
# Question Answering
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
answer = qa(question="What is HF?", context="Hugging Face is an AI company.")
# {'answer': 'an AI company', 'score': 0.89, ...}
# Summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer("Long article text here...", max_length=130, min_length=30)
# Translation
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
translation = translator("Hugging Face is great!")
# Zero-shot classification (no fine-tuning needed)
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot("I love cooking pasta", candidate_labels=["food", "sports", "tech"])
# Available pipeline tasks you should know:
pipeline_tasks = [
    "text-classification",  # Sentiment, topic classification
    "token-classification",  # NER, POS tagging
    "question-answering",  # Extractive QA
    "summarization",  # Abstractive summarization
    "translation",  # Language translation (also translation_xx_to_yy, e.g. translation_en_to_fr)
    "text-generation",  # Causal LM generation
    "fill-mask",  # Masked language modeling
    "zero-shot-classification",  # Classify without fine-tuning
    "text2text-generation",  # T5-style generation
    "feature-extraction",  # Get embeddings
]
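To make the post-processing stage of a text-classification pipeline concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the `postprocess` helper, the `id2label` mapping, and the logit values are all hypothetical, chosen only to show how raw scores become the label/score dictionaries seen above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Turn one row of logits into a pipeline-style result (hypothetical helper)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
    return [{"label": id2label[best], "score": probs[best]}]


# Assumed values: a two-class sentiment head and made-up logits.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-4.0, 4.5]

result = postprocess(logits, id2label)
print(result)
```

The score is simply the softmax probability of the winning class, which is why pipeline scores for confident predictions sit so close to 1.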
Tokenizers
Tokenizers convert text to numerical input that models understand. You must know how to encode and decode text, handle special tokens, padding, and truncation.
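Before looking at the library calls, it can help to see the text-to-ids step in miniature. The sketch below is a toy greedy longest-match splitter in the style of WordPiece, not the real tokenizer: the `wordpiece` function and the seven-entry vocabulary are hypothetical, and a real vocabulary holds tens of thousands of pieces.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1  # shrink the candidate and try again
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary mapping each piece to its id.
vocab = {"[CLS]": 0, "[SEP]": 1, "[UNK]": 2, "token": 3, "##izer": 4, "##s": 5, "fast": 6}

words = ["fast", "tokenizers", "xyz"]
tokens = ["[CLS]"] + [p for w in words for p in wordpiece(w, vocab)] + ["[SEP]"]
input_ids = [vocab[t] for t in tokens]
print(tokens)     # ['[CLS]', 'fast', 'token', '##izer', '##s', '[UNK]', '[SEP]']
print(input_ids)  # [0, 6, 3, 4, 5, 2, 1]
```

Words missing from the vocabulary are broken into known pieces where possible, and only fall back to [UNK] when no split works.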
# Tokenizers - Converting text to model input
from transformers import AutoTokenizer
# Load a tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Basic encoding
encoded = tokenizer("Hello, Hugging Face!")
print(encoded)
# {'input_ids': [101, 7592, 1010, 17662, 2227, 999, 102],
# 'token_type_ids': [0, 0, 0, 0, 0, 0, 0],
# 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
# Decoding back to text
text = tokenizer.decode(encoded["input_ids"])
# '[CLS] hello, hugging face! [SEP]'
# Batch encoding with padding and truncation
batch = tokenizer(
    ["Short text", "This is a much longer piece of text for comparison"],
    padding=True,  # Pad shorter sequences
    truncation=True,  # Truncate to max_length
    max_length=512,  # Maximum sequence length
    return_tensors="pt",  # Return PyTorch tensors ("tf" for TensorFlow)
)
# Important tokenizer concepts for the exam:
tokenizer_concepts = {
    "input_ids": "Token indices in the vocabulary",
    "attention_mask": "1 for real tokens, 0 for padding",
    "token_type_ids": "Segment IDs for sentence pairs (BERT)",
    "special_tokens": "[CLS], [SEP], [PAD], [UNK], [MASK] (BERT; other models use different ones)",
    "subword_tokenization": "WordPiece (BERT), byte-level BPE (GPT-2), Unigram via SentencePiece (T5)",
    "padding": "Add [PAD] tokens to match batch length",
    "truncation": "Cut sequences longer than max_length",
}
# Encoding sentence pairs (for NLI, QA, etc.)
pair_encoded = tokenizer("What is NLP?", "NLP is natural language processing.",
                         padding=True, truncation=True, return_tensors="pt")
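What padding and truncation do to a batch can be sketched by hand. The `pad_batch` helper below is hypothetical and deliberately simplified: it uses 0 as a stand-in for the [PAD] id, and it truncates with a plain slice, whereas a real tokenizer keeps the closing special token when it truncates.

```python
def pad_batch(batch_ids, pad_id=0, max_length=None):
    """Truncate to max_length, then right-pad every sequence to the longest one."""
    if max_length is not None:
        # Simplification: a plain slice, so a trailing [SEP] id can be cut off.
        batch_ids = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        padding = longest - len(ids)
        input_ids.append(ids + [pad_id] * padding)
        attention_mask.append([1] * len(ids) + [0] * padding)  # 1 = real, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Assumed token ids for illustration only.
batch = pad_batch(
    [[101, 7592, 102], [101, 7592, 1010, 17662, 2227, 102]],
    max_length=5,
)
print(batch["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 7592, 1010, 17662, 2227]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Every row ends up the same length, which is what allows the batch to be stacked into a single tensor.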
AutoClasses
AutoClasses read the model type from a checkpoint's configuration and instantiate the matching architecture. This is the recommended way to load models because the same loading code works for any supported architecture.
# AutoClasses - Automatic model and tokenizer loading
from transformers import (
    AutoTokenizer,
    AutoModel,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoModelForQuestionAnswering,
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoConfig,
)
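# Load tokenizer and model for a specific task
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# bert-base-uncased ships without a fine-tuned classification head, so the head
# is randomly initialized here and must be fine-tuned before its predictions mean anything
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)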
# AutoConfig - inspect model configuration
config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size) # 768
print(config.num_attention_heads) # 12
print(config.num_hidden_layers) # 12
# Key AutoClasses for the exam:
auto_classes = {
    "AutoModel": "Base model (hidden states output, no task head)",
    "AutoModelForSequenceClassification": "Classification head on top",
    "AutoModelForTokenClassification": "Per-token classification (NER)",
    "AutoModelForQuestionAnswering": "Start/end logits for extractive QA",
    "AutoModelForCausalLM": "Next-token prediction (GPT-style)",
    "AutoModelForSeq2SeqLM": "Encoder-decoder (T5, BART)",
    "AutoModelForMaskedLM": "Fill-mask prediction (BERT-style)",
    "AutoTokenizer": "Correct tokenizer for any model",
    "AutoConfig": "Model configuration (hyperparameters)",
}
# Model inference (manual, without pipeline)
import torch
inputs = tokenizer("I love NLP!", return_tensors="pt")
with torch.no_grad():  # Disable gradient tracking for inference
    outputs = model(**inputs)
logits = outputs.logits  # Raw, unnormalized scores, shape (batch_size, num_labels)
predictions = torch.argmax(logits, dim=-1)  # Class index
probabilities = torch.softmax(logits, dim=-1)  # Probabilities
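The base AutoModel has no task head, so it returns one hidden-state vector per token rather than logits. One common way to reduce those vectors to a single sentence embedding is masked mean pooling, sketched below in plain Python. This is an illustration under assumptions, not a method the library prescribes: `mean_pool` is a hypothetical helper and the 3-dimensional hidden states are made up.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                total[k] += vector[k]
    # Assumes at least one real token, otherwise count would be zero.
    return [value / count for value in total]


# Assumed hidden states for 4 positions; the last position is padding.
hidden_states = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding position, must not affect the result
]
attention_mask = [1, 1, 1, 0]

embedding = mean_pool(hidden_states, attention_mask)
print(embedding)  # [2.0, 2.0, 2.0]
```

Using the attention mask here matters: averaging over the padding row as well would pull the embedding toward values that carry no meaning.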
Model Architectures
Understanding the three main transformer architectures is essential for choosing the right model for each task.
Encoder Models
BERT, RoBERTa, DistilBERT, ALBERT. Best for understanding tasks: classification, NER, extractive QA. Process entire input at once with bidirectional attention.
Decoder Models
GPT-2, LLaMA, Mistral. Best for text generation. Process tokens left-to-right with causal (unidirectional) attention. Used for completion and chat.
Encoder-Decoder Models
T5, BART, mBART, MarianMT. Best for sequence-to-sequence tasks: translation, summarization, generative QA. Encoder reads input, decoder generates output.
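The difference between bidirectional and causal attention comes down to which positions each token is allowed to see. The sketch below builds both visibility patterns as 0/1 matrices in plain Python. The function names and the example sentence are hypothetical, and real models apply these patterns inside the attention computation rather than exposing them this way.

```python
def bidirectional_mask(n):
    """Encoder-style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


tokens = ["the", "cat", "sat", "down"]
for name, mask in [("encoder", bidirectional_mask(len(tokens))),
                   ("decoder", causal_mask(len(tokens)))]:
    print(name)
    for token, row in zip(tokens, mask):
        print(f"  {token:>4} {row}")
```

In an encoder-decoder model the encoder uses the first pattern over the input, while the decoder uses the second over its own output and additionally attends to the encoder's result.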
Practice Questions
Q1: What does pipeline("text-classification") return by default?
Answer: A list of dictionaries, each containing a label (predicted class name) and a score (confidence probability). For example: [{'label': 'POSITIVE', 'score': 0.9998}]. The pipeline handles tokenization, model inference, and post-processing automatically.
Q2: What is the difference between AutoModel and AutoModelForSequenceClassification?
Answer: AutoModel loads the base transformer without any task-specific head — it outputs hidden states. AutoModelForSequenceClassification adds a classification head (linear layer) on top of the base model that outputs logits for each class. Use the task-specific variant when you need predictions, and the base model when you need embeddings.
Q3: What does the attention_mask do in tokenizer output?
Answer: The attention_mask is a binary tensor where 1 indicates real tokens and 0 indicates padding tokens. It tells the model which tokens to attend to and which to ignore. Without it, the model would treat padding tokens as meaningful input, degrading performance.
Q4: When should you use return_tensors="pt" vs return_tensors="tf"?
Answer: Use "pt" when your model is a PyTorch model (returns torch.Tensor) and "tf" when using a TensorFlow model (returns tf.Tensor). If you omit return_tensors, the tokenizer returns plain Python lists, which cannot be directly fed to a model.
Q5: Which model architecture is best for text classification: encoder, decoder, or encoder-decoder?
Answer: Encoder models (BERT, RoBERTa, DistilBERT) are best for text classification. They use bidirectional attention to understand the full context of the input, making them ideal for understanding tasks. Decoder models can also classify text but are less efficient for this purpose. Encoder-decoder models are overkill for classification — they are designed for sequence-to-sequence tasks.
Key Takeaways
- The Pipeline API is the fastest way to run inference; know the ten task types listed in this lesson
- Tokenizers handle encoding, padding, truncation, and special tokens
- Always use AutoClasses to load models — they detect the correct architecture automatically
- Encoder models (BERT) are for understanding tasks, decoder models (GPT) for generation, encoder-decoder (T5) for seq2seq
- Task-specific models output logits: use softmax for probabilities or argmax for class predictions. The base AutoModel outputs hidden states instead