Intermediate

Models & Tokenizers

Go beyond pipelines. Learn to work directly with AutoModel and AutoTokenizer, and understand the architectures behind transformer models.

AutoModel & AutoTokenizer

The Auto classes automatically detect and load the correct model architecture for a given checkpoint, based on its configuration file:

Python
from transformers import AutoTokenizer, AutoModel

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize input text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
print(inputs.keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])

# Run through model
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
# torch.Size([1, 8, 768])

Task-Specific Auto Classes

For specific tasks, use the appropriate Auto class that adds the correct head on top of the base model:

Python
from transformers import (
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoModelForQuestionAnswering,
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoModelForImageClassification,
)

# Classification model (e.g., sentiment analysis)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Text generation model
model = AutoModelForCausalLM.from_pretrained("gpt2")
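The task head changes what you can do with the model. As a minimal sketch (reusing the gpt2 checkpoint above), a causal LM can extend a prompt with the `generate` method:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Transformers are", return_tensors="pt")

# Greedy decoding: extend the prompt by up to 10 new tokens
output_ids = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # gpt2 defines no pad token
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With `do_sample=False` the output is deterministic; sampling-based strategies are covered with fine-tuning and generation settings later.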

Understanding Tokenizers

Tokenizers convert raw text into numerical tokens that models can process. Different models use different tokenization strategies:

Python
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Basic tokenization
tokens = tokenizer.tokenize("Transformers are amazing!")
print(tokens)
# ['transformers', 'are', 'amazing', '!']

# Full encoding with special tokens
encoded = tokenizer("Transformers are amazing!", return_tensors="pt")
print(encoded['input_ids'])
# tensor([[ 101, 19081, 2024, 6429,  999,  102]])

# Decode back to text
decoded = tokenizer.decode(encoded['input_ids'][0])
print(decoded)
# '[CLS] transformers are amazing! [SEP]'
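When you encode a batch, sequences of different lengths must be padded to a common length (and optionally truncated to the model's maximum); the attention mask records which positions hold real tokens. A minimal sketch with the same tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["Short sentence.", "A somewhat longer sentence with more tokens."],
    padding=True,      # pad to the longest sequence in the batch
    truncation=True,   # cut off anything beyond the model's max length
    return_tensors="pt",
)

# Both sequences share one padded length; padded positions get mask 0
print(batch["input_ids"].shape)
print(batch["attention_mask"])
```

The model ignores positions where the attention mask is 0, so padding does not change the result for the shorter sentence.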

Model Architectures

Transformer models fall into three main architecture categories:

  • Encoder-only (BERT, RoBERTa, DistilBERT): Best for understanding tasks — classification, NER, question answering. They process the entire input at once with bidirectional attention.
  • Decoder-only (GPT-2, LLaMA, Mistral): Best for generation tasks — text completion, code generation. They generate tokens one at a time, left to right.
  • Encoder-Decoder (T5, BART, mBART): Best for sequence-to-sequence tasks — translation, summarization. The encoder processes input, and the decoder generates output.
💡 How to choose: Need to classify or extract? Use an encoder model. Need to generate text? Use a decoder model. Need to transform text (translate, summarize)? Use an encoder-decoder model.
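To illustrate the encoder-decoder category, here is a minimal sketch using the t5-small checkpoint (T5 expects a task prefix such as "translate English to German:" in the prompt):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the prefixed input; the decoder generates the translation
inputs = tokenizer(
    "translate English to German: Hello, how are you?",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The same `generate` call works for summarization by swapping the prefix, e.g. "summarize:".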

Working with Model Outputs

Python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

inputs = tokenizer("I really enjoyed this course!", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Raw logits
logits = outputs.logits
# Convert to probabilities
probs = torch.softmax(logits, dim=-1)
print(f"Negative: {probs[0][0]:.4f}, Positive: {probs[0][1]:.4f}")
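Rather than hard-coding which index means "positive", you can read the label mapping that fine-tuned checkpoints store in their config. A short sketch using the same checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("I really enjoyed this course!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# config.id2label maps class indices to human-readable label names
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```

This keeps your code correct even if a different checkpoint orders its labels differently.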

What's Next?

Now that you understand models and tokenizers, the next lesson covers fine-tuning — how to train pre-trained models on your own data using the Trainer API.