Pretrained Language Models
A comprehensive directory of pretrained language models for text generation, classification, named entity recognition, translation, summarization, and embeddings — with practical code examples.
Text Generation
GPT-2
OpenAI's autoregressive language model (124M to 1.5B parameters). Fully open-source and a great starting point for text generation experiments.
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The key to machine learning is", max_length=100, num_return_sequences=1)
print(result[0]["generated_text"])
```
Llama (Meta)
Meta's open-weight LLM family. Llama 3.1 (8B, 70B, 405B) and Llama 3.2 offer strong performance across tasks. Available on Hugging Face with a community license.
Mistral
Mistral 7B and Mixtral 8x7B from Mistral AI. Excellent performance-to-size ratio: Mistral 7B uses sliding window attention, and Mixtral adds a sparse mixture of experts.
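Sliding window attention keeps each token from attending further back than a fixed window, trading full quadratic attention for linear-in-window cost. A minimal NumPy sketch of the causal sliding-window mask (window size and sequence length here are illustrative, not Mistral's actual configuration):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: True where query position i may attend to key position j.

    A position attends only to itself and the previous `window - 1` tokens:
    causal (j <= i) and within the window (i - j < window).
    """
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=6, window=3)
print(mask.astype(int))  # banded lower-triangular pattern
```

With a window of 3, position 5 can see positions 3-5 but not 0-2, so long-range information flows only indirectly through stacked layers.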
Phi (Microsoft)
Small language models (1.3B-14B) optimized for reasoning tasks. Phi-3 achieves surprising quality at small sizes, ideal for edge deployment.
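For edge deployment the dominant cost is usually weight memory, roughly parameter count times bytes per parameter. A back-of-the-envelope sketch (the ~3.8B figure for Phi-3-mini is approximate; activations and KV cache are ignored):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte / 1e9."""
    return num_params * bits_per_param / 8 / 1e9

params = 3.8e9  # Phi-3-mini, approximately
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
# 16-bit: 7.6 GB, 8-bit: 3.8 GB, 4-bit: 1.9 GB
```

This is why 4-bit quantization matters for small models: it brings a multi-billion-parameter model under 2 GB of weights.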
Text Classification
BERT
Google's bidirectional encoder model. The foundation for most text classification, question answering, and NER tasks.
```python
from transformers import pipeline

# Sentiment analysis with a fine-tuned BERT
classifier = pipeline("sentiment-analysis")
result = classifier("This product exceeded my expectations!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]
```
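BERT's bidirectionality comes from its masked-language-model pretraining: random tokens are hidden and predicted from context on both sides. A toy sketch of the masking step (the 15% rate is BERT's published value; whitespace tokenization here is a simplification):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK], returning the masked
    sequence and the (position, original token) prediction targets."""
    rng = random.Random(seed)
    masked, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = "[MASK]"
            targets.append((i, tok))
    return masked, targets

tokens = "the model predicts missing words from both directions".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

The model is trained to recover the original tokens at the masked positions, which forces every layer to use left and right context simultaneously.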
RoBERTa
An optimized version of BERT: the same architecture pretrained longer on more data, with dynamic masking and without the next-sentence-prediction objective. Generally outperforms BERT on benchmarks.
DistilBERT
A distilled version of BERT that is 60% faster and 40% smaller while retaining 97% of BERT's performance. Ideal for production deployments.
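DistilBERT is trained with knowledge distillation: the small student learns to match the teacher's softened output distribution, not just the hard labels. A NumPy sketch of the core soft-target loss (the temperature and logit values here are illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Higher temperature flattens both distributions, exposing the teacher's
    relative preferences among wrong classes ("dark knowledge").
    """
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)
    return -np.sum(p * np.log(q))

teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.0, 1.5, 0.2])
print(f"loss: {distillation_loss(teacher, student):.4f}")
```

The loss is minimized when the student's distribution matches the teacher's, which is why a much smaller network can recover most of the teacher's behavior.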
Named Entity Recognition (NER)
```python
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
result = ner("Elon Musk founded SpaceX in Hawthorne, California.")
for entity in result:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")
# Elon Musk: PER (0.99)
# SpaceX: ORG (0.98)
# Hawthorne: LOC (0.99)
# California: LOC (0.99)
```
Translation
MarianMT
Fast, lightweight translation models. Over 1,000 language pair models available on Hugging Face.
```python
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Machine learning is transforming the world.")
print(result[0]["translation_text"])
# L'apprentissage automatique transforme le monde.
```
M2M-100 & NLLB
Meta's many-to-many translation models. NLLB (No Language Left Behind) supports 200+ languages.
Summarization
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Long article text goes here..."""
summary = summarizer(text, max_length=130, min_length=30)
print(summary[0]["summary_text"])
```
Other summarization models: T5 (Google, text-to-text), Pegasus (Google, abstractive summarization).
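T5 handles summarization (and every other task) as text-to-text: the task is selected by a plain-text prefix on the input. A tiny sketch of that input framing, using prefixes from the original T5 setup:

```python
def t5_input(task_prefix: str, text: str) -> str:
    """Format an input the text-to-text way: the task prefix is just text."""
    return f"{task_prefix} {text}"

print(t5_input("summarize:", "Long article text goes here..."))
print(t5_input("translate English to German:", "Machine learning is great."))
```

Because the task lives in the input string, one model and one decoding loop cover summarization, translation, and classification alike.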
Embeddings
```python
from sentence_transformers import SentenceTransformer, util

# Load a sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["Machine learning is great", "I love AI", "The weather is nice"]
embeddings = model.encode(sentences)

# Compute similarity
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")  # Similarity: 0.6521
```
Popular embedding models: E5 (Microsoft), BGE (BAAI), GTE (Alibaba), Cohere Embed.
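Embeddings power semantic search: encode the documents once, encode the query, and rank by cosine similarity. A self-contained NumPy sketch using tiny made-up vectors in place of real model outputs (a real system would call `model.encode` for both):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for real model output.
docs = {
    "reset your password": np.array([0.9, 0.1, 0.0, 0.1]),
    "update billing info": np.array([0.1, 0.9, 0.1, 0.0]),
    "change account email": np.array([0.7, 0.2, 0.1, 0.2]),
}
query = np.array([0.8, 0.1, 0.0, 0.2])  # stands in for e.g. "forgot my login"

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked[0])  # reset your password
```

This retrieve-by-similarity step is exactly what RAG pipelines do before handing the top documents to a generative model.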
Language Models Comparison
| Model | Task | Size | Best For |
|---|---|---|---|
| GPT-2 | Generation | 124M-1.5B | Learning, experimentation |
| Llama 3.1 | Generation | 8B-405B | Production text generation |
| BERT | Classification, NER | 110M-340M | Understanding tasks |
| DistilBERT | Classification | 66M | Fast production inference |
| T5 | Any text-to-text | 60M-11B | Versatile NLP |
| all-MiniLM-L6 | Embeddings | 22M | Semantic search, RAG |
Next Up
Explore pretrained audio models for speech recognition, text-to-speech, and music generation.
Next: Audio Models →
Lilly Tech Systems