Pretrained Language Models
A comprehensive directory of pretrained language models for text generation, classification, named entity recognition, translation, summarization, and embeddings — with practical code examples.
Text Generation
GPT-2
OpenAI's autoregressive language model (124M to 1.5B parameters). Fully open-source and a great starting point for text generation experiments.
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The key to machine learning is", max_length=100, num_return_sequences=1)
print(result[0]["generated_text"])
```
Llama (Meta)
Meta's open-weight LLM family. Llama 3.1 (8B, 70B, 405B) and Llama 3.2 offer strong performance across tasks. Available on Hugging Face with a community license.
Mistral
Mistral 7B and Mixtral 8x7B from Mistral AI. Excellent performance-to-size ratio: Mistral 7B uses sliding window attention, and Mixtral adds a sparse mixture of experts.
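Sliding window attention keeps each token from attending further back than a fixed window, trading full quadratic attention for linear-in-window cost. A minimal NumPy sketch of the causal sliding-window mask (window size and sequence length here are illustrative, not Mistral's actual configuration):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: True where query position i may attend to key position j.

    A position attends only to itself and the previous `window - 1` tokens:
    causal (j <= i) and within the window (i - j < window).
    """
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=6, window=3)
print(mask.astype(int))  # banded lower-triangular pattern
```

With a window of 3, position 5 can see positions 3-5 but not 0-2, so long-range information flows only indirectly through stacked layers.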
Phi (Microsoft)
Small language models (1.3B-14B) optimized for reasoning tasks. Phi-3 achieves surprising quality at small sizes, ideal for edge deployment.
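For edge deployment the dominant cost is usually weight memory, roughly parameter count times bytes per parameter. A back-of-the-envelope sketch (the ~3.8B figure for Phi-3-mini is approximate; activations and KV cache are ignored):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte / 1e9."""
    return num_params * bits_per_param / 8 / 1e9

params = 3.8e9  # Phi-3-mini, approximately
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
# 16-bit: 7.6 GB, 8-bit: 3.8 GB, 4-bit: 1.9 GB
```

This is why 4-bit quantization matters for small models: it brings a multi-billion-parameter model under 2 GB of weights.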
Text Classification
BERT
Google's bidirectional encoder model. The foundation for most text classification, question answering, and NER tasks.
```python
from transformers import pipeline

# Sentiment analysis with a fine-tuned BERT
classifier = pipeline("sentiment-analysis")
result = classifier("This product exceeded my expectations!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.9998}]
```
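BERT's bidirectionality comes from its masked-language-model pretraining: random tokens are hidden and predicted from context on both sides. A toy sketch of the masking step (the 15% rate is BERT's published value; whitespace tokenization here is a simplification):

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK], returning the masked
    sequence and the (position, original token) prediction targets."""
    rng = random.Random(seed)
    masked, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = "[MASK]"
            targets.append((i, tok))
    return masked, targets

tokens = "the model predicts missing words from both directions".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

The model is trained to recover the original tokens at the masked positions, which forces every layer to use left and right context simultaneously.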
RoBERTa
An optimized version of BERT: the same architecture pretrained longer on more data, with dynamic masking and without the next-sentence-prediction objective. Generally outperforms BERT on benchmarks.
DistilBERT
A distilled version of BERT that is 60% faster and 40% smaller while retaining 97% of BERT's performance. Ideal for production deployments.
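DistilBERT is trained with knowledge distillation: the small student learns to match the teacher's softened output distribution, not just the hard labels. A NumPy sketch of the core soft-target loss (the temperature and logit values here are illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Higher temperature flattens both distributions, exposing the teacher's
    relative preferences among wrong classes ("dark knowledge").
    """
    p = softmax(teacher_logits, temperature)  # soft targets
    q = softmax(student_logits, temperature)
    return -np.sum(p * np.log(q))

teacher = np.array([4.0, 1.0, 0.5])
student = np.array([3.0, 1.5, 0.2])
print(f"loss: {distillation_loss(teacher, student):.4f}")
```

The loss is minimized when the student's distribution matches the teacher's, which is why a much smaller network can recover most of the teacher's behavior.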
Named Entity Recognition (NER)
```python
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
result = ner("Elon Musk founded SpaceX in Hawthorne, California.")
for entity in result:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")
# Elon Musk: PER (0.99)
# SpaceX: ORG (0.98)
# Hawthorne: LOC (0.99)
# California: LOC (0.99)
```
Translation
MarianMT
Fast, lightweight translation models. Over 1,000 language pair models available on Hugging Face.
```python
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Machine learning is transforming the world.")
print(result[0]["translation_text"])
# L'apprentissage automatique transforme le monde.
```
M2M-100 & NLLB
Meta's many-to-many translation models. NLLB (No Language Left Behind) supports 200+ languages.
Summarization
```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """Long article text goes here..."""
summary = summarizer(text, max_length=130, min_length=30)
print(summary[0]["summary_text"])
```
Other summarization models: T5 (Google, text-to-text), Pegasus (Google, abstractive summarization).
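T5 handles summarization (and every other task) as text-to-text: the task is selected by a plain-text prefix on the input. A tiny sketch of that input framing, using prefixes from the original T5 setup:

```python
def t5_input(task_prefix: str, text: str) -> str:
    """Format an input the text-to-text way: the task prefix is just text."""
    return f"{task_prefix} {text}"

print(t5_input("summarize:", "Long article text goes here..."))
print(t5_input("translate English to German:", "Machine learning is great."))
```

Because the task lives in the input string, one model and one decoding loop cover summarization, translation, and classification alike.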
Embeddings
```python
from sentence_transformers import SentenceTransformer, util

# Load a sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["Machine learning is great", "I love AI", "The weather is nice"]
embeddings = model.encode(sentences)

# Compute similarity
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")  # Similarity: 0.6521
```
Popular embedding models: E5 (Microsoft), BGE (BAAI), GTE (Alibaba), Cohere Embed.
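Embeddings power semantic search: encode the documents once, encode the query, and rank by cosine similarity. A self-contained NumPy sketch using tiny made-up vectors in place of real model outputs (a real system would call `model.encode` for both):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for real model output.
docs = {
    "reset your password": np.array([0.9, 0.1, 0.0, 0.1]),
    "update billing info": np.array([0.1, 0.9, 0.1, 0.0]),
    "change account email": np.array([0.7, 0.2, 0.1, 0.2]),
}
query = np.array([0.8, 0.1, 0.0, 0.2])  # stands in for e.g. "forgot my login"

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranked[0])  # reset your password
```

This retrieve-by-similarity step is exactly what RAG pipelines do before handing the top documents to a generative model.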
Language Models Comparison
| Model | Task | Size | Best For |
|---|---|---|---|
| GPT-2 | Generation | 124M-1.5B | Learning, experimentation |
| Llama 3.1 | Generation | 8B-405B | Production text generation |
| BERT | Classification, NER | 110M-340M | Understanding tasks |
| DistilBERT | Classification | 66M | Fast production inference |
| T5 | Any text-to-text | 60M-11B | Versatile NLP |
| all-MiniLM-L6 | Embeddings | 22M | Semantic search, RAG |
Next Up
Explore pretrained audio models for speech recognition, text-to-speech, and music generation.
Next: Audio Models →
Lilly Tech Systems