NLP with FastAI
Build text classifiers using FastAI's ULMFiT approach: fine-tune a language model on your domain text, then reuse it for classification, often with far fewer labeled examples than training from scratch would need.
The ULMFiT Approach
ULMFiT (Universal Language Model Fine-tuning) was introduced by Jeremy Howard and Sebastian Ruder in 2018. It helped establish fine-tuning a pre-trained language model as a practical form of transfer learning for NLP, the same idea that made image classification accessible. The approach has three stages:
Pre-trained Language Model
Start with a language model pre-trained on a large corpus (like Wikipedia). This model understands general English grammar, vocabulary, and semantics.
Fine-tune the Language Model
Fine-tune the language model on your specific domain text (e.g., movie reviews, legal documents). This teaches it your domain's vocabulary and style.
Train the Classifier
Add a classification head and fine-tune the entire model for your specific task (e.g., sentiment analysis, topic classification).
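To make the first two stages concrete, here is a toy sketch in plain Python. It is not ULMFiT and does not use FastAI: a bigram counter stands in for the language model, and the corpora and function names are made up for illustration. It only shows the idea that continuing training on domain text shifts what the model predicts next.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus, counts=None):
    """Count word -> next-word transitions; pass `counts` to continue training."""
    counts = counts if counts is not None else defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical corpora: "general" text first, then domain text (movie reviews).
general_corpus = ["the film was long", "the film was long and slow", "the weather was cold"]
domain_corpus = ["the film was brilliant", "the film was brilliant throughout", "the acting was brilliant"]

lm = train_bigrams(general_corpus)            # stage 1: "pre-training"
before = predict_next(lm, "was")              # 'long'

lm = train_bigrams(domain_corpus, counts=lm)  # stage 2: "fine-tuning" on domain text
after = predict_next(lm, "was")               # 'brilliant'

print(before, after)
```

A real language model learns dense representations rather than counts, which is what makes its encoder reusable for the classifier in stage three.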
Text Classification in FastAI
from fastai.text.all import *

# Load IMDB dataset
path = untar_data(URLs.IMDB)

# Step 1: Create language model DataLoaders
dls_lm = TextDataLoaders.from_folder(
    path, is_lm=True, valid='test'
)

# Step 2: Fine-tune language model on domain text
learn_lm = language_model_learner(
    dls_lm, AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
)
learn_lm.fine_tune(4, 1e-2)

# Save the fine-tuned encoder
learn_lm.save_encoder('finetuned_encoder')
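The Perplexity metric used for the language model is the exponential of the cross-entropy loss. The sketch below computes it by hand in plain Python so the number is easier to interpret; the probability lists are invented, and the function is an illustration rather than FastAI's implementation.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood of the correct tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities the model assigned to the correct next token.
uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])  # about 4.0: like choosing among 4 words
confident = perplexity([0.9, 0.8, 0.95, 0.85])        # close to 1: the model is rarely surprised

print(round(uniform_guess, 3), round(confident, 3))
```

Lower is better: a perplexity of 4 means the model is, on average, as uncertain as if it were picking uniformly among four candidate tokens.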
# Step 3: Create classifier DataLoaders (reuse the language model's vocabulary)
dls_clas = TextDataLoaders.from_folder(
    path, valid='test',
    text_vocab=dls_lm.vocab
)

# Step 4: Build classifier using the fine-tuned encoder
learn_clas = text_classifier_learner(
    dls_clas, AWD_LSTM,
    drop_mult=0.5,
    metrics=accuracy
)
learn_clas.load_encoder('finetuned_encoder')

# Step 5: Train the classifier
learn_clas.fine_tune(4, 1e-2)

# Make predictions: returns (label, class index, class probabilities)
learn_clas.predict("This movie was absolutely fantastic!")
# Example output (exact probabilities will vary):
# ('pos', tensor(1), tensor([0.0083, 0.9917]))
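Passing text_vocab=dls_lm.vocab matters because the encoder's embedding rows are indexed by token id, so the classifier has to turn text into the same ids the language model was trained on. The plain-Python sketch below illustrates the failure mode with two made-up vocabularies; numericalize is a hypothetical helper, not a FastAI function, although xxunk is the unknown-token marker FastAI uses.

```python
def numericalize(tokens, vocab):
    """Map tokens to integer ids by their position in `vocab` (unknowns -> 'xxunk')."""
    index = {tok: i for i, tok in enumerate(vocab)}
    unk = index["xxunk"]
    return [index.get(tok, unk) for tok in tokens]

# Hypothetical vocabularies: same words, different ordering.
lm_vocab = ["xxunk", "the", "movie", "was", "fantastic"]
other_vocab = ["xxunk", "fantastic", "was", "the", "movie"]

tokens = ["the", "movie", "was", "fantastic"]
ids_shared = numericalize(tokens, lm_vocab)    # [1, 2, 3, 4]
ids_other = numericalize(tokens, other_vocab)  # [3, 4, 2, 1]

# What the encoder would "see" if ids from the wrong vocabulary were fed to it.
decoded = [lm_vocab[i] for i in ids_other]     # ['was', 'fantastic', 'movie', 'the']

print(ids_shared, ids_other, decoded)
```

With a shared vocabulary the ids line up with the embeddings the fine-tuned encoder learned; with a different one, every token would be looked up in the wrong row.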
Quick Text Classification (Skip LM Fine-tuning)
If you want faster results and have enough labeled data, you can skip the language model fine-tuning step:
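from fastai.text.all import *

# Example data: the IMDB sample ships a texts.csv with 'label' and 'text' columns.
# Substitute your own DataFrame with one text column and one label column.
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')

# Direct classification (uses pre-trained LM encoder)
dls = TextDataLoaders.from_df(
    df, text_col='text', label_col='label', valid_pct=0.2
)
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(4)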
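The valid_pct=0.2 argument holds out a random 20% of the rows for validation. As a rough sketch of what that split amounts to, here is a plain-Python version; random_split is a hypothetical helper written for illustration, and the seed is an assumption added so the split is reproducible.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=None):
    """Shuffle row indices and hold out the first `valid_pct` of them for validation."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = int(valid_pct * n_items)
    return indices[cut:], indices[:cut]  # (train indices, validation indices)

train_idx, valid_idx = random_split(1000, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))  # 800 200
```

If your rows are ordered (for example by date or by class), a random split like this may not reflect how the model will be used, so it is worth checking that the validation set is representative.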
Text Generation
# Generate text from the fine-tuned language model
TEXT = "The movie started with"
print(learn_lm.predict(TEXT, n_words=40, temperature=0.75))
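The temperature argument controls how adventurous the sampling is. The sketch below shows the usual convention of dividing the logits by the temperature before the softmax, using made-up logits; it is an illustration of the idea in plain Python, not FastAI's internal code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling them by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate next words

default = softmax_with_temperature(logits, 1.0)
sharper = softmax_with_temperature(logits, 0.75)  # favors the top word more strongly
flatter = softmax_with_temperature(logits, 1.5)   # spreads probability more evenly

for name, probs in [("T=1.0", default), ("T=0.75", sharper), ("T=1.5", flatter)]:
    print(name, [round(p, 3) for p in probs])
```

Values below 1, such as the 0.75 used above, make the generated text more predictable; values above 1 make it more varied but also more error-prone.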
For transformer-based models, FastAI can be combined with Hugging Face Transformers through the blurr library.
Next Up: Best Practices
Learn the learning rate finder, mixed precision training, callbacks, and deployment strategies.
Lilly Tech Systems