Intermediate
Tabular Data with FastAI
Apply deep learning to structured (tabular) data using FastAI's `TabularDataLoaders`. Learn how to handle categorical and continuous variables, apply preprocessing, and train neural networks that can be competitive with gradient-boosted trees on many tabular problems.
TabularDataLoaders
FastAI makes it easy to work with CSV and DataFrame data for classification and regression tasks:
```python
from fastai.tabular.all import *

# Load the Adult Income dataset
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

# Define column types
cat_names = ['workclass', 'education', 'marital-status', 'occupation',
             'relationship', 'race', 'sex', 'native-country']
cont_names = ['age', 'fnlwgt', 'education-num',
              'capital-gain', 'capital-loss', 'hours-per-week']

# Create DataLoaders with preprocessing
dls = TabularDataLoaders.from_df(
    df, path=path,
    procs=[Categorify, FillMissing, Normalize],
    cat_names=cat_names,
    cont_names=cont_names,
    y_names='salary',
    y_block=CategoryBlock,
    valid_idx=list(range(800, 1000)),  # rows 800-999 form the validation set
    bs=64
)
```
Categorical vs Continuous Variables
| Type | Examples | How FastAI Handles It |
|---|---|---|
| Categorical | Color, country, product type | Learned embeddings (like word embeddings but for categories) |
| Continuous | Age, price, temperature | Normalized to mean 0, std 1 using training-set statistics |
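The continuous-column rule in the table is a z-score. The plain-Python sketch below uses made-up ages and assumes, as fastai's `Normalize` does to our understanding, that the mean and population standard deviation are computed on the training rows only and then reused unchanged for validation rows.

```python
import statistics

def fit_normalizer(train_values):
    """Compute mean and population std from the training rows only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) + 1e-7  # small epsilon avoids division by zero
    return mean, std

def normalize(values, mean, std):
    """Apply the z-score (v - mean) / std with previously fitted statistics."""
    return [(v - mean) / std for v in values]

train_ages = [25.0, 35.0, 45.0, 55.0]  # hypothetical training column
valid_ages = [30.0, 60.0]              # hypothetical validation column

mean, std = fit_normalizer(train_ages)
train_norm = normalize(train_ages, mean, std)
valid_norm = normalize(valid_ages, mean, std)  # reuses the training statistics

print(train_norm)
print(valid_norm)
```

Only the training column ends up with mean 0 and std 1; validation values are shifted and scaled by the same amounts, so the model sees both sets on one consistent scale.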
Entity Embeddings: FastAI uses learned embeddings for categorical variables. This technique, described in the paper "Entity Embeddings of Categorical Variables," lets the model discover meaningful representations; for example, it can learn that Monday and Tuesday behave similarly while Saturday behaves differently.
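An embedding is a trainable matrix with one row per category, and looking up a category's integer code simply selects its row. That gives the same vector as multiplying a one-hot vector by the matrix, without ever building the one-hot vector. The sketch below uses a tiny hand-written table for a hypothetical "day" column; its values are invented to mirror the Monday/Tuesday/Saturday example, whereas in fastai the table is a PyTorch embedding layer whose rows are learned during training. Cosine similarity is used here as one common way to compare embedding vectors.

```python
# Hypothetical 4 x 3 embedding table for a "day" column.
# Row index = category code; values are made up, not learned.
embedding = [
    [0.0, 0.0, 0.0],    # code 0: '#na#'
    [0.9, 0.1, 0.3],    # code 1: 'Mon'
    [-0.7, 0.9, -0.4],  # code 2: 'Sat'
    [0.8, 0.2, 0.3],    # code 3: 'Tue'
]

def lookup(code):
    """Embedding lookup: select the row for this category code."""
    return embedding[code]

def one_hot_times_matrix(code):
    """The same vector computed the long way: one-hot row vector times the table."""
    n_cat, dim = len(embedding), len(embedding[0])
    one_hot = [1.0 if i == code else 0.0 for i in range(n_cat)]
    return [sum(one_hot[i] * embedding[i][j] for i in range(n_cat)) for j in range(dim)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

print(lookup(3) == one_hot_times_matrix(3))                          # True
print(cosine(lookup(1), lookup(3)) > cosine(lookup(1), lookup(2)))  # True: Mon is closer to Tue than to Sat
```

This equivalence is why one-hot encoding is unnecessary: the lookup does the same job and scales to columns with thousands of categories.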
Preprocessing Transforms
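```python
from fastai.tabular.all import *

# Built-in preprocessing transforms
procs = [
    Categorify,   # Convert categorical columns to integer codes (code 0 is reserved for '#na#')
    FillMissing,  # Fill missing continuous values with the training median (adds an indicator column)
    Normalize,    # Normalize continuous columns using the training-set mean and std
]

# FillMissing creates a boolean column (e.g., age_na)
# that tells the model when data was missing
```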
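To make the first two transforms concrete, here is a simplified plain-Python sketch of the idea behind `Categorify` and `FillMissing`; it is not fastai's implementation. It assumes a vocabulary built from the training values with `#na#` at code 0 followed by the sorted categories, that missing or unseen categories map to 0, and that missing continuous values are filled with the training median. The column values are hypothetical.

```python
import statistics

NA = "#na#"  # placeholder category; this sketch reserves code 0 for it

def fit_categorify(train_values):
    """Build a vocabulary from training values: NA first, then sorted unique categories."""
    uniques = sorted({v for v in train_values if v is not None})
    return {cat: code for code, cat in enumerate([NA] + uniques)}

def categorify(values, mapping):
    """Map each category to its integer code; missing or unseen values become 0 (NA)."""
    return [mapping.get(v, 0) for v in values]

def fill_missing(values, train_values):
    """Replace None with the training median and return a was-missing indicator column."""
    median = statistics.median([v for v in train_values if v is not None])
    filled = [median if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing

# Hypothetical training columns with missing entries (None)
workclass = ["Private", None, "State-gov", "Private"]
age = [39.0, None, 50.0, 28.0]

mapping = fit_categorify(workclass)
codes = categorify(workclass + ["Never-seen"], mapping)
age_filled, age_na = fill_missing(age, age)

print(mapping)     # {'#na#': 0, 'Private': 1, 'State-gov': 2}
print(codes)       # [1, 0, 2, 1, 0]
print(age_filled)  # [39.0, 39.0, 50.0, 28.0]
print(age_na)      # [False, True, False, False]
```

The `age_na` list plays the role of the `age_na` column: after filling, the model can no longer tell a real 39 from an imputed one, so the indicator preserves that information.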
Training a Tabular Model
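```python
# Create learner
learn = tabular_learner(
    dls,
    layers=[200, 100],  # Hidden layer sizes (also fastai's default)
    metrics=accuracy
)

# Plot loss against learning rate to help choose a good learning rate
learn.lr_find()

# Train for 5 epochs with a peak learning rate of 1e-2
learn.fit_one_cycle(5, 1e-2)

# Make a prediction for a single row
# clas is the predicted class index; probs holds one probability per class
row, clas, probs = learn.predict(df.iloc[0])
print(f"Prediction: {clas}, Probabilities: {probs}")
```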
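`fit_one_cycle(5, 1e-2)` does not train at a constant 1e-2: the second argument is the peak of a warm-up-then-anneal schedule. The sketch below reproduces the shape of that schedule in plain Python, assuming fastai's defaults as we understand them (`div=25`, `div_final=1e5`, `pct_start=0.25`, cosine segments). It ignores the accompanying momentum schedule and is an illustration, not fastai's code.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=1e-2, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1] (assumed defaults)."""
    if pos < pct_start:
        # Warm-up: rise from lr_max / div to lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing: fall from lr_max to lr_max / div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for pos in (0.0, 0.25, 0.5, 1.0):
    print(f"{pos:.2f}: {one_cycle_lr(pos):.2e}")
```

Starting low lets training stabilize, the peak explores quickly, and the long cosine decay lets the weights settle; this is why the value read off `lr_find` is used as `lr_max` rather than as a fixed rate.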
Feature Engineering Tips
- Date features: Use `add_datepart(df, 'date_column')` to automatically extract year, month, day, day-of-week, and more (by default the original date column is dropped).
- High-cardinality categories: FastAI handles these well with embeddings, so there is no need to one-hot encode.
- Missing values: `FillMissing` creates indicator columns for continuous variables, letting the model learn when data is missing.
- Embedding sizes: FastAI automatically chooses embedding dimensions based on cardinality, or you can override them by passing a dict such as `emb_szs={'native-country': 10}` to `tabular_learner`.
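Two of these tips can be illustrated in plain Python. The default embedding width comes from a rule of thumb that, to our understanding, fastai implements as `emb_sz_rule`: `min(600, round(1.6 * n_cat ** 0.56))`, where `n_cat` counts the categories including `#na#`. The `date_parts` helper is hypothetical: it shows a few of the fields `add_datepart` derives from a single date, while the real function works on a whole DataFrame column, adds more fields, and names the new columns after the source column.

```python
from datetime import date

def emb_sz_rule(n_cat):
    """Embedding width heuristic: grows sub-linearly with cardinality, capped at 600."""
    return min(600, round(1.6 * n_cat ** 0.56))

def date_parts(d):
    """Hypothetical helper: a few of the fields add_datepart derives from one date."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Day": d.day,
        "Dayofweek": d.weekday(),           # Monday = 0 ... Sunday = 6
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_start": d.day == 1,
    }

for n_cat in (3, 10, 100, 10_000):
    print(n_cat, emb_sz_rule(n_cat))

print(date_parts(date(2024, 3, 1)))  # 2024-03-01 is a Friday
```

The sub-linear exponent means a column with 100 categories gets a width of 21 rather than 100, which keeps the number of embedding parameters manageable for high-cardinality columns.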
Lilly Tech Systems