Overfitting and Underfitting Detection
Identifying and diagnosing model fitting problems. Part of the AI Model Testing Fundamentals course at AI School by Lilly Tech Systems.
Understanding the Bias-Variance Tradeoff
Every machine learning model faces a fundamental tension between bias and variance. Bias measures how far off the model's average prediction is from the true value. Variance measures how much the model's predictions change when trained on different subsets of data. Overfitting occurs when variance is high (the model memorizes training data). Underfitting occurs when bias is high (the model is too simple to capture patterns).
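To make the tradeoff concrete, here is a minimal sketch (the synthetic sine data and the specific polynomial degrees are our illustrative choices) that fits a degree-1 and a degree-15 polynomial to the same noisy curve:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic example data: one period of a sine curve plus noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    results[degree] = (
        mean_squared_error(y_train, model.predict(X_train)),  # train MSE
        mean_squared_error(y_test, model.predict(X_test)),    # test MSE
    )
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"test MSE={results[degree][1]:.3f}")
```

The degree-1 model shows high bias (large error on both splits); the degree-15 model shows high variance (near-zero training error, noticeably worse test error).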
Detecting and addressing these issues is one of the most important aspects of AI testing. A model that overfits will show excellent training metrics but poor production performance. A model that underfits will show poor metrics everywhere. Both problems need different solutions.
Detecting Overfitting
Overfitting is one of the most common failure modes in machine learning. The model learns the training data's noise and peculiarities instead of the underlying patterns. Here are the key signs:
- Large train-test gap — Training accuracy is significantly higher than test accuracy (e.g., 99% train vs 75% test)
- Performance degrades on new data — The model does well on validation but poorly on truly unseen data
- High variance across CV folds — Performance varies wildly between different cross-validation folds
- Complex decision boundaries — The model creates unnecessarily complex rules to fit training noise
```python
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import numpy as np

def plot_learning_curve(estimator, X, y, title="Learning Curve"):
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring='accuracy'
    )
    train_mean = train_scores.mean(axis=1)
    train_std = train_scores.std(axis=1)
    val_mean = val_scores.mean(axis=1)
    val_std = val_scores.std(axis=1)

    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, label='Training score')
    plt.plot(train_sizes, val_mean, label='Validation score')
    plt.fill_between(train_sizes, train_mean - train_std,
                     train_mean + train_std, alpha=0.1)
    plt.fill_between(train_sizes, val_mean - val_std,
                     val_mean + val_std, alpha=0.1)
    plt.xlabel('Training Set Size')
    plt.ylabel('Accuracy')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

    # Overfitting indicator: gap between final train and validation scores
    gap = train_mean[-1] - val_mean[-1]
    print(f"Train-Val gap: {gap:.4f}")
    if gap > 0.1:
        print("WARNING: Significant overfitting detected")
    return gap
```
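The "high variance across CV folds" sign from the list above can be checked directly with cross-validation. In this sketch (synthetic data, and the 0.05 spread threshold is an arbitrary choice of ours), an unconstrained decision tree is free to memorize each training fold:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic example data; a fully grown tree can memorize each fold
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(f"fold scores: {np.round(scores, 3)}")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
if scores.std() > 0.05:  # threshold is illustrative; tune it per project
    print("High variance across folds -- possible overfitting")
```

A large fold-to-fold standard deviation is not proof of overfitting on its own, but combined with a train-test gap it is strong evidence.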
Detecting Underfitting
Underfitting occurs when the model is too simple to capture the underlying patterns in the data. Signs include:
- Low training accuracy — The model cannot even fit the training data well
- Small train-test gap — Both training and test performance are poor
- High bias — The model consistently misses the same types of patterns
- Residual patterns — For regression, residuals show clear systematic patterns instead of random scatter
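The residual-pattern sign is easy to demonstrate. In this sketch (synthetic quadratic data of our own), a straight line is deliberately underfit, and the residuals still carry the quadratic structure the model missed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic data fit with a straight line: a deliberate underfit
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# For a well-specified model, residuals look like random noise; here they
# still correlate strongly with x^2, exposing the missed pattern
corr = np.corrcoef(X.ravel() ** 2, residuals)[0, 1]
print(f"residual correlation with x^2: {corr:.3f}")
```

A correlation near zero between residuals and candidate features suggests the model has captured the available structure; a strong correlation, as here, points to underfitting.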
Automated Detection in Your Test Suite
You can build automated overfitting and underfitting detection directly into your CI/CD pipeline:
```python
def test_no_overfitting(model, X_train, y_train, X_test, y_test, max_gap=0.10):
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    gap = train_score - test_score
    assert gap <= max_gap, (
        f"Overfitting detected: train={train_score:.4f}, "
        f"test={test_score:.4f}, gap={gap:.4f} > {max_gap}"
    )

def test_no_underfitting(model, X_test, y_test, min_score=0.70):
    test_score = model.score(X_test, y_test)
    assert test_score >= min_score, (
        f"Underfitting detected: test score={test_score:.4f} < {min_score}"
    )
```
Remedies for Overfitting
When your tests detect overfitting, apply these strategies in order of simplicity:
- Get more training data — The most effective solution when feasible
- Add regularization — L1, L2, dropout, early stopping
- Reduce model complexity — Fewer layers, smaller trees, lower polynomial degree
- Feature selection — Remove noisy or irrelevant features
- Data augmentation — Create synthetic training examples
- Ensemble methods — Combine multiple models to reduce variance
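As a sketch of the regularization remedy (synthetic data, and alpha=1.0 is an arbitrary choice of ours), compare the train-test gap of an unregularized polynomial model against its L2-regularized (Ridge) counterpart:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic example data, kept small so the model can overfit
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (50, 1))
y = np.sin(3 * X).ravel() + rng.normal(0, 0.3, 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gaps = {}
for name, reg in [("unregularized", LinearRegression()),
                  ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(12), StandardScaler(), reg)
    model.fit(X_train, y_train)
    # R^2 gap between train and test: a proxy for variance
    gaps[name] = model.score(X_train, y_train) - model.score(X_test, y_test)
    print(f"{name}: train-test R^2 gap = {gaps[name]:.3f}")
```

The L2 penalty shrinks the coefficients of the high-degree terms, trading a little training fit for a smaller generalization gap.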
Remedies for Underfitting
When tests reveal underfitting, try these approaches:
- Increase model complexity — More layers, more trees, higher polynomial degree
- Add more features — Engineer new features that capture important patterns
- Reduce regularization — If regularization is too strong, the model cannot learn
- Train longer — More epochs or iterations may help
- Try different architectures — The current model type may not be suitable for your data
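To illustrate the last two remedies (the dataset and models here are our illustrative choices), a linear classifier underfits the non-linear two-moons data, while an RBF-kernel SVM captures its structure:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Two interleaving half-circles: no straight line separates them well
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

linear = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
rbf = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"linear model CV accuracy: {linear:.3f}")
print(f"RBF-SVM CV accuracy:      {rbf:.3f}")
```

When a simple model plateaus at mediocre accuracy and more training does not help, the limitation is usually the hypothesis class itself, and a more expressive architecture is the fix.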