Overfitting and Underfitting Detection
Identifying and diagnosing model fitting problems. Part of the AI Model Testing Fundamentals course at AI School by Lilly Tech Systems.
Understanding the Bias-Variance Tradeoff
Every machine learning model faces a fundamental tension between bias and variance. Bias measures how far off the model's average prediction is from the true value. Variance measures how much the model's predictions change when trained on different subsets of data. Overfitting occurs when variance is high (the model memorizes training data). Underfitting occurs when bias is high (the model is too simple to capture patterns).
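To make the tradeoff concrete, here is a minimal sketch (the synthetic sine data and the specific polynomial degrees are our illustrative choices) that fits a degree-1 and a degree-15 polynomial to the same noisy curve:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic example data: one period of a sine curve plus noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    results[degree] = (
        mean_squared_error(y_train, model.predict(X_train)),  # train MSE
        mean_squared_error(y_test, model.predict(X_test)),    # test MSE
    )
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  "
          f"test MSE={results[degree][1]:.3f}")
```

The degree-1 model shows high bias (large error on both splits); the degree-15 model shows high variance (near-zero training error, noticeably worse test error).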
Detecting and addressing these issues is one of the most important aspects of AI testing. A model that overfits will show excellent training metrics but poor production performance. A model that underfits will show poor metrics everywhere. Both problems need different solutions.
Detecting Overfitting
Overfitting is one of the most common failure modes in machine learning. The model learns the training data's noise and peculiarities instead of the underlying patterns. Here are the key signs:
- Large train-test gap — Training accuracy is significantly higher than test accuracy (e.g., 99% train vs 75% test)
- Performance degrades on new data — The model does well on validation but poorly on truly unseen data
- High variance across CV folds — Performance varies wildly between different cross-validation folds
- Complex decision boundaries — The model creates unnecessarily complex rules to fit training noise
```python
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import numpy as np

def plot_learning_curve(estimator, X, y, title="Learning Curve"):
    train_sizes, train_scores, val_scores = learning_curve(
        estimator, X, y, cv=5, n_jobs=-1,
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring='accuracy'
    )
    train_mean = train_scores.mean(axis=1)
    train_std = train_scores.std(axis=1)
    val_mean = val_scores.mean(axis=1)
    val_std = val_scores.std(axis=1)

    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, label='Training score')
    plt.plot(train_sizes, val_mean, label='Validation score')
    plt.fill_between(train_sizes, train_mean - train_std,
                     train_mean + train_std, alpha=0.1)
    plt.fill_between(train_sizes, val_mean - val_std,
                     val_mean + val_std, alpha=0.1)
    plt.xlabel('Training Set Size')
    plt.ylabel('Accuracy')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

    # Overfitting indicator: gap between final train and validation scores
    gap = train_mean[-1] - val_mean[-1]
    print(f"Train-Val gap: {gap:.4f}")
    if gap > 0.1:
        print("WARNING: Significant overfitting detected")
    return gap
```
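The "high variance across CV folds" sign from the list above can be checked directly with cross-validation. In this sketch (synthetic data, and the 0.05 spread threshold is an arbitrary choice of ours), an unconstrained decision tree is free to memorize each training fold:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic example data; a fully grown tree can memorize each fold
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)

print(f"fold scores: {np.round(scores, 3)}")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
if scores.std() > 0.05:  # threshold is illustrative; tune it per project
    print("High variance across folds -- possible overfitting")
```

A large fold-to-fold standard deviation is not proof of overfitting on its own, but combined with a train-test gap it is strong evidence.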
Detecting Underfitting
Underfitting occurs when the model is too simple to capture the underlying patterns in the data. Signs include:
- Low training accuracy — The model cannot even fit the training data well
- Small train-test gap — Both training and test performance are poor
- High bias — The model consistently misses the same types of patterns
- Residual patterns — For regression, residuals show clear systematic patterns instead of random scatter
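The residual-pattern sign is easy to demonstrate. In this sketch (synthetic quadratic data of our own), a straight line is deliberately underfit, and the residuals still carry the quadratic structure the model missed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic data fit with a straight line: a deliberate underfit
rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# For a well-specified model, residuals look like random noise; here they
# still correlate strongly with x^2, exposing the missed pattern
corr = np.corrcoef(X.ravel() ** 2, residuals)[0, 1]
print(f"residual correlation with x^2: {corr:.3f}")
```

A correlation near zero between residuals and candidate features suggests the model has captured the available structure; a strong correlation, as here, points to underfitting.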
Automated Detection in Your Test Suite
You can build automated overfitting and underfitting detection directly into your CI/CD pipeline:
```python
def test_no_overfitting(model, X_train, y_train, X_test, y_test, max_gap=0.10):
    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)
    gap = train_score - test_score
    assert gap <= max_gap, (
        f"Overfitting detected: train={train_score:.4f}, "
        f"test={test_score:.4f}, gap={gap:.4f} > {max_gap}"
    )

def test_no_underfitting(model, X_test, y_test, min_score=0.70):
    test_score = model.score(X_test, y_test)
    assert test_score >= min_score, (
        f"Underfitting detected: test score={test_score:.4f} < {min_score}"
    )
```
Remedies for Overfitting
When your tests detect overfitting, apply these strategies in order of simplicity:
- Get more training data — The most effective solution when feasible
- Add regularization — L1, L2, dropout, early stopping
- Reduce model complexity — Fewer layers, smaller trees, lower polynomial degree
- Feature selection — Remove noisy or irrelevant features
- Data augmentation — Create synthetic training examples
- Ensemble methods — Combine multiple models to reduce variance
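As a sketch of the regularization remedy (synthetic data, and alpha=1.0 is an arbitrary choice of ours), compare the train-test gap of an unregularized polynomial model against its L2-regularized (Ridge) counterpart:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic example data, kept small so the model can overfit
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (50, 1))
y = np.sin(3 * X).ravel() + rng.normal(0, 0.3, 50)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gaps = {}
for name, reg in [("unregularized", LinearRegression()),
                  ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(12), StandardScaler(), reg)
    model.fit(X_train, y_train)
    # R^2 gap between train and test: a proxy for variance
    gaps[name] = model.score(X_train, y_train) - model.score(X_test, y_test)
    print(f"{name}: train-test R^2 gap = {gaps[name]:.3f}")
```

The L2 penalty shrinks the coefficients of the high-degree terms, trading a little training fit for a smaller generalization gap.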
Remedies for Underfitting
When tests reveal underfitting, try these approaches:
- Increase model complexity — More layers, more trees, higher polynomial degree
- Add more features — Engineer new features that capture important patterns
- Reduce regularization — If regularization is too strong, the model cannot learn
- Train longer — More epochs or iterations may help
- Try different architectures — The current model type may not be suitable for your data
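To illustrate the last two remedies (the dataset and models here are our illustrative choices), a linear classifier underfits the non-linear two-moons data, while an RBF-kernel SVM captures its structure:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Two interleaving half-circles: no straight line separates them well
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

linear = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
rbf = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"linear model CV accuracy: {linear:.3f}")
print(f"RBF-SVM CV accuracy:      {rbf:.3f}")
```

When a simple model plateaus at mediocre accuracy and more training does not help, the limitation is usually the hypothesis class itself, and a more expressive architecture is the fix.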