# Robustness Metrics
Learn the quantitative measures used to evaluate model robustness, from accuracy under perturbation to certified robustness guarantees and standardized benchmark suites.
## Why Metrics Matter
Without standardized metrics, robustness claims remain subjective. Metrics allow you to compare models objectively, set measurable requirements, and track improvement over time. The key is choosing the right metric for your specific threat model and deployment context.
## Core Robustness Metrics
| Metric | What It Measures | When to Use |
|---|---|---|
| Accuracy Under Perturbation | Model accuracy when inputs are perturbed within an epsilon ball | General adversarial robustness evaluation |
| Certified Robustness Radius | Maximum perturbation size guaranteed to not change the prediction | Safety-critical applications requiring formal guarantees |
| Mean Corruption Error (mCE) | Average error rate across a set of common corruptions | Evaluating robustness to natural image corruptions |
| Attack Success Rate | Percentage of inputs for which an attack successfully changes the output | Evaluating vulnerability to specific attack methods |
| Robustness Gap | Difference between clean accuracy and adversarial accuracy | Understanding the cost of robustness on clean performance |
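Several of these metrics fall out of the same three arrays: true labels, clean predictions, and predictions on attacked inputs. The helper below is a minimal sketch; `robustness_summary` and its argument names are illustrative, not part of any library. Note that attack success rate is computed only over inputs the model originally classified correctly, which is the common convention.

```python
import numpy as np

def robustness_summary(y_true, y_pred_clean, y_pred_adv):
    """Compute clean accuracy, adversarial accuracy, attack success
    rate, and robustness gap from per-example label arrays."""
    clean_correct = y_pred_clean == y_true
    adv_correct = y_pred_adv == y_true
    clean_acc = clean_correct.mean()
    adv_acc = adv_correct.mean()
    # Attack success rate: fraction of originally-correct inputs
    # that the attack flips to an incorrect prediction.
    flipped = clean_correct & ~adv_correct
    asr = flipped.sum() / max(clean_correct.sum(), 1)
    return {
        "clean_accuracy": clean_acc,
        "adversarial_accuracy": adv_acc,
        "attack_success_rate": asr,
        "robustness_gap": clean_acc - adv_acc,
    }
```

This keeps the metrics consistent with each other: the robustness gap is by construction the difference between the first two entries.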
## Accuracy Under Perturbation
The most intuitive robustness metric measures how model accuracy degrades as inputs are perturbed. For a given perturbation budget epsilon, adversarial accuracy is the fraction of test examples that remain correctly classified after worst-case perturbation.
```python
import numpy as np

from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import PyTorchClassifier

def measure_adversarial_accuracy(model, loss, input_shape, nb_classes,
                                 x_test, y_test, epsilons):
    """Measure accuracy across different perturbation budgets."""
    classifier = PyTorchClassifier(
        model=model,
        loss=loss,                # e.g. torch.nn.CrossEntropyLoss()
        input_shape=input_shape,  # e.g. (3, 32, 32)
        nb_classes=nb_classes,
        clip_values=(0.0, 1.0),
    )
    results = {}
    for eps in epsilons:
        attack = FastGradientMethod(estimator=classifier, eps=eps)
        x_adv = attack.generate(x=x_test)
        predictions = classifier.predict(x_adv)
        accuracy = np.mean(
            np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)
        )
        results[eps] = accuracy
        print(f"Epsilon: {eps:.3f} | Accuracy: {accuracy:.2%}")
    return results

# Example output:
# Epsilon: 0.000 | Accuracy: 95.20%
# Epsilon: 0.010 | Accuracy: 82.40%
# Epsilon: 0.030 | Accuracy: 61.10%
# Epsilon: 0.100 | Accuracy: 23.50%
```
## Certified Robustness
Unlike empirical robustness (testing against specific attacks), certified robustness provides mathematical guarantees. Randomized smoothing is the most practical certified defense, creating a smoothed classifier that is provably robust within a certified radius.
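The core of randomized smoothing can be sketched in a few lines: sample Gaussian perturbations of the input, take the majority class of the base classifier's votes, and derive a certified L2 radius from a lower confidence bound on the top-class probability. This is a simplified sketch of the Cohen et al. (2019) procedure, not a production implementation: the function name and the crude Hoeffding-style confidence bound are our own (the original uses an exact Clopper-Pearson interval), and `base_classify` stands in for any classifier returning an integer label.

```python
import numpy as np
from statistics import NormalDist

def certify_smoothed(base_classify, x, sigma, n=1000, alpha=0.001, rng=None):
    """Certify a smoothed classifier at input x via Monte-Carlo sampling.

    Returns (predicted_class, certified_L2_radius); radius 0.0 means
    the certificate was abstained from."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(scale=sigma, size=(n,) + np.shape(x))
    votes = np.bincount([base_classify(x + d) for d in noise])
    top = int(np.argmax(votes))
    # Hoeffding-style lower bound on the top-class probability p_A
    # (a real implementation uses a Clopper-Pearson interval).
    p_lower = votes[top] / n - np.sqrt(np.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return top, 0.0  # abstain: no certificate
    # Certified radius R = sigma * Phi^{-1}(p_A_lower)
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return top, radius
```

For example, a 1-D threshold classifier evaluated far from its decision boundary receives a large certified radius, while points near the boundary get a small radius or an abstention.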
## Benchmark Suites
Standardized benchmarks enable fair comparison across models and defenses:
### RobustBench
The most comprehensive leaderboard for adversarial robustness. Evaluates models on CIFAR-10, CIFAR-100, and ImageNet under AutoAttack with standardized threat models (Linf, L2).
### ImageNet-C
15 types of corruption (noise, blur, weather, digital) at 5 severity levels. Measures mean Corruption Error (mCE) normalized against AlexNet performance.
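The mCE computation itself is simple: for each corruption type, sum the model's error rates over the five severity levels, normalize by the baseline's (AlexNet's) summed error on the same corruption, then average the resulting ratios over all corruption types. The function below is an illustrative sketch with hypothetical input dicts, not code from the benchmark.

```python
def mean_corruption_error(model_err, baseline_err):
    """Compute mCE from per-corruption error lists.

    model_err / baseline_err: {corruption_name: [err_sev1, ..., err_sev5]}
    where each err is an error rate in [0, 1]."""
    corruption_errors = []
    for corruption, errs in model_err.items():
        # Normalize by the baseline (e.g. AlexNet) so hard corruptions
        # don't dominate the average.
        ce = sum(errs) / sum(baseline_err[corruption])
        corruption_errors.append(ce)
    return sum(corruption_errors) / len(corruption_errors)
```

A value below 1.0 means the model is more corruption-robust than the baseline; above 1.0, less.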
### GLUE-X / AdvGLUE
Robustness benchmarks for NLP models. Tests against textual adversarial attacks, paraphrases, and out-of-distribution examples across multiple tasks.
## Measuring Robustness in NLP
Text-based models require different robustness metrics than vision models. Key measures include:
- Word Error Rate Under Attack: How many word substitutions are needed before the model's output changes.
- Semantic Similarity Preservation: Whether adversarial text maintains its meaning while changing model output.
- Invariance Score: Consistency of predictions across paraphrases of the same input.
- Out-of-Vocabulary Robustness: Performance when encountering unseen words, typos, or slang.
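Of the measures above, the invariance score is the most mechanical to compute: group paraphrases of the same input together and count how often the model's prediction is identical across the whole group. The sketch below assumes a `predict` callable mapping text to a label; both names are illustrative.

```python
def invariance_score(predict, paraphrase_sets):
    """Fraction of paraphrase sets on which the model predicts the
    same label for every paraphrase of the same underlying input.

    paraphrase_sets: iterable of lists of paraphrased strings."""
    consistent = 0
    for variants in paraphrase_sets:
        # A set of predictions with one element means full agreement.
        predictions = {predict(text) for text in variants}
        consistent += len(predictions) == 1
    return consistent / len(paraphrase_sets)
```

A score of 1.0 means the model is fully invariant to the tested paraphrases; lower scores indicate sensitivity to surface form.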