Attack Surface Analysis

Lesson 4 of 7 in the AI Security Fundamentals course.

Mapping the AI Attack Surface

An attack surface analysis identifies all the points where an attacker could interact with or influence an AI system. For machine learning systems, the attack surface is significantly larger than that of traditional applications because it spans data, code, models, infrastructure, and human processes.

A thorough attack surface analysis is the foundation for prioritizing security investments and designing effective defenses. Without it, organizations tend to over-invest in visible threats (like API security) while neglecting less obvious but equally dangerous vectors (like training data integrity).

The ML Lifecycle Attack Surface

Every phase of the ML lifecycle presents distinct attack opportunities:

1. Data Collection and Preparation

  • Web scraping pipelines: Attackers can manipulate web content that gets scraped into training datasets
  • Third-party data feeds: Compromised data providers can inject poisoned samples
  • Annotation platforms: Malicious annotators can systematically mislabel data
  • Data storage: Unauthorized access to data lakes, S3 buckets, or databases containing training data
  • Data preprocessing: Compromised ETL code can modify data before it reaches training
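One practical defense at the data-collection boundary is to validate incoming samples against expected ranges before they reach training. The sketch below is illustrative, not a complete defense: the `FeatureRule` class, the rule values, and the sample records are all hypothetical, and range checks catch only crude tampering, not subtle poisoning.

```python
from dataclasses import dataclass

# Hypothetical validation rule for one feature of an incoming dataset.
@dataclass
class FeatureRule:
    name: str
    min_value: float
    max_value: float

def validate_samples(samples, rules):
    """Split samples into (accepted, rejected) based on range rules.

    Out-of-range or missing values are a coarse signal of feed tampering
    or poisoning, not proof of it.
    """
    accepted, rejected = [], []
    for sample in samples:
        ok = all(
            rule.min_value <= sample.get(rule.name, float("nan")) <= rule.max_value
            for rule in rules
        )
        (accepted if ok else rejected).append(sample)
    return accepted, rejected

rules = [FeatureRule("age", 0, 120), FeatureRule("income", 0, 10_000_000)]
batch = [
    {"age": 34, "income": 52_000},
    {"age": -5, "income": 52_000},   # suspicious: negative age
]
accepted, rejected = validate_samples(batch, rules)
print(f"accepted={len(accepted)} rejected={len(rejected)}")
```

Note that a missing feature also fails the check, because comparisons against `float("nan")` are false; silently dropped fields are themselves a tampering signal worth logging.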

2. Model Training

  • Training code: Malicious modifications to training scripts, hyperparameters, or loss functions
  • Dependencies: Compromised ML libraries (PyTorch, TensorFlow, scikit-learn) or their dependencies
  • Compute environment: Shared GPU clusters where other tenants could access training processes
  • Pre-trained models: Backdoored foundation models or transfer learning base models
  • Experiment tracking: Manipulated metrics in MLflow or Weights & Biases to promote a compromised model
💡
Practical tip: Create a data flow diagram for your entire ML pipeline. Trace every piece of data from its source to the final model prediction. Every point where data crosses a trust boundary is a potential attack surface.
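The data flow diagram in the tip above can be kept in code as well as on a whiteboard. A minimal sketch, assuming a hypothetical pipeline with nodes tagged by trust zone: any edge whose endpoints sit in different zones crosses a trust boundary and needs validation.

```python
# Hypothetical pipeline: nodes tagged with a trust zone, edges are data movements.
ZONES = {
    "web_scraper": "external",
    "vendor_feed": "external",
    "data_lake": "internal",
    "training_job": "internal",
    "model_registry": "internal",
}

EDGES = [
    ("web_scraper", "data_lake"),
    ("vendor_feed", "data_lake"),
    ("data_lake", "training_job"),
    ("training_job", "model_registry"),
]

def trust_boundary_crossings(edges, zones):
    """Return the edges where data moves between different trust zones."""
    return [(src, dst) for src, dst in edges if zones[src] != zones[dst]]

for src, dst in trust_boundary_crossings(EDGES, ZONES):
    print(f"VALIDATE: {src} -> {dst}")
```

Keeping the diagram as data means new pipeline components automatically surface new boundary crossings in review.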

3. Model Evaluation and Selection

  • Test data contamination: Training data leaking into test sets, inflating metrics and masking real weaknesses
  • Metric manipulation: Cherry-picking evaluation metrics that hide vulnerabilities
  • Model registry: Unauthorized model promotion from staging to production
  • A/B testing: Manipulating experiment results to deploy a compromised model variant
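Exact-duplicate contamination between training and test sets can be detected cheaply by fingerprinting each serialized sample. A minimal sketch (the sample strings are hypothetical); this catches verbatim overlap only, and near-duplicates need fuzzier techniques such as MinHash or embedding similarity.

```python
import hashlib

def fingerprint(record: str) -> str:
    """Stable fingerprint of a serialized sample (exact-match only)."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

def contamination(train, test):
    """Return the test samples that also appear verbatim in the training data."""
    train_hashes = {fingerprint(r) for r in train}
    return [r for r in test if fingerprint(r) in train_hashes]

train = ["the cat sat", "dogs bark loudly", "rain falls"]
test = ["snow melts", "dogs bark loudly"]
print(contamination(train=train, test=test))
```

Running this kind of check as a CI gate before any evaluation run makes contamination a build failure rather than a silent metric inflation.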

4. Model Deployment and Serving

  • Model artifacts: Tampered model files loaded into serving infrastructure
  • API endpoints: Unauthenticated or poorly rate-limited prediction APIs
  • Feature stores: Manipulated real-time features that influence model predictions
  • Edge deployment: Models on devices that can be physically accessed and reverse-engineered
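Rate limiting the prediction API is one of the cheapest serving-side mitigations: it raises the query cost of model extraction and brute-force adversarial search, though it does not stop either outright. A minimal sketch of a per-client token bucket (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Per-client token bucket: refill `rate` tokens/sec, burst up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; refill based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5.0, capacity=10)
allowed = sum(bucket.allow() for _ in range(100))
print(f"{allowed} of 100 burst requests allowed")
```

In production you would key one bucket per API credential and back the state with a shared store; extraction attacks distributed across many keys also need aggregate, per-model quotas.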

Automated Attack Surface Enumeration

Use systematic approaches to enumerate your attack surface:

Python
class MLAttackSurfaceAnalyzer:
    """Enumerate and score attack surfaces for ML systems."""

    PHASES = {
        "data_collection": {
            "vectors": [
                ("Web scraping sources", "Data poisoning via manipulated web content", "HIGH"),
                ("Third-party data APIs", "Compromised data feed injection", "HIGH"),
                ("User-submitted data", "Adversarial training samples", "MEDIUM"),
                ("Internal databases", "Insider data manipulation", "MEDIUM"),
            ]
        },
        "training": {
            "vectors": [
                ("ML framework dependencies", "Supply chain attack via compromised package", "CRITICAL"),
                ("Pre-trained model downloads", "Backdoored foundation model", "HIGH"),
                ("Shared GPU cluster", "Cross-tenant data leakage", "HIGH"),
                ("Training scripts in git", "Malicious code injection", "MEDIUM"),
            ]
        },
        "serving": {
            "vectors": [
                ("Prediction API", "Adversarial examples / model extraction", "HIGH"),
                ("Model file storage", "Model tampering or theft", "HIGH"),
                ("Feature pipeline", "Real-time feature manipulation", "MEDIUM"),
                ("Monitoring endpoints", "Information disclosure", "LOW"),
            ]
        }
    }

    def analyze(self):
        """Generate a complete attack surface report."""
        report = []
        for phase, data in self.PHASES.items():
            for vector, threat, severity in data["vectors"]:
                report.append({
                    "phase": phase,
                    "vector": vector,
                    "threat": threat,
                    "severity": severity
                })
        return sorted(report, key=lambda x: {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}[x["severity"]])

analyzer = MLAttackSurfaceAnalyzer()
for item in analyzer.analyze():
    print(f"[{item['severity']:8s}] {item['phase']:20s} | {item['vector']}")
    print(f"           Threat: {item['threat']}")

Trust Boundaries in ML Systems

Trust boundaries are the lines between components with different levels of trust. Data crossing these boundaries must be validated:

  1. External to internal: User inputs, third-party data, downloaded models
  2. Training to serving: Model artifacts, configuration files, feature schemas
  3. Data team to ML team: Datasets, labels, data quality reports
  4. Development to production: Code, models, infrastructure configurations
  5. Internal services: Feature store to model server, model server to monitoring
Warning: One of the most commonly overlooked trust boundaries is between pre-trained models and your system. Downloading a model from Hugging Face or TensorFlow Hub is equivalent to running someone else's code. Always verify model integrity and test for backdoors.
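The integrity half of that warning is straightforward to automate: pin the SHA-256 digest of every model artifact you depend on and refuse to load anything that does not match. A minimal sketch (the file path and pinned digest in the usage comment are hypothetical); note that a matching digest proves the file was not swapped in transit or at rest, not that the original model is backdoor-free.

```python
import hashlib
from pathlib import Path

def verify_model(path: Path, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a model artifact's SHA-256 digest against a pinned value.

    Reads in chunks so multi-gigabyte model files don't need to fit in memory.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Usage (hypothetical path and pinned digest):
# if not verify_model(Path("models/base_model.safetensors"), PINNED_DIGEST):
#     raise RuntimeError("model artifact failed integrity check")
```

Behavioral backdoor testing is a separate, harder problem; the digest check only closes the tampering window between download and load.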

Reducing the Attack Surface

After mapping the attack surface, prioritize reducing it:

  • Minimize data exposure: Only collect and retain the data you actually need for training
  • Pin dependencies: Lock all ML library versions and verify checksums
  • Isolate environments: Use separate networks for training, evaluation, and serving
  • Limit API capabilities: Expose only prediction endpoints, not model internals
  • Encrypt everywhere: Data at rest, in transit, and during computation where possible
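The dependency-pinning item above can be enforced mechanically. A minimal sketch that flags requirement lines not pinned to an exact version (the sample requirements text is hypothetical); only `==` pins, ideally combined with hash entries in a lock file, give reproducible and verifiable installs.

```python
import re

def unpinned(requirements_text: str) -> list:
    """Flag requirement lines that are not pinned to an exact version.

    Version ranges and bare package names invite supply-chain drift:
    a newly published release is installed without review.
    """
    bad = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if not re.match(r"^[A-Za-z0-9_.\-\[\]]+==", line):
            bad.append(line)
    return bad

reqs = """\
torch==2.3.1
numpy>=1.24
scikit-learn
"""
print(unpinned(reqs))
```

Wiring a check like this into CI turns an unpinned dependency into a failed build instead of a silent upgrade path.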

Summary

A comprehensive attack surface analysis reveals the full scope of security challenges in ML systems. By systematically mapping attack vectors across every lifecycle phase and identifying trust boundaries, you can prioritize defenses where they matter most. The next lesson covers defense in depth strategies for protecting each layer of this attack surface.