Threat Landscape for AI

Lesson 2 of 7 in the AI Security Fundamentals course.

Understanding the AI Threat Landscape

The threat landscape for AI systems is broad and rapidly evolving. Unlike traditional software where vulnerabilities typically exist in code, AI systems are vulnerable at every layer — from the data they learn from to the models they produce to the infrastructure they run on. Understanding this landscape is the first step toward effective defense.

Threats to AI systems can be categorized by their target, the attacker's knowledge, the attack timing, and the attacker's goals. This multi-dimensional view helps security teams prioritize defenses and allocate resources effectively.

Threat Actors and Motivations

Different adversaries target AI systems for different reasons:

Nation-state actors: Seek to compromise AI systems for espionage, sabotage, or strategic advantage. They have significant resources and patience
Cybercriminals: Target AI systems for financial gain through model theft, data exfiltration, or ransomware attacks on ML infrastructure
Competitors: May attempt to steal proprietary models or training data to gain competitive advantage
Hacktivists: Aim to expose biases, demonstrate vulnerabilities, or disrupt AI systems they view as harmful
Insiders: Employees or contractors with legitimate access who misuse it, either intentionally or through negligence
Researchers: Security researchers who discover and may publicly disclose vulnerabilities, sometimes before patches are available

Categories of AI Threats

1. Data-Level Threats

Data is the foundation of every ML system, making it a prime target:

Data poisoning: Injecting malicious samples into training data to cause the model to learn incorrect patterns or embed backdoors
Data exfiltration: Stealing training data, which may contain sensitive personal information, trade secrets, or proprietary datasets
Label manipulation: Subtly changing labels in training data to shift model behavior in attacker-controlled directions
Data pipeline compromise: Attacking the ETL processes, data lakes, or annotation workflows that feed ML training

2. Model-Level Threats

The model itself presents multiple attack surfaces:

Adversarial examples: Crafted inputs that cause misclassification while appearing normal to humans
Model extraction: Querying a model systematically to reconstruct a functionally equivalent copy
Model inversion: Using model outputs to reconstruct sensitive training data
Backdoor attacks: Embedding hidden triggers in a model that activate specific malicious behavior

💡

Key insight: Model-level threats are particularly dangerous because they can be difficult to detect. A backdoored model may perform perfectly on standard test sets while containing hidden malicious functionality.

3. Infrastructure-Level Threats

The compute infrastructure that supports AI systems is also vulnerable:

GPU cluster compromise: Attackers gaining access to training infrastructure can modify models, steal data, or mine cryptocurrency
Container escape: Breaking out of ML serving containers to access the underlying host or other containers
Supply chain attacks: Compromising ML frameworks, libraries, or pre-trained models that organizations depend on
API abuse: Exploiting model serving APIs for data extraction, denial of service, or unauthorized inference

Attack Taxonomy by Knowledge Level

Understanding how much an attacker knows about the target system helps assess risk:

Python

# Attack classification by attacker knowledge level
ATTACK_TAXONOMY = {
    "white_box": {
        "description": "Attacker has full access to model architecture, weights, and training data",
        "attacks": ["FGSM", "PGD", "C&W Attack", "DeepFool"],
        "difficulty": "Easiest to execute, hardest to defend",
        "real_world": "Insider threat, leaked models, open-source models"
    },
    "black_box": {
        "description": "Attacker can only query the model and observe outputs",
        "attacks": ["Transfer attacks", "Query-based attacks", "Model extraction"],
        "difficulty": "Harder to execute, but still very effective",
        "real_world": "Most common in production API attacks"
    },
    "gray_box": {
        "description": "Attacker has partial knowledge (e.g., model architecture but not weights)",
        "attacks": ["Targeted transfer attacks", "Architecture-aware querying"],
        "difficulty": "Moderate difficulty, increasingly common",
        "real_world": "Attacker knows the model type from documentation or inference"
    }
}

for level, info in ATTACK_TAXONOMY.items():
    print(f"\n{level.upper()}: {info['description']}")
    print(f"  Attacks: {', '.join(info['attacks'])}")
    print(f"  Real-world scenario: {info['real_world']}")

Emerging Threats in 2025-2026

The AI threat landscape continues to evolve with new attack techniques:

Prompt injection at scale: Automated tools that discover and exploit prompt injection vulnerabilities across thousands of LLM-powered applications
Model supply chain attacks: Trojaned models uploaded to public repositories like Hugging Face that activate malicious behavior under specific conditions
AI-powered attacks on AI: Using one AI system to automatically discover and exploit vulnerabilities in another
Deepfake-enabled social engineering: Using AI-generated audio and video to bypass identity verification systems
Federated learning poisoning: Compromising individual nodes in federated learning systems to poison the global model

⚠

Warning: The barrier to entry for AI attacks is dropping rapidly. Open-source tools like ART (Adversarial Robustness Toolbox) and TextAttack make it easy for even novice attackers to generate adversarial examples and test attack techniques.

Building Threat Intelligence for AI

Organizations should establish AI-specific threat intelligence practices:

Monitor academic research on adversarial ML for new attack techniques (arXiv, NeurIPS, ICML, USENIX Security)
Track CVEs and security advisories for ML frameworks (TensorFlow, PyTorch, scikit-learn)
Participate in AI security communities and information sharing groups
Conduct regular threat modeling exercises specifically for AI systems
Maintain an internal knowledge base of AI-specific threats relevant to your deployment

Next Steps

Now that you understand the breadth of the AI threat landscape, the next lesson covers the core security principles that guide how we defend ML systems against these threats. You will learn how to apply established security concepts to the unique challenges of AI.

← PreviousIntroduction to AI Security Next →Security Principles for ML