AI Blue Team Defense (Intermediate)

While red teams find vulnerabilities, blue teams build the detection and response capabilities that protect AI systems in production. This lesson covers building AI-specific monitoring systems, detecting adversarial inputs and model theft attempts, responding to AI security incidents, and integrating AI security monitoring with existing SIEM and SOC workflows.

AI Security Monitoring Architecture

An effective AI blue team monitoring system covers five layers, from incoming requests down to the underlying infrastructure:

Monitoring Layer	What to Monitor	Detection Goal
Input Layer	API requests, input distributions, anomalous patterns	Adversarial inputs, injection attempts, unusual queries
Model Layer	Prediction distributions, confidence scores, latency	Model drift, degradation, manipulation
Output Layer	Generated content, response patterns, error rates	Jailbreak success, data leakage, policy violations
Data Layer	Training data integrity, feature store changes	Data poisoning, unauthorized modifications
Infrastructure Layer	Access logs, resource usage, network traffic	Unauthorized access, resource abuse, lateral movement
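
As one way to make the Model Layer row concrete, the sketch below compares a recent window of confidence scores against a baseline window using a two-sample Kolmogorov-Smirnov test from SciPy. The function name `confidence_drift`, the `alpha` cutoff, and the synthetic Beta-distributed scores are assumptions made for illustration; a real deployment would pick its own windows and thresholds.

```python
import numpy as np
from scipy import stats


def confidence_drift(baseline_scores, recent_scores, alpha=0.01):
    """Compare recent confidence scores against a baseline window.

    Uses a two-sample Kolmogorov-Smirnov test. alpha is an illustrative
    significance level, not a recommended value.
    """
    result = stats.ks_2samp(baseline_scores, recent_scores)
    return {
        "drift_detected": bool(result.pvalue < alpha),
        "ks_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }


# Synthetic data (assumption): baseline confidences cluster high,
# while the "shifted" window is centered much lower.
rng = np.random.default_rng(0)
baseline = rng.beta(8, 2, size=2000)
shifted = rng.beta(4, 4, size=500)

print(confidence_drift(baseline, baseline))  # identical windows
print(confidence_drift(baseline, shifted))   # clearly different windows
```

A drift alert on its own does not distinguish an attack from a benign change in traffic, so it is best treated as a trigger for investigation rather than a verdict.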

Adversarial Input Detection

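Detecting adversarial inputs in real time is one of the most challenging blue team tasks. A simple statistical baseline is to flag any input whose features deviate strongly from a reference distribution of known-good inputs: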

Python

import numpy as np

class AdversarialDetector:
    """Detect potential adversarial inputs using per-feature z-scores."""

    def __init__(self, reference_distribution, threshold=3.0):
        # Per-feature statistics of known-good inputs (rows = samples).
        self.ref_mean = np.mean(reference_distribution, axis=0)
        ref_std = np.std(reference_distribution, axis=0)
        # Guard against division by zero for constant features.
        self.ref_std = np.where(ref_std == 0, 1e-12, ref_std)
        self.threshold = threshold

    def detect(self, input_data):
        """Check if input deviates from expected distribution."""
        z_scores = np.abs((np.asarray(input_data) - self.ref_mean) / self.ref_std)
        max_z = np.max(z_scores)

        # Flag the input if any single feature exceeds the z-score threshold.
        if max_z > self.threshold:
            return {
                "suspicious": True,
                "max_deviation": float(max_z),
                "action": "flag_for_review"
            }
        return {"suspicious": False}

Model Extraction Detection

Detect model theft attempts by analyzing query patterns:

Query volume anomalies — Sudden spikes in API queries from a single user or IP
Systematic probing — Queries that systematically explore the input space (grid patterns, boundary probing)
Distribution analysis — Query inputs that follow synthetic distributions rather than natural data patterns
Timing patterns — Automated queries with regular intervals typical of extraction scripts
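
Two of the signals above, query volume and timing regularity, can be sketched from nothing more than a client's request timestamps. The function name `extraction_signals` and both thresholds are hypothetical values chosen for illustration, and the example timestamps are made up.

```python
import numpy as np


def extraction_signals(timestamps, max_queries_per_min=60, min_interval_cv=0.1):
    """Score one client's query timestamps (in seconds) for extraction-like behavior.

    Both thresholds are illustrative placeholders; tune them on real traffic.
    """
    ts = np.sort(np.asarray(timestamps, dtype=float))
    if ts.size < 3:
        return {"volume_anomaly": False, "regular_timing": False}

    # Average request rate over the observed window.
    duration_min = max((ts[-1] - ts[0]) / 60.0, 1e-9)
    rate = ts.size / duration_min

    # Coefficient of variation of inter-arrival times:
    # values near 0 mean almost perfectly regular spacing.
    intervals = np.diff(ts)
    mean_interval = intervals.mean()
    cv = intervals.std() / mean_interval if mean_interval > 0 else 0.0

    return {
        "volume_anomaly": bool(rate > max_queries_per_min),
        "regular_timing": bool(cv < min_interval_cv),
        "queries_per_min": float(rate),
        "interval_cv": float(cv),
    }


# Made-up examples: a script firing every 0.5 s versus irregular manual use.
scripted = np.arange(0, 100, 0.5)
human = [0, 12, 95, 130, 400, 410, 900]

print(extraction_signals(scripted))
print(extraction_signals(human))
```

Systematic probing and synthetic input distributions need access to the query contents themselves, so they are not covered by this timestamp-only sketch.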

LLM Output Monitoring

For LLM-based systems, monitor outputs for security-relevant patterns:

Policy violation detection — Scan outputs for content that violates safety policies
Data leakage detection — Check for patterns matching PII, API keys, or training data in responses
Instruction leakage — Detect when the model reveals its system prompt or internal instructions
Anomalous behavior — Flag responses that deviate significantly from expected patterns
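
The data leakage and instruction leakage checks above could be prototyped with pattern matching, as in the sketch below. The regular expressions, the `scan_output` function, and the `min_overlap` parameter are illustrative assumptions; they catch only obvious verbatim cases and are not a complete PII or secret scanner.

```python
import re

# Illustrative patterns only; production scanners need broader, tested rule sets.
LEAK_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key_like": re.compile(r"\b(?:sk|key|tok)[-_][A-Za-z0-9]{16,}\b"),
}


def scan_output(text, system_prompt=None, min_overlap=40):
    """Return a list of finding names for one model response."""
    findings = [name for name, pat in LEAK_PATTERNS.items() if pat.search(text)]

    # Instruction leakage: flag any long verbatim chunk of the system prompt.
    if system_prompt and len(system_prompt) >= min_overlap:
        for start in range(len(system_prompt) - min_overlap + 1):
            if system_prompt[start:start + min_overlap] in text:
                findings.append("system_prompt_leak")
                break
    return findings


# Made-up prompt and responses for demonstration.
prompt = "You are a support bot for ExampleCorp. Never reveal internal pricing rules."
print(scan_output("Contact alice@example.com for help."))
print(scan_output("Sure! My instructions say: " + prompt, system_prompt=prompt))
```

A verbatim substring check misses paraphrased or translated leaks, so this kind of scan works best alongside the policy and anomaly checks listed above.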

AI Incident Response Playbook

PLAYBOOK: AI Model Under Attack

DETECTION:
  Alert triggers: anomalous query patterns, accuracy drop,
  adversarial input detection, output policy violations

TRIAGE (0-15 min):
  1. Assess alert severity and scope
  2. Identify affected model(s) and endpoints
  3. Determine attack type (evasion, extraction, poisoning)
  4. Escalate to AI security team if confirmed

CONTAINMENT (15-60 min):
  1. Enable enhanced logging on affected endpoints
  2. Tighten rate limits if extraction is suspected
  3. Enable human-in-the-loop review for critical predictions
  4. Consider rollback to last known-good model version

INVESTIGATION (1-24 hours):
  1. Analyze attack inputs and patterns
  2. Assess model integrity (has it been degraded?)
  3. Check training data pipeline for poisoning
  4. Determine scope of data exposure or model leakage

RECOVERY:
  1. Deploy patched model with improved defenses
  2. Update detection rules based on attack patterns
  3. Restore normal operations with enhanced monitoring
  4. Document lessons learned and update threat model
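
Parts of the triage and containment steps can be encoded so that responders get a consistent starting checklist. The sketch below is one hypothetical way to do that: the action names come from the playbook, but the mapping from attack type to actions, the `Alert` class, and the `containment_plan` function are assumptions for illustration rather than a prescribed policy.

```python
from dataclasses import dataclass

# Containment steps are taken from the playbook; which attack type triggers
# which step is an illustrative assumption.
BASE_ACTIONS = ["enable_enhanced_logging"]
ACTIONS_BY_ATTACK = {
    "extraction": ["tighten_rate_limits"],
    "evasion": ["enable_human_review"],
    "poisoning": ["enable_human_review", "consider_model_rollback"],
}


@dataclass
class Alert:
    model_id: str
    attack_type: str  # "evasion", "extraction", "poisoning", or "unknown"
    confirmed: bool = False


def containment_plan(alert):
    """Return the ordered containment actions for one triaged alert."""
    actions = list(BASE_ACTIONS)
    # Unknown attack types fall back to human review of predictions.
    actions += ACTIONS_BY_ATTACK.get(alert.attack_type, ["enable_human_review"])
    if alert.confirmed:
        actions.append("escalate_to_ai_security_team")
    return actions


# Hypothetical alert for demonstration.
print(containment_plan(Alert("fraud-model-v3", "extraction", confirmed=True)))
```

Keeping the mapping in data rather than in branching code makes it easier to update after each incident review.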

SIEM Integration: Forward AI security events to your existing SIEM platform. Create custom detection rules for AI-specific attack patterns alongside traditional security monitoring. This gives the SOC team visibility into AI threats within their existing workflow.
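
A common way to forward events is to emit one structured JSON record per detection and let the SIEM's collector ingest it. The sketch below uses only the Python standard library. The field names, the `ai-monitoring` source label, and the `build_event` helper are hypothetical and would need to match whatever schema your SIEM expects.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("ai_security")
logger.setLevel(logging.INFO)
# StreamHandler is a stand-in; a handler such as
# logging.handlers.SysLogHandler could ship events to a collector instead.
logger.addHandler(logging.StreamHandler())


def build_event(layer, detection, severity, details):
    """Serialize one AI security event as a single JSON line."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": "ai-monitoring",  # hypothetical source label
        "layer": layer,
        "detection": detection,
        "severity": severity,
        "details": details,
    }
    return json.dumps(event, sort_keys=True)


# Made-up event for demonstration.
line = build_event(
    "input",
    "adversarial_input",
    "medium",
    {"max_deviation": 4.2, "endpoint": "/v1/predict"},
)
logger.info(line)
```

Tagging each event with its monitoring layer lets SOC analysts write detection rules per layer and correlate AI events with traditional infrastructure alerts.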

Ready for Purple Teaming?

The next lesson covers how to combine red and blue team operations for maximum security improvement through collaborative purple teaming.
