AI Red Teaming

Master AI red teaming as a defensive discipline. 60 deep dives across 360 lessons covering foundations (RT vs blue / pen-test / audit, history, ethics, deliverables), program design (charter, recruitment, scoping, legal authorization & safe harbor, bug bounty, external vs internal), AI threat modeling (attack surface, threat actors, attack trees, kill chain, abuse cases, prioritisation), prompt-based attacks (direct & indirect injection, jailbreak taxonomy, encoding / obfuscation, multi-turn, universal suffixes, eval), model & system attacks (extraction, training-data extraction, inversion, membership inference, backdoors, supply chain), agent & tool-use attacks (agent hijacking, tool abuse, multi-agent, computer use, MCP, agent eval), multimodal attacks (vision, audio, document, deepfake, OCR / typography, multimodal eval), capability & safety evaluations (dangerous capability, CBRN uplift, cyber, persuasion, autonomy, deception, frontier suites), reporting & operations (finding lifecycle, severity, repro, responsible disclosure, vendor coordination, public disclosure), and tools, industry & future (Garak / PyRIT / Inspect, AISI & frontier patterns, NIST AI RMF / MITRE ATLAS / OWASP LLM Top 10, future of AI RT).

Start Learning View All Topics

60Topics

360Lessons

10Categories

100%Free

AI red teaming is the defensive discipline of probing AI systems for safety, security, and policy failures — and turning those findings into fixes, evals, and disclosures that make the next release better. It sits at the intersection of classical security red-teaming (rules of engagement, kill chains, attack trees, responsible disclosure), adversarial-ML research (model extraction, training-data extraction, membership inference, adversarial perturbations), prompt-engineering tradecraft (direct and indirect injection, jailbreak taxonomy, multi-turn priming), agent-tool evaluation (Computer Use, MCP, multi-agent settings), and the operational machinery that runs in production (severity rubrics, repro packets, fix tracking, vendor coordination, public disclosure). Over the last three years it has stopped being an academic side topic and become an operating commitment for every serious AI deployment: frontier labs publish red-team findings in system cards, governments stand up AI Safety Institutes that run pre-deployment evaluations, regulators write red-teaming duties into law, and enterprises require evidence in procurement.

This track is written for the practitioners doing this work day to day: AI red teamers, security researchers extending into AI, ML engineers building eval harnesses, T&S detection engineers, RAI leads writing safety evaluations, frontier-lab safety teams, AISI evaluators, and program leaders standing up red-team functions. Every topic explains the underlying discipline (drawing on the canonical literature — adversarial-ML research, MITRE ATLAS, OWASP LLM Top 10, NIST AI RMF, AISI publications, frontier-lab system cards), the practical methodology that operationalises it, the defensive implications, and the failure modes where red-team work quietly fails to change the product. Content is conceptual and methodological — it covers attack categories at the level of taxonomy and defence implications, not as step-by-step exploit recipes. The aim is that a reader can stand up a credible AI red-team function, integrate it with engineering and governance, and defend it to boards, regulators, and customers.

All Topics

60 AI red teaming topics organized into 10 categories. Each has 6 detailed lessons with frameworks, methodologies, and operational patterns.

Red Team Foundations

🤖

AI Red Teaming Overview

Master what AI red teaming actually is. Learn the scope, the lineage from security and intelligence red-teaming, the deliverables, and the operating model used by mature programs.

AI Red Teaming

All Topics

Red Team Foundations

AI Red Teaming Overview

Red Team vs Blue Team vs Purple

Red Team vs Pen Test vs Audit vs Evals

History & Evolution of AI Red Teaming

Red Team Ethics & Authorization

Deliverables & Operating Cadence

Program Design

Red Team Program Design

Recruitment & Skills Mix

Engagement Scoping

Legal Authorization & Safe Harbor

AI Bug Bounty Programs

External vs Internal Red Teams

AI Threat Modeling for Red Teams

AI System Attack Surface Mapping

Threat Actor Modeling

Attack Trees for AI

AI Kill Chain

Abuse Case Design

Threat Prioritization for Red Teams

Prompt-Based Attacks

Direct Prompt Injection

Indirect Prompt Injection

Jailbreak Taxonomy

Encoding & Obfuscation

Multi-Turn Attack Patterns

Universal Adversarial Suffixes

Prompt Attack Evaluation

Model & System Attacks

Model Extraction Attacks

Training Data Extraction

Model Inversion

Membership Inference (Red Team Lens)

Backdoor & Trojan Detection

Model Supply Chain Red Teaming

Agent & Tool-Use Attacks

Agent Hijacking

Tool Abuse Patterns

Multi-Agent Attack Patterns

Computer Use & Browser Agent Attacks

MCP Attack Surface

Agent Red-Team Evaluations

Multimodal Attacks

Vision-Based Jailbreaks

Audio Attack Patterns

Document & PDF Attacks

Deepfake-as-Input Attacks

OCR & Typography Attacks

Multimodal Red-Team Evaluation

Capability & Safety Evaluations

Dangerous Capability Evaluations

CBRN Uplift Evaluation

Cyber Capability Evaluation

Persuasion & Influence Evaluation

Autonomy & Self-Replication Evaluation

Deception & Scheming Evaluation

Frontier Lab Evaluation Suites

Reporting & Operations

Finding Lifecycle (Discover → Fix)

Severity Rubrics for AI Findings

Reproducible Repro Packages

Responsible Disclosure for AI

Vendor Coordination

Public Disclosure & System Cards

Tools, Industry & Future

Red Team Tools & Frameworks

AISI & Frontier-Lab Patterns

Red Team Standards & Frameworks

Future of AI Red Teaming