Poisoning Attacks
A practical guide to poisoning attacks for AI risk management practitioners.
What This Lesson Covers
Poisoning Attacks is a key topic within Adversarial & Security Risk. A poisoning attack corrupts a model's training data (or the training process itself) so that the resulting model misbehaves, either broadly or only when a hidden trigger appears. In this lesson you will learn the underlying risk management discipline, the controlling frameworks and standards, how to apply the methods to real AI systems, and the open questions practitioners are still working through. By the end you should be able to handle poisoning attacks in real AI risk work with confidence.
This lesson belongs to the Technical AI Risks category of the AI Risk Management track. AI risk management sits at the intersection of safety engineering, model risk management, information security, privacy, and corporate governance. Understanding the underlying discipline is what lets you build AI risk programs that survive board scrutiny, regulator inquiry, and real-world incidents.
Why It Matters
Poisoning is one part of managing adversarial AI security risk, alongside evasion, model theft, prompt injection, jailbreaks, supply chain attacks, and the defenses against each.
Poisoning attacks deserve dedicated attention because AI risk is one of the fastest-evolving practice areas in technology governance. New frameworks (the NIST AI RMF Generative AI Profile, ISO/IEC 42001, the EU AI Act's risk-management requirements) keep arriving, and incidents (hallucination harms, enforcement actions over bias, agentic mishaps) are starting to shape case law. Risk officers, AI engineers, and product leaders who can reason from first principles will handle the next framework or incident far better than those who know only the current rules.
How It Works in Practice
Below is a set of adversarial AI risk register entries, written as Python data, that places poisoning alongside the related attack classes. Read through it once, then think about how you would apply it to a real AI system in your portfolio.
# Adversarial AI risk register entries
ADVERSARIAL_RISKS = [
("Evasion (test-time)", "Crafted inputs that flip predictions (FGSM, PGD, C&W). Mitigation: adversarial training, input sanitization, ensembling."),
("Poisoning (train-time)", "Corrupted training data that degrades the model or implants a backdoor. Mitigation: data provenance, anomaly detection in training data, RONI (Reject On Negative Impact)."),
("Model extraction", "Black-box queries reconstruct the model. Mitigation: rate limit, output perturbation, watermarking."),
("Membership inference", "Determine if a record was in the training set. Mitigation: differential privacy, regularization."),
("Model inversion", "Reconstruct training inputs from outputs. Mitigation: output rounding, DP, gradient masking."),
("Prompt injection", "Override system instructions via input. Mitigation: input filtering, structured outputs, dual-LLM, sandboxing tools."),
("Jailbreak", "Bypass safety guardrails. Mitigation: red teaming, RLHF refusal training, output filtering, layered checks."),
("Supply chain compromise", "Backdoors in pretrained models / poisoned datasets. Mitigation: hash-pinned models, AIBOM, vendor diligence."),
("Agentic blast radius", "Autonomous action causes harm. Mitigation: scoped tools, action allowlists, HITL for high-impact actions, kill switch."),
]
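The poisoning entry names RONI (Reject On Negative Impact) as a mitigation. The sketch below shows the idea under simplifying assumptions: a toy nearest-centroid classifier, a small trusted training set, and a trusted validation set. Each candidate record is added to the trusted set on its own, and it is rejected if validation accuracy drops by more than a tolerance. All function names and data points here are hypothetical and chosen only for illustration.

```python
from statistics import mean


def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, samples):
    hits = sum(predict(centroids, x) == y for x, y in samples)
    return hits / len(samples)


def roni_filter(trusted, validation, candidates, tolerance=0.0):
    """Reject any candidate whose inclusion lowers validation accuracy
    by more than `tolerance` relative to the trusted-only baseline."""
    baseline = accuracy(train_centroids(trusted), validation)
    accepted, rejected = [], []
    for candidate in candidates:
        score = accuracy(train_centroids(trusted + [candidate]), validation)
        if baseline - score > tolerance:
            rejected.append(candidate)
        else:
            accepted.append(candidate)
    return accepted, rejected


# Hypothetical data: class 0 clusters near the origin, class 1 near (4, 4).
trusted = [
    ((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0),
    ((4.0, 4.0), 1), ((5.0, 4.0), 1), ((4.0, 5.0), 1),
]
validation = [
    ((0.5, 0.5), 0), ((1.5, 1.0), 0), ((2.0, 2.0), 0),
    ((3.0, 3.0), 1), ((4.5, 4.5), 1), ((3.5, 3.0), 1),
]
candidates = [
    ((0.5, 1.0), 0),    # plausible class-0 record
    ((30.0, 30.0), 0),  # mislabeled outlier that drags the class-0 centroid
    ((4.5, 4.0), 1),    # plausible class-1 record
]

accepted, rejected = roni_filter(trusted, validation, candidates)
print("accepted:", accepted)
print("rejected:", rejected)
```

Two caveats apply. Retraining once per candidate is expensive for real models, so practical variants score batches rather than single records. RONI also measures damage only on clean validation data, so a backdoor that leaves clean accuracy intact can pass the filter; it works best alongside provenance checks and anomaly detection, not as a replacement for them.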
Step-by-Step Analytical Approach
- Identify the risk: Use threat modeling, scenario planning, MITRE ATLAS techniques, and horizon scanning. State each risk specifically (source, event, consequence), not vaguely (“AI could be biased”). For poisoning, an entry might read: a compromised third-party dataset (source) implants a backdoor during fine-tuning (event), causing targeted misclassification in production (consequence).
- Assess the risk: Estimate inherent likelihood and impact. Choose a method (qualitative rubric, FAIR/Monte Carlo, Bayesian network) that matches the weight of the decision the assessment supports.
- Decide the treatment: Mitigate (most common), transfer (insurance, vendor liability), accept (with documented residual risk and approval), or avoid (don’t deploy). Document who decided and why.
- Implement controls: Preventive (data provenance checks, HITL, guardrails, refusal training), detective (training-data anomaly detection, drift monitoring, fairness monitoring, red-team probes), and corrective (rollback, kill switch, retraining, customer notification).
- Monitor with KRIs: Define leading and lagging indicators with thresholds. Wire them to dashboards and alerting, and tie the thresholds to risk appetite.
- Report and improve: Report to the risk committee monthly, to the board quarterly, and to regulators on their required cadence. Learn from incidents and external case law, then refresh the register and controls.
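The assessment step mentions FAIR/Monte Carlo methods. As a rough illustration, the sketch below simulates annual loss from poisoning incidents before and after controls. It assumes at most one incident per month with a fixed probability and a triangular loss per incident. Every number is a placeholder, not a calibrated estimate, and the function names are hypothetical.

```python
import random
from statistics import mean, quantiles


def simulate_annual_loss(monthly_prob, loss_low, loss_mode, loss_high,
                         trials=5_000, seed=7):
    """Return simulated annual loss totals, one per trial.

    Assumption: each month has at most one incident, occurring with
    probability `monthly_prob`; each incident's loss is triangular.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    totals = []
    for _ in range(trials):
        total = 0.0
        for _month in range(12):
            if rng.random() < monthly_prob:
                total += rng.triangular(loss_low, loss_high, loss_mode)
        totals.append(total)
    return totals


def summarize(totals):
    """Mean and 95th-percentile annual loss."""
    p95 = quantiles(totals, n=20)[-1]  # last of 19 cut points = 95th percentile
    return {"mean": mean(totals), "p95": p95}


# Placeholder inputs: inherent risk vs. residual risk after controls
# (for example, provenance checks assumed to cut incident frequency).
inherent_totals = simulate_annual_loss(0.05, 50_000, 200_000, 1_000_000)
residual_totals = simulate_annual_loss(0.01, 50_000, 200_000, 1_000_000)

inherent = summarize(inherent_totals)
residual = summarize(residual_totals)
print(f"inherent mean={inherent['mean']:,.0f} p95={inherent['p95']:,.0f}")
print(f"residual mean={residual['mean']:,.0f} p95={residual['p95']:,.0f}")
```

Reporting a tail percentile next to the mean helps when the decision depends on worst-case exposure. A qualitative rubric may still be the better fit when the inputs cannot be estimated with any confidence.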
When This Topic Applies (and When It Does Not)
Managing poisoning-attack risk applies when:
- You operate AI systems whose failures could harm users, customers, employees, or the business
- You are subject to a sectoral regulator with model risk or AI guidance (financial services, healthcare, employment, public sector)
- You are subject to the EU AI Act, US state AI laws, or sector-specific AI rules
- You need to demonstrate AI risk management to the board, customers, auditors, or in litigation
It does not apply (or applies lightly) when:
- The AI system is purely internal experimentation with no production exposure
- The AI system is genuinely low-stakes (e.g., autocomplete in an internal tool with no downstream consequence)
- The AI system is not yet deployed (though poisoning happens at training time, so data provenance checks and risk planning at the design stage are still valuable)
Practitioner Checklist
- Have you classified this AI system into the right risk tier under your operative framework?
- Is the risk register entry specific enough to be actionable (source / event / consequence)?
- Are inherent and residual scores documented and defensible?
- Are controls operational, not aspirational? Have they been tested?
- Are KRIs wired to alerting, with thresholds tied to risk appetite?
- Is there a kill switch / rollback path that has actually been exercised?
- Is there a board-ready narrative that explains the risk, the controls, and the residual?
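The KRI item in the checklist can be made concrete with a small threshold check. The sketch below is one possible shape, assuming higher values are worse and that each indicator has an amber and a red threshold. The indicator names, thresholds, and readings are hypothetical examples for a poisoning risk, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KRI:
    """A key risk indicator where higher readings mean more risk."""
    name: str
    amber: float  # early-warning threshold
    red: float    # threshold tied to risk appetite

    def __post_init__(self):
        if self.amber > self.red:
            raise ValueError(f"{self.name}: amber must not exceed red")

    def status(self, value: float) -> str:
        if value >= self.red:
            return "red"
        if value >= self.amber:
            return "amber"
        return "green"


def evaluate(kris, readings):
    """Map each KRI name to its status for the current readings."""
    return {kri.name: kri.status(readings[kri.name]) for kri in kris}


# Hypothetical poisoning-related indicators.
kris = [
    KRI("flagged_training_records_pct", amber=0.5, red=2.0),  # leading
    KRI("unverified_data_sources", amber=1, red=3),           # leading
    KRI("backdoor_probe_success_pct", amber=1.0, red=5.0),    # lagging
]
readings = {
    "flagged_training_records_pct": 0.8,
    "unverified_data_sources": 0,
    "backdoor_probe_success_pct": 6.5,
}

statuses = evaluate(kris, readings)
breaches = [name for name, status in statuses.items() if status == "red"]
print(statuses)
print("escalate:", breaches)
```

In a real program the readings would come from monitoring pipelines, and a red status would feed the alerting and escalation path, not a print statement.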
Disclaimer
This educational content is provided for general informational purposes only. It does not constitute legal, regulatory, audit, or risk-management advice; it does not create a professional advisory relationship; and it should not be relied on for any specific AI deployment, audit, or compliance matter. AI risk standards and regulations vary by jurisdiction and change rapidly. Consult qualified counsel and risk professionals for advice on your specific situation.
Next Steps
The other lessons in Adversarial & Security Risk build directly on this one. Once you are comfortable with poisoning attacks, the natural next step is to combine them with the patterns in the surrounding lessons, because that is where an understanding of the concepts turns into a working risk program. AI risk management is most useful as an integrated system covering identification, assessment, treatment, control, monitoring, and reporting.
Lilly Tech Systems