# Best Practices
Deploying prompt injection defenses in production requires balancing security with user experience, managing false positives, and continuously evolving your defenses against new attack techniques.
## Production Defense Architecture
| Layer | Component | Latency Impact | Effectiveness |
|---|---|---|---|
| Pre-processing | Unicode normalization, control char stripping | <1ms | Blocks encoding attacks |
| Fast Detection | Regex patterns, blocklist matching | <5ms | Catches known patterns |
| ML Classification | BERT-based injection classifier | 20-50ms | Catches semantic attacks |
| Prompt Construction | Sandwich defense, random delimiters, instruction hierarchy | <1ms | Reduces injection success rate |
| Output Scanning | Canary check, PII scan, safety classifier | 10-30ms | Catches successful injections |
| Async Analysis | LLM-as-judge, behavioral anomaly detection | 0 (async) | Deep analysis for trends |
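As a concrete illustration, the first two layers in the table might be sketched as follows. This is a minimal sketch, not a production rule set; the blocklist patterns are hypothetical examples of known injection phrasing.

```python
import re
import unicodedata

# Hypothetical examples of known-injection phrasing for the fast-detection layer.
BLOCKLIST_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now in (developer|unrestricted) mode", re.IGNORECASE),
]

def preprocess(text: str) -> str:
    """Layer 1: Unicode normalization and control-character stripping."""
    # NFKC folds homoglyph and width tricks (e.g. fullwidth letters) back to
    # their plain ASCII forms before pattern matching.
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters, keeping newline and tab.
    return "".join(
        ch for ch in text
        if ch in ("\n", "\t") or unicodedata.category(ch) not in ("Cc", "Cf")
    )

def fast_detect(text: str) -> bool:
    """Layer 2: cheap regex/blocklist match against known patterns."""
    return any(p.search(text) for p in BLOCKLIST_PATTERNS)
```

Running normalization first matters: a payload written in fullwidth characters or padded with zero-width spaces evades a plain regex, but matches once NFKC normalization and control-character stripping have run.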
## Continuous Testing Pipeline
- **Maintain an Attack Database.** Collect and curate a comprehensive database of injection payloads, organized by technique and target. Include both public research payloads and internally discovered attacks, and update it weekly.
- **Automated Red Team Testing.** Run your attack database against production defenses on every deployment and track the defense bypass rate over time. Set a maximum acceptable bypass rate and block deployments that exceed it.
- **Fuzzing and Mutation.** Automatically generate variations of known attacks through mutation (character substitution, encoding, reformulation). This surfaces defense gaps that exact-match testing misses.
- **Manual Red Teaming.** Conduct quarterly manual red teaming exercises in which skilled testers attempt to bypass defenses using creative new approaches. Document successful attacks and add them to the test database.
## Managing False Positives
```python
# Tiered response strategy for injection detection
class TieredResponse:
    def handle_detection(self, input_text, score, context):
        if score > 0.95:
            # High confidence: block and log
            return self.block_request(input_text, reason="injection")
        elif score > 0.7:
            # Medium confidence: allow with restrictions
            return self.restricted_mode(
                input_text,
                disable_tools=True,
                strict_output_filter=True,
            )
        elif score > 0.4:
            # Low confidence: allow with enhanced monitoring
            return self.monitored_mode(
                input_text,
                flag_for_review=True,
            )
        else:
            # Normal operation
            return self.normal_mode(input_text)
```
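One output filter worth pairing with the tiered response above is the canary check listed in the output-scanning layer: embed a random per-request token in the system prompt and flag any response that reveals it. A minimal sketch, assuming a hypothetical `CANARY-` token format:

```python
import secrets

def make_canary() -> str:
    """Generate a per-request canary token to embed in the system prompt.

    The token format is an illustrative assumption; any unguessable,
    easily searchable string works.
    """
    return f"CANARY-{secrets.token_hex(8)}"

def output_leaks_canary(output: str, canary: str) -> bool:
    """Flag a response containing the canary: strong evidence the model
    was induced to repeat its hidden system prompt."""
    return canary in output
```

Because the token is random per request, a leak cannot be a coincidence, which makes this one of the lowest-false-positive signals in the stack.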
## Defense Evolution Strategy
### Track the Research Landscape
Follow AI security publications, conference proceedings (USENIX, IEEE S&P, NeurIPS), and responsible disclosure channels. New attack techniques are published regularly.
### Model Update Testing
When your LLM provider updates the model, re-run your full test suite. Model updates can both improve and regress injection resistance in unpredictable ways.
### Bug Bounty Programs
Consider running an AI-specific bug bounty program focused on prompt injection. External researchers often discover attack vectors that internal teams miss.
## Metrics and Reporting
Track injection attempt rates, defense bypass rates, false positive rates, and mean time to patch. Report these metrics to leadership quarterly.
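A minimal sketch of how these metrics might be computed from counters the pipeline already records; the field names are assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class DefenseMetrics:
    attacks_attempted: int        # known-attack payloads replayed this period
    attacks_bypassed: int         # payloads that evaded all defense layers
    benign_requests: int          # legitimate traffic volume
    benign_blocked: int           # legitimate requests incorrectly blocked
    patch_times_hours: list       # hours from bypass discovery to fix, per incident

    @property
    def bypass_rate(self) -> float:
        return self.attacks_bypassed / self.attacks_attempted

    @property
    def false_positive_rate(self) -> float:
        return self.benign_blocked / self.benign_requests

    @property
    def mean_time_to_patch(self) -> float:
        return sum(self.patch_times_hours) / len(self.patch_times_hours)
```

Reporting ratios rather than raw counts keeps the quarterly numbers comparable as traffic grows.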
Lilly Tech Systems