Bypassing Prompt Injection Scanners: 12 Evasion Techniques and Defenses
This article discusses 12 techniques that can bypass prompt injection scanners, and the defenses implemented in the open-source ClawGuard scanner to detect these evasions.
Why it matters
Prompt injection attacks are a growing threat, and understanding the latest evasion techniques and effective defenses is crucial for securing AI systems.
Key Points
- 1Leetspeak substitution, character spacing, and zero-width character injection can bypass naive scanners
- 2Newline splitting, Markdown formatting, and Unicode homoglyphs are other evasion techniques
- 3ClawGuard implements a pipeline of 12 preprocessing steps to detect and defend against these evasions
- 4The scanner achieves an F1 score of 99.0% on 262 test cases, but some advanced attacks like acrostic and crescendo remain challenging
Details
The article discusses 12 techniques that can bypass prompt injection scanners, including leetspeak substitution, character spacing, zero-width character injection, newline splitting, Markdown formatting, Unicode homoglyphs, and more. The open-source ClawGuard scanner implements a pipeline of 12 preprocessing steps to detect and defend against these evasions, achieving an F1 score of 99.0% on 262 test cases. However, the article notes that some advanced attacks like acrostic and crescendo remain challenging for current scanners, as they require semantic analysis beyond just pattern matching.
No comments yet
Be the first to comment