An attack in which malicious input causes a model or agent to ignore intended instructions or perform unintended actions (e.g., by overriding system/developer prompts or by exploiting tool integrations). Prompt injection can occur directly via user input or indirectly via retrieved content and is treated as a security risk in many threat models.
Prompt injection
C
R
T
See: Adversarial attack; Jailbreak; Security