Loading...

Prompt injection

An attack in which malicious input causes a model or agent to ignore intended instructions or perform unintended actions (e.g., by overriding system/developer prompts or by exploiting tool integrations). Prompt injection can occur directly via user input or indirectly via retrieved content and is treated as a security risk in many threat models.

See: Adversarial attack; Jailbreak; Security