Adversarial success

An AI system failure caused by an adversarial attack, in which unwanted model output leads to adverse effects such as leakage of privileged data, violation of guardrails, or expansion of privilege. The term defines the failure condition for many types of testing and is relevant to breach notification and incident response obligations.

See: Data leakage; Jailbreak; Prompt injection