Adversarial attack

An attempt to cause a model to produce incorrect or harmful outputs through deliberately crafted inputs. Such an attack may target the AI system as a whole or consist of nothing more than a prompt designed to elicit an unwanted result. Adversarial robustness is relevant to security representations, product liability, potentially infringing intellectual-property outputs, and contractual performance standards.
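A minimal sketch of the idea, using a toy linear classifier and a fast-gradient-sign-style perturbation (all names and values here are illustrative, not from any particular system): a small, deliberate change to the input flips the model's output.

```python
import numpy as np

# Toy linear classifier: predicts class 1 when w @ x + b > 0.
w = np.array([1.0, -2.0])
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.5, 0.1])       # benign input; score = 0.3 -> class 1

# Crafted input: nudge each feature against the sign of the weight
# vector (the score's gradient), a fast-gradient-sign-style evasion.
eps = 0.4
x_adv = x - eps * np.sign(w)   # [0.1, 0.5]; score = -0.9 -> class 0
```

The perturbation is small and structured, yet it is enough to change the classification, which is the core of an evasion-style adversarial attack.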

See: Evasion attack; Jailbreak; Prompt injection; Red teaming