Loading...

AI Risk Registry

Export PDF
Clear
Registry ID Format: RR-N00 identifies the category, RR-NNN identifies the group, RR-NNN.MMM identifies individual risks. Click to expand.

Input-level attacks targeting prompt handling, safety alignment, and identity. These risks involve adversaries exploiting the natural language interface to manipulate AI behavior, bypass safety controls, or hijack system goals.

Prompt injection attacks exploit the fundamental architecture of LLMs by embedding malicious instructions within user inputs or external data sources. These attacks hijack the AI system's intended goals, causing it to execute attacker-controlled instructions instead of its programmed objectives. This category encompasses both direct manipulation through user input and indirect attacks via poisoned data sources, representing one of the most significant security challenges for deployed AI systems.

Attackers craft explicit commands within user input to override or replace the AI system's operational directives. Common patterns include phrases like "ignore previous instructions" or "you are now in developer mode." This represents the most straightforward form of prompt injection, targeting the model's instruction-following capabilities directly.

Cross-references
Cisco AI Taxonomy AISubtech-1.1.1 MITRE ATLAS AML.T0051.000 , AML.T0093 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Malicious instructions are disguised through encoding techniques, character substitution, or linguistic tricks to evade detection mechanisms while preserving attack functionality. Methods include leetspeak, unicode homoglyphs, base64 encoding, language mixing, and semantic obfuscation through synonyms or paraphrasing.

Cross-references
Cisco AI Taxonomy AISubtech-1.1.2 MITRE ATLAS AML.T0051.000 , AML.T0093 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

In multi-agent systems, attackers inject malicious instructions through one agent's output that are then trusted and executed by downstream agents. This exploits the inherent trust relationships between cooperating agents, where outputs from one component become trusted inputs to another.

Cross-references
Cisco AI Taxonomy AISubtech-1.1.3 MITRE ATLAS AML.T0051.000 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 , ASI07 OWASP LLM Top 10 llm01-prompt-injection

Malicious instructions embedded within external data sources such as documents, web pages, emails, or API responses are retrieved and processed by the AI system. These poisoned sources inject instructions that override the model's behavior without the user's awareness, exploiting RAG systems and data retrieval workflows.

Cross-references
Cisco AI Taxonomy AISubtech-1.2.1 MITRE ATLAS AML.T0051.001 , AML.T0067 , AML.T0070 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection , llm032025-supply-chain

Hidden or encoded instructions within external data sources designed to evade content scanning and input validation while remaining interpretable by the AI model. This combines indirect injection with evasion techniques to maximize attack success probability.

Cross-references
Cisco AI Taxonomy AISubtech-1.2.2 MITRE ATLAS AML.T0051.001 , AML.T0067 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection , llm032025-supply-chain

Exploitation of inter-agent communication channels through poisoned external content that propagates between agents. One agent retrieves compromised data which then flows through the multi-agent workflow, affecting multiple downstream components.

Cross-references
Cisco AI Taxonomy AISubtech-1.2.3 MITRE ATLAS AML.T0051.001 , AML.T0067 , AML.T0070 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 , ASI07 OWASP LLM Top 10 llm01-prompt-injection , llm032025-supply-chain

Attackers gradually shift the AI system's operational objectives over multiple interaction turns through carefully crafted prompts. Contradictory or concealed objectives are embedded within conversations, slowly steering the model away from its intended behavior toward attacker-defined goals.

Cross-references
Cisco AI Taxonomy AISubtech-1.3.1 MITRE ATLAS AML.T0018 , AML.T0051 , AML.T0067 MITRE ATT&CK T1078 , TA0001 NIST AI/ML Framework NISTAML.027 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm062025-excessive-agency

Attackers compromise external components that AI agents depend on, including tools, prompt templates, resources, or dependencies. Malicious objectives are injected through these trusted supply chain elements, redirecting agent behavior at a foundational level.

Cross-references
Cisco AI Taxonomy AISubtech-1.3.2 MITRE ATLAS AML.T0010 , AML.T0018 , AML.T0051 , AML.T0067 , AML.T0093 NIST AI/ML Framework NISTAML.027 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm062025-excessive-agency

Malicious instructions, prompts, or data are embedded within images using techniques like steganography, adversarial patches, or hidden text. Vision-language models extract and interpret these hidden payloads, enabling attacks that bypass text-based content filters.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.1 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Modification of visual content through pixel-level changes, structural alterations, or pattern overlays to influence how AI models perceive and process images. Unlike embedded text injection, this targets the model's visual interpretation directly to cause misclassification or altered decision-making.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.2 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Inaudible or unintelligible voice commands embedded within audio streams using ultrasonic frequencies, backmasking, or steganographic techniques. Automatic speech recognition models interpret these hidden signals as valid instructions while remaining imperceptible to human listeners.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.3 MITRE ATLAS AML.T0015 , AML.T0043 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection , llm052025-improper-output-handling

Harmful content or malicious instructions embedded within video streams through specific frames, QR-like visual triggers, or temporal patterns. These attacks exploit multimodal model processing of video content to bypass guardrails and inject commands.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.4 MITRE ATLAS AML.T0015 , AML.T0043 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm082025-vector-and-embedding-weaknesses

Jailbreak attacks specifically target safety alignment and content restrictions built into AI models during training. Unlike prompt injection which hijacks task execution, jailbreaking focuses on bypassing ethical guidelines, content policies, and behavioral constraints. Successful jailbreaks cause models to generate prohibited content, provide dangerous information, or behave in ways their training was designed to prevent.

Constructing elaborate fictional scenarios, roleplay frameworks, or alternative contexts that reframe harmful requests as acceptable within the created narrative. Examples include the "DAN" (Do Anything Now) jailbreak where the model is convinced to operate under an unrestricted alternate persona.

Cross-references
Cisco AI Taxonomy AISubtech-2.1.1 MITRE ATLAS AML.T0054 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Disguising jailbreak attempts through encoding schemes, linguistic obfuscation, character substitution, or creative formatting to evade jailbreak detection systems. The underlying intent to bypass safety measures is preserved while the surface presentation evades pattern-matching defenses.

Cross-references
Cisco AI Taxonomy AISubtech-2.1.2 MITRE ATLAS AML.T0054 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Using carefully constructed logical arguments, philosophical frameworks, or ethical reasoning to convince the model that providing harmful information actually aligns with its values. The model is essentially argued into compliance through persuasion rather than technical exploitation.

Cross-references
Cisco AI Taxonomy AISubtech-2.1.3 MITRE ATLAS AML.T0054 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Exploiting specific tokens, special characters, control sequences, or tokenization edge cases to manipulate model processing in ways that bypass safety filters. This targets the mechanical aspects of how models process input rather than higher-level reasoning.

Cross-references
Cisco AI Taxonomy AISubtech-2.1.4 MITRE ATLAS AML.T0043 , AML.T0054 , AML.T0093 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Coordinating multiple AI agents to collectively bypass safety measures where individual agents perform seemingly benign tasks that combine to achieve jailbreak objectives. Compromised agents may assist others in circumventing restrictions through distributed attack patterns.

Cross-references
Cisco AI Taxonomy AISubtech-2.1.5 MITRE ATLAS AML.T0054 NIST AI/ML Framework NISTAML.015 OWASP Agentic Security Initiative ASI01 , ASI07 OWASP LLM Top 10 llm01-prompt-injection

Masquerading attacks exploit identity and authentication weaknesses in AI systems, allowing attackers to impersonate trusted agents, services, or users. These attacks undermine the trust assumptions that multi-agent and integrated AI systems rely on for secure operation. Successful masquerading enables unauthorized access, instruction injection through trusted channels, and evasion of access controls.

Manipulating how agent or user identities are represented within context, metadata, or interaction patterns to evade detection, tracking, or access controls. Attackers obscure their true identity to appear as legitimate system participants.

Cross-references
Cisco AI Taxonomy AISubtech-3.1.1 MITRE ATLAS AML.T0073 , AML.T0074 , AML.T0091.000 MITRE ATT&CK T1036 , T1656 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm062025-excessive-agency

Impersonating legitimate agents or MCP-registered services to inject malicious instructions, responses, or outputs that other system components treat as trusted. This exploits the assumption of authenticity within multi-agent systems and protocol-mediated toolchains.

Cross-references
Cisco AI Taxonomy AISubtech-3.1.2 MITRE ATLAS AML.T0074 , AML.T0083 MITRE ATT&CK T1656 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm062025-excessive-agency

Communication compromise attacks target the channels, protocols, and boundaries that govern how AI components interact with each other and external systems. This includes inserting rogue agents, exploiting context window limitations, violating session boundaries, and manipulating communication protocols. These attacks undermine the integrity of AI system communications at a fundamental level.

Unauthorized insertion of a malicious agent into a multi-agent system that operates contrary to intended purpose. The rogue agent may steal data, cause disruption, or autonomously serve attacker goals while mimicking normal behavior patterns to evade detection.

Cross-references
Cisco AI Taxonomy AISubtech-4.1.1 MITRE ATLAS AML.T0051 , AML.T0068 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm032025-supply-chain

Deliberate overloading or manipulation of a model's limited context window to displace or overwrite crucial system instructions and safety guidelines. Attackers fill the context with benign content until critical instructions are pushed out of the processing window.

Cross-references
Cisco AI Taxonomy AISubtech-4.2.1 MITRE ATLAS AML.T0005 , AML.T0010 , AML.T0053 OWASP Agentic Security Initiative ASI06 , ASI07 OWASP LLM Top 10 llm01-prompt-injection , llm052025-improper-output-handling

Crossing expected conversational or transactional boundaries to persist malicious instructions across separate sessions. Attacks exploit persistent memory, session management flaws, or memory carryover mechanisms to maintain influence beyond intended session scope.

Cross-references
Cisco AI Taxonomy AISubtech-4.2.2 MITRE ATLAS AML.T0012 , AML.T0055 OWASP Agentic Security Initiative ASI06 , ASI07 OWASP LLM Top 10 llm062025-excessive-agency

Exploiting irregular, conflicting, or misaligned data structures that don't align with model expectations. These inconsistencies can cause vulnerabilities, parsing errors, performance degradation, or security bypasses in AI systems.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.1 MITRE ATLAS AML.T0018 , AML.T0067 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm032025-supply-chain

Exploiting situations where multiple components share the same identifier, causing confusion, misrouting, or security vulnerabilities. Attackers create colliding names for datasets, tools, APIs, or model identifiers to hijack legitimate system operations.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.2 MITRE ATLAS AML.T0010 NIST AI/ML Framework NISTAML.051 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm032025-supply-chain

Using DNS rebinding or similar techniques to trick an AI system into treating an attacker-controlled external domain as part of the trusted internal network. This bypasses same-origin policies and network security controls through DNS manipulation.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.3 MITRE ATLAS AML.T0049 NIST AI/ML Framework NISTAML.039 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm032025-supply-chain

Capturing legitimate API calls, authentication tokens, or model queries and resending them later to repeat actions or bypass authentication. This classic attack pattern applies to AI system communications where request authentication may be inadequate.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.4 MITRE ATLAS AML.T0012 , AML.T0055 , AML.T0068 NIST AI/ML Framework NISTAML.027 , NISTAML.051 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm022025-sensitive-information-disclosure , llm052025-improper-output-handling

Exploiting system mechanisms to artificially expand an agent's capabilities, permissions, or authority beyond intended limits. Attackers escalate privileges through protocol manipulation or capability misrepresentation to enable unauthorized actions.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.5 MITRE ATLAS AML.T0053 OWASP Agentic Security Initiative ASI03 , ASI07 OWASP LLM Top 10 llm062025-excessive-agency

Subverting security mechanisms designed to isolate resources across different trust boundaries, primarily the Same-Origin Policy. Attackers trick AI agents into making unauthorized requests or sharing data across domains, protocols, or services.

Cross-references
Cisco AI Taxonomy AISubtech-4.3.6 MITRE ATLAS AML.T0017 , AML.T0053 OWASP Agentic Security Initiative ASI07 OWASP LLM Top 10 llm062025-excessive-agency

Persistence attacks establish long-term footholds within AI systems by injecting malicious content into memory systems, configuration stores, or agent profiles. Unlike transient attacks that affect single interactions, persistence attacks influence all future sessions, creating ongoing compromise that survives system restarts and session boundaries.

Seeding malicious, misleading, or adversarial data into an AI system's persistent memory (long-term) or working memory (short-term) to influence current and future interactions. Poisoned memories bias behavior and can enable self-replicating attack patterns.

Cross-references
Cisco AI Taxonomy AISubtech-5.1.1 MITRE ATLAS AML.T0061 , AML.T0070 , AML.T0092 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI06 OWASP LLM Top 10 llm01-prompt-injection

Unauthorized modification of stored agent identity, preferences, role definitions, capabilities, permissions, or behavioral parameters. Attackers alter configuration to enable malicious behaviors, maintain access, escalate privileges, or evade detection across sessions.

Cross-references
Cisco AI Taxonomy AISubtech-5.2.1 MITRE ATLAS AML.T0018 MITRE ATT&CK T1098 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm032025-supply-chain , llm042025-data-and-model-poisoning

Attacks on training data, model weights, privacy, and supply chain. These risks target the data pipeline and model artifacts, from training data poisoning to model extraction and adversarial manipulation.

Feedback loop manipulation targets the learning and adaptation mechanisms of AI systems. Attackers poison training data, knowledge bases, or reinforcement signals to influence how models learn and evolve over time. These attacks can introduce backdoors, biases, or degraded performance that persists through model updates and affects all users of the compromised system.

Inserting false, malicious, biased, or misleading data into external knowledge bases, vector databases, or RAG systems that LLMs rely on for accurate responses. Poisoned knowledge corrupts outputs for all users querying affected topics.

Cross-references
Cisco AI Taxonomy AISubtech-6.1.1 Cisco Model Security (MDL) MDL-018 , MDL-020 MITRE ATLAS AML.T0019 , AML.T0020 , AML.T0070 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI06 OWASP LLM Top 10 llm042025-data-and-model-poisoning

Subtly influencing user feedback, evaluation signals, or reward mechanisms in reinforcement learning systems to skew model learning toward attacker-controlled objectives. The model's training is gradually steered in unintended directions through manipulated feedback.

Cross-references
Cisco AI Taxonomy AISubtech-6.1.2 MITRE ATLAS AML.T0061 , AML.T0070 NIST AI/ML Framework NISTAML.013 OWASP Agentic Security Initiative ASI06 , ASI08 OWASP LLM Top 10 llm042025-data-and-model-poisoning

Directly injecting false or adversarial signals into training pipelines, feedback channels, or reward systems. Unlike subtle biasing, this involves active corruption of the learning process through reward hacking or signal manipulation.

Cross-references
Cisco AI Taxonomy AISubtech-6.1.3 MITRE ATLAS AML.T0018 , AML.T0020 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI06 , ASI08 OWASP LLM Top 10 llm042025-data-and-model-poisoning

Sabotage attacks aim to degrade AI system reliability, accuracy, and trustworthiness without necessarily seeking to control or redirect behavior. This includes corrupting memory systems, poisoning data sources, manipulating retrieval mechanisms, and stealing authentication tokens. The goal is often disruption, degradation, or undermining confidence in AI system outputs.

Strategically planting memorable or salient content to bias the model's recall toward attacker-chosen information. By manipulating what content is most retrievable, attackers influence how the model responds to related queries.

Cross-references
Cisco AI Taxonomy AISubtech-7.2.1 MITRE ATLAS AML.T0018 , AML.T0020 , AML.T0070 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI06 OWASP LLM Top 10 llm042025-data-and-model-poisoning

Altering how memory embeddings, indexes, or retrieval mechanisms function to favor retrieval of attacker-controlled content over legitimate information. This targets the technical infrastructure of memory systems rather than the content itself.

Cross-references
Cisco AI Taxonomy AISubtech-7.2.2 MITRE ATLAS AML.T0020 , AML.T0070 NIST AI/ML Framework NISTAML.013 , NISTAML.024 OWASP Agentic Security Initiative ASI06 OWASP LLM Top 10 llm042025-data-and-model-poisoning

External datasets from vendors, partners, open-source repositories, or public sources containing inaccurate, incomplete, malicious, or manipulated information that is incorporated into AI training, fine-tuning, or evaluation processes.

Cross-references
Cisco AI Taxonomy AISubtech-7.3.1 MITRE ATLAS AML.T0010 , AML.T0019 NIST AI/ML Framework NISTAML.013 , NISTAML.051 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm032025-supply-chain , llm042025-data-and-model-poisoning

Stealing authentication tokens, API keys, or credentials from MCP servers or similar agent integration frameworks. Stolen tokens enable unauthorized access to connected systems, agent impersonation, and access to restricted resources.

Cross-references
Cisco AI Taxonomy AISubtech-7.4.1 MITRE ATLAS AML.T0012 , AML.T0055 MITRE ATT&CK T1087 , T1528 , T1552 NIST AI/ML Framework NISTAML.051 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Privacy violation risks encompass the various ways AI systems can expose, leak, or enable inference of sensitive information. This includes determining whether specific data was used in training, extracting training data or PII from model outputs, leaking system configuration details, and extracting system prompts. These risks have significant regulatory, legal, and reputational implications.

Querying and analyzing model behavior to determine whether specific data points, records, or individuals were present in the training dataset or knowledge base. Successful inference reveals private information about training data composition.

Cross-references
Cisco AI Taxonomy AISubtech-8.1.1 MIT AI Risk Repository 2.1 MITRE ATLAS AML.T0024.000 , AML.T0040 , AML.T0063 NIST AI/ML Framework NISTAML.033 OWASP Agentic Security Initiative ASI09 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Extracting, reconstructing, or inferring information from training data through model outputs, internal behavior analysis, or targeted queries. The model's learned representations can reveal private information about training data subjects.

Cross-references
Cisco AI Taxonomy AISubtech-8.2.1 MIT AI Risk Repository 2.1 MITRE ATLAS AML.T0024.000 , AML.T0035 , AML.T0037 , AML.T0057 NIST AI/ML Framework NISTAML.037 OWASP Agentic Security Initiative ASI09 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Release of sensitive information or PII from training data during normal inference, often triggered through prompt injection or extraction techniques. The model inadvertently outputs private data that was present in its training corpus.

Cross-references
Cisco AI Taxonomy AISubtech-8.2.2 MIT AI Risk Repository 2.1 MITRE ATLAS AML.T0024.000 , AML.T0035 , AML.T0036 , AML.T0037 , AML.T0057 , AML.T0069 NIST AI/ML Framework NISTAML.037 OWASP Agentic Security Initiative ASI09 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Manipulation of AI agents to use their legitimate tool access for unauthorized data exfiltration. Attackers craft prompts that cause agents to retrieve sensitive data through tools and transmit it to attacker-controlled destinations.

Cross-references
Cisco AI Taxonomy AISubtech-8.2.3 MITRE ATLAS AML.T0086 OWASP Agentic Security Initiative ASI02 , ASI09 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Disclosure of descriptive information about tools including names, descriptions, parameter schemas, versions, and capabilities. Exposed metadata helps attackers understand system architecture and craft targeted attacks.

Cross-references
Cisco AI Taxonomy AISubtech-8.3.1 MITRE ATLAS AML.T0036 , AML.T0075 NIST AI/ML Framework NISTAML.038 OWASP Agentic Security Initiative ASI02 OWASP LLM Top 10 llm022025-sensitive-information-disclosure , llm052025-improper-output-handling

Unintended disclosure of internal configuration, architecture, environment details, or infrastructure information. Leaked system information aids attackers in understanding deployment environments and crafting targeted exploits.

Cross-references
Cisco AI Taxonomy AISubtech-8.3.2 MITRE ATLAS AML.T0036 , AML.T0075 NIST AI/ML Framework NISTAML.039 OWASP LLM Top 10 llm032025-supply-chain

Extraction of system prompts, instructions, or initial context that guides model behavior. Exposed prompts reveal operational details, security mechanisms, intellectual property, or confidential business logic not intended for disclosure.

Cross-references
Cisco AI Taxonomy AISubtech-8.4.1 MITRE ATLAS AML.T0035 , AML.T0056 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

AI systems exposing, generating, or misusing personally identifiable information (PII), protected health information (PHI), or payment card industry (PCI) data. This includes revealing sensitive personal details, medical records, or financial information through AI outputs or enabling their collection and exploitation.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.24 , AISubtech-15.1.25 , AISubtech-8.2.2 MITRE ATLAS AML.T0024.000 , AML.T0035 , AML.T0036 , AML.T0037 , AML.T0057 , AML.T0069 NIST AI/ML Framework NISTAML.037 OWASP Agentic Security Initiative ASI09 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Supply chain compromise targets the dependencies, tools, models, and infrastructure that AI systems rely on. Attackers can compromise systems by manipulating code execution capabilities, gaining unauthorized system access, injecting malicious dependencies, or installing backdoors. These attacks often provide broad access and persistence by compromising trusted components used across many deployments.

Exploitation of AI models with code interpreter capabilities to execute arbitrary code on underlying systems. Attackers use prompt injection or tool manipulation to cause models to write and execute malicious code with system-level access.

Cross-references
Cisco AI Taxonomy AISubtech-9.1.1 MITRE ATLAS AML.T0050 NIST AI/ML Framework NISTAML.023 OWASP Agentic Security Initiative ASI04 , ASI05 OWASP LLM Top 10 llm032025-supply-chain

Manipulating AI systems to access underlying resources without authorization, including file modification, configuration changes, privilege escalation, or command execution. These attacks exploit the system access that AI components require for legitimate operation.

Cross-references
Cisco AI Taxonomy AISubtech-9.1.2 MITRE ATLAS AML.T0012 NIST AI/ML Framework AML.T0044 OWASP Agentic Security Initiative ASI04 , ASI05 OWASP LLM Top 10 llm032025-supply-chain

Exploiting models or agents to gain unauthorized access to network resources, internal systems, external services, or restricted network segments. Attackers leverage legitimate network capabilities to reach systems that should be isolated.

Cross-references
Cisco AI Taxonomy AISubtech-9.1.3 MITRE ATLAS AML.T0049 NIST AI/ML Framework AML.T0072 OWASP Agentic Security Initiative ASI04 , ASI05 OWASP LLM Top 10 llm032025-supply-chain

Using LLMs to generate, optimize, or adapt traditional injection payloads (SQL injection, command injection, XSS) that bypass detection mechanisms. The LLM acts as an intelligent intermediary that crafts, refines, or personalizes malicious payloads for specific targets.

Cross-references
Cisco AI Taxonomy AISubtech-9.1.4 MITRE ATLAS AML.T0050 , AML.T0051 , AML.T0067 MITRE ATT&CK T1588.007 NIST AI/ML Framework NISTAML.024 OWASP Agentic Security Initiative ASI04 , ASI05 OWASP LLM Top 10 llm01-prompt-injection , llm052025-improper-output-handling , llm062025-excessive-agency

Manipulating template engines by injecting malicious syntax through AI-generated content that is unsafely embedded into server-side templates. This enables arbitrary code execution, template logic manipulation, or system compromise through rendering pipelines.

Cross-references
Cisco AI Taxonomy AISubtech-9.1.5 MITRE ATLAS AML.T0068 , AML.T0074 OWASP Agentic Security Initiative ASI04 , ASI05 OWASP LLM Top 10 llm082025-vector-and-embedding-weaknesses

Security weaknesses that emerge when AI system components (code, architecture, parameters, configurations) are intentionally or unintentionally concealed. Obfuscation creates security blind spots that attackers can exploit while defenders lack visibility.

Cross-references
Cisco AI Taxonomy AISubtech-9.2.1 Cisco Model Security (MDL) MDL-001 , MDL-003 , MDL-009 , MDL-011 , MDL-016 , MDL-017 , MDL-019 MITRE ATLAS AML.T0068 , AML.T0074 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm082025-vector-and-embedding-weaknesses

Models maliciously modified to exhibit trigger-activated behavior that causes misclassification, malicious outputs, or undesirable biases when given specific inputs, while behaving normally otherwise. These backdoors are difficult to detect through standard evaluation.

Cross-references
Cisco AI Taxonomy AISubtech-9.2.2 Cisco Model Security (MDL) MDL-021 MITRE ATLAS AML.T0010 , AML.T0058 NIST AI/ML Framework NISTAML.023 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm082025-vector-and-embedding-weaknesses

Introduction of malicious tools, APIs, or packages into the toolset, registry, or dependency chain used by AI systems. Models unknowingly invoke compromised tools that execute attacks or expose data while appearing to function normally.

Cross-references
Cisco AI Taxonomy AISubtech-9.3.1 Cisco Model Security (MDL) MDL-023 MITRE ATLAS AML.T0010 , AML.T0053 NIST AI/ML Framework NISTAML.018 , NISTAML.023 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm032025-supply-chain

Publishing malicious packages, tools, or MCP servers with names similar to legitimate ones (typosquatting, combosquatting) to trick developers, orchestrators, or agents into installing compromised components.

Cross-references
Cisco AI Taxonomy AISubtech-9.3.2 MITRE ATLAS AML.T0010 NIST AI/ML Framework NISTAML.039 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm032025-supply-chain

Replacing a once-legitimate trusted tool or package with malicious code after trust and adoption have been established. This exploits existing deployments that auto-update or don't pin versions, turning trusted dependencies into attack vectors.

Cross-references
Cisco AI Taxonomy AISubtech-9.3.3 MITRE ATLAS AML.T0010 , AML.T0018 NIST AI/ML Framework NISTAML.051 OWASP Agentic Security Initiative ASI04 OWASP LLM Top 10 llm032025-supply-chain

System failure due to code implementation choices or errors, including bugs from open-source dependencies and imperfect realization of design specifications.

Cross-references
MIT AI Risk Repository 7.3

Model extraction attacks attempt to steal or replicate proprietary AI models through various techniques including systematic API querying, weight reconstruction, and model inversion. Successful extraction enables attackers to replicate expensive model capabilities, conduct further attacks on extracted models, or access intellectual property embedded in model parameters.

Systematic querying of a model's API to extract responses, behavior patterns, and model characteristics without authorization. Attackers build datasets of input-output pairs to train surrogate models that replicate the target's functionality.

Cross-references
Cisco AI Taxonomy AISubtech-10.1.1 MITRE ATLAS AML.T0035 , AML.T0040 , AML.T0063 NIST AI/ML Framework NISTAML.038 OWASP LLM Top 10 llm102025-unbounded-consumption

Attempts to recover or approximate underlying model weights, parameters, or architecture by exploiting access to model outputs, API responses, or side channels. Successful reconstruction provides full model access without legitimate authorization.

Cross-references
Cisco AI Taxonomy AISubtech-10.1.2 Cisco Model Security (MDL) MDL-022 MITRE ATLAS AML.T0018 OWASP LLM Top 10 llm102025-unbounded-consumption

Reconstructing sensitive datasets, PII, or training data from model outputs through targeted queries, model inversion attacks, or exploitation of model memorization. Attackers extract private information that was supposed to remain protected within the training process.

Cross-references
Cisco AI Taxonomy AISubtech-10.1.3 NIST AI/ML Framework NISTAML.033 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Reconstructing private training data, sensitive features, or confidential information by exploiting the model's learned representations, decision boundaries, or output patterns. The model is effectively inverted to reveal what it learned from training.

Cross-references
Cisco AI Taxonomy AITech-10.2.1 MITRE ATLAS AML.T0024.001 NIST AI/ML Framework NISTAML.033 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Adversarial evasion encompasses techniques where attackers craft inputs specifically designed to bypass security controls, evade detection mechanisms, or exploit differences between AI components. Unlike general adversarial attacks that target model accuracy, evasion techniques focus on understanding and circumventing the defensive measures protecting AI systems. These attacks can be tailored to specific agents, tools, environments, or model implementations, making them particularly challenging to defend against in complex multi-agent architectures.

Attackers craft inputs that exploit the unique behaviors, processing patterns, or roles of specific agent types within a multi-agent system. By understanding how different agents (such as retrievers, planners, verifiers, or executors) handle inputs differently, adversaries can create payloads that appear benign to some agents while triggering malicious behavior through others.

Cross-references
Cisco AI Taxonomy AISubtech-11.1.1 MITRE ATLAS AML.T0015 OWASP LLM Top 10 llm01-prompt-injection

Adversaries design payloads that evade security tools and content filters while manifesting malicious behavior when routed to specific vulnerable tools or APIs in the workflow. A string may appear harmless in a chat context but trigger exploits when passed to file I/O tools, database queries, or system commands.

Cross-references
Cisco AI Taxonomy AISubtech-11.1.2 MITRE ATLAS AML.T0015 OWASP LLM Top 10 llm052025-improper-output-handling

Malicious inputs that activate only in specific runtime environments by detecting characteristics such as development vs. production settings, cloud vs. on-premise deployments, operating system types, or presence of debug flags. The payload remains dormant during testing but activates when deployed to target environments.

Cross-references
Cisco AI Taxonomy AISubtech-11.1.3 MITRE ATLAS AML.T0015 OWASP LLM Top 10 llm052025-improper-output-handling , llm082025-vector-and-embedding-weaknesses

Adversarial payloads explicitly crafted with knowledge of existing defensive mechanisms including prompt constraints, content filters, verification steps, and safety guardrails. These attacks adapt specifically to evade the known defenses deployed in a target system.

Cross-references
Cisco AI Taxonomy AISubtech-11.1.4 MITRE ATLAS AML.T0015 , AML.T0051.000 OWASP LLM Top 10 llm01-prompt-injection

Probing, testing, or analyzing an AI model to determine its specific identity, version, fine-tuning status, or architecture characteristics. This reconnaissance enables attackers to craft model-specific exploits that target known vulnerabilities or behaviors of particular model implementations.

Cross-references
Cisco AI Taxonomy AISubtech-11.2.1 MITRE ATLAS AML.T0014 , AML.T0015 , AML.T0067 NIST AI/ML Framework NISTAML.051 OWASP LLM Top 10 llm01-prompt-injection , llm042025-data-and-model-poisoning , llm102025-unbounded-consumption

Payloads designed to remain benign across most models but trigger harmful actions specifically on targeted models. Differences in tokenization, instruction-following behavior, or training data create model-specific vulnerabilities that attackers can exploit while maintaining an appearance of safety on other models.

Cross-references
Cisco AI Taxonomy AISubtech-11.2.2 MITRE ATLAS AML.T0015 , AML.T0067 OWASP LLM Top 10 llm01-prompt-injection

Downstream harm via unsafe outputs, actions, and misuse. These risks emerge when AI systems interact with external systems, generate harmful content, or are weaponized for malicious purposes.

Action-space and integration abuse risks arise when attackers exploit the tools, APIs, and integrations available to AI systems. As AI agents gain access to more external capabilities through tool calling, plugin systems, and MCP servers, the attack surface expands significantly. Attackers may manipulate tool parameters, poison tool behavior, substitute malicious tools for legitimate ones, or force AI systems to generate harmful code. These risks are particularly acute in agentic systems where AI components have broad permissions to interact with external systems and execute actions.

Attackers alter, modify, or manipulate function parameters, tool arguments, model settings, or configuration values to unlock unintended capabilities, bypass restrictions, or enable malicious functionality. This may involve changing file paths, expanding permission scopes, or modifying API parameters beyond intended bounds.

Cross-references
Cisco AI Taxonomy AISubtech-12.1.1 MITRE ATLAS AML.T0053 , AML.T0067 NIST AI/ML Framework NISTAML.039 , NISTAML.051 OWASP Agentic Security Initiative ASI02 OWASP LLM Top 10 llm062025-excessive-agency

Corrupting, modifying, or degrading the functionality of tools used by AI agents through data poisoning, configuration tampering, or behavioral manipulation. Poisoned tools may produce deceptive or malicious outputs, enable privilege escalation, or propagate altered data through downstream systems.

Cross-references
Cisco AI Taxonomy AISubtech-12.1.2 MITRE ATLAS AML.T0010 , AML.T0053 , AML.T0094 OWASP Agentic Security Initiative ASI02 , ASI04 OWASP LLM Top 10 llm032025-supply-chain , llm082025-vector-and-embedding-weaknesses

Abusing AI system integration with system commands, browsers, or file I/O tools to trigger unsafe operations, arbitrary code execution, or malicious file actions. This includes tricking agents into opening malicious URLs, executing shell commands, or performing dangerous file operations.

Cross-references
Cisco AI Taxonomy AISubtech-12.1.3 MITRE ATLAS AML.T0011 , AML.T0050 , AML.T0094 , AML.T0095 OWASP Agentic Security Initiative ASI02 , ASI05 OWASP LLM Top 10 llm052025-improper-output-handling

Disguising, substituting, or duplicating legitimate tools within an agent system, MCP server, or tool registry. Malicious tools with identical or similar identifiers can intercept or replace trusted tool calls, leading to unauthorized actions, data exfiltration, or redirection of legitimate operations.

Cross-references
Cisco AI Taxonomy AISubtech-12.1.4 MITRE ATLAS AML.T0010 , AML.T0053 OWASP Agentic Security Initiative ASI02 OWASP LLM Top 10 llm032025-supply-chain

Forcing an AI model or agent to produce code that bypasses content filters, contains malicious functionality, or includes working exploits. This often involves disguising malicious code as benign snippets, educational examples, or requested features that actually contain hidden harmful functionality.

Cross-references
Cisco AI Taxonomy AISubtech-12.2.1 MITRE ATLAS AML.T0053 MITRE ATT&CK T1059 , T1190 NIST AI/ML Framework NISTAML.027 OWASP Agentic Security Initiative ASI02 OWASP LLM Top 10 llm052025-improper-output-handling

Architectural vulnerabilities in LLM plugin and tool systems that enable unauthorized access, privilege escalation, or security bypass. This includes insufficient input validation on plugin parameters, overly permissive plugin capabilities, lack of sandboxing or isolation for plugin execution, and inadequate access control for plugin invocation. Poor plugin design can expose the host system to exploitation even when the underlying model is secure.

Cross-references
Cisco AI Taxonomy AISubtech-12.1.5 MITRE ATLAS AML.T0053 , AML.T0067 NIST AI/ML Framework NISTAML.039 OWASP Agentic Security Initiative ASI02 , ASI04 OWASP LLM Top 10 llm062025-excessive-agency , llm072025-system-prompt-leakage

Availability abuse targets the operational continuity and cost efficiency of AI systems. Attackers may attempt to exhaust computational resources, flood memory systems, trigger denial-of-service conditions, or exploit usage-based pricing models to inflict financial damage. AI systems are particularly vulnerable due to their resource-intensive nature and the computational costs associated with inference. These attacks can render services unavailable, degrade performance for legitimate users, or drive operational costs to unsustainable levels.

Deliberately consuming excessive computational resources through long queries, adversarial inputs, or compute-intensive requests designed to degrade service availability, increase operational costs, or cause system slowdown. This may involve crafted prompts that maximize token generation or trigger expensive processing paths.

Cross-references
Cisco AI Taxonomy AISubtech-13.1.1 MITRE ATLAS AML.T0029 OWASP Agentic Security Initiative ASI08 OWASP LLM Top 10 llm102025-unbounded-consumption

Overwhelming or overloading the model or agent's memory, context windows, API connections, or processing pipelines with excessive tool calls, simultaneous operations, or memory-intensive requests. This degrades performance, causes failures, or erodes the effectiveness of memory systems over time.

Cross-references
Cisco AI Taxonomy AISubtech-13.1.2 MITRE ATLAS AML.T0029 OWASP Agentic Security Initiative ASI08 OWASP LLM Top 10 llm102025-unbounded-consumption

Attacks designed to degrade or shut down an AI model or application by flooding the system with requests, requesting very large responses, exploiting vulnerabilities, or triggering resource-intensive operations that exhaust available capacity.

Cross-references
Cisco AI Taxonomy AISubtech-13.1.3 MITRE ATLAS AML.T0029 OWASP Agentic Security Initiative ASI08 OWASP LLM Top 10 llm062025-excessive-agency , llm102025-unbounded-consumption

Interacting with an AI model or agent in ways that consume exceptionally high amounts of application-level resources, resulting in degraded service quality for other users and potentially incurring significant resource costs for the operator.

Cross-references
Cisco AI Taxonomy AISubtech-13.1.4 MITRE ATLAS AML.T0029 OWASP LLM Top 10 llm102025-unbounded-consumption

Overwhelming AI decision-making systems with contradictory information, excessive options, conflicting objectives, or computationally intractable choices. These attacks prevent timely decisions, cause system freezing, or force systems into default or potentially unsafe behaviors.

Cross-references
Cisco AI Taxonomy AISubtech-13.1.5 MITRE ATLAS AML.T0029 NIST AI/ML Framework NISTAML.024 OWASP LLM Top 10 llm102025-unbounded-consumption

Intentional or unintentional use of AI resources that unnecessarily drives up operational costs through inefficient queries, resource waste, or exploitation of usage-based pricing models. Attackers may deliberately maximize costs as a form of financial attack.

Cross-references
Cisco AI Taxonomy AISubtech-13.2.1 MITRE ATLAS AML.T0029 , AML.T0034 , AML.T0040 OWASP LLM Top 10 llm102025-unbounded-consumption

Privilege compromise encompasses risks where attackers gain unauthorized access to systems, data, or capabilities through AI system vulnerabilities. This includes both direct credential theft and the abuse of delegated authority mechanisms. AI agents often operate with elevated privileges to perform their functions, creating opportunities for attackers to escalate their own permissions by exploiting how AI systems handle authentication, authorization, and delegation. These risks are amplified in agentic systems where AI components may inherit or be granted broad access rights.

Attempts to generate, solicit, or reveal authorization credentials including login details, tokens, API keys, and passwords through interactions with AI models or agents. This enables unauthorized access to accounts, systems, and data protected by those credentials.

Cross-references
Cisco AI Taxonomy AISubtech-14.1.1 MITRE ATLAS AML.T0055 , AML.T0091 , AML.T0091.000 MITRE ATT&CK T1098 , T1528 , T1550 NIST AI/ML Framework NISTAML.03 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm022025-sensitive-information-disclosure

Weak, missing, or misconfigured permissions, authentication mechanisms, and access controls that fail to adequately prevent security breaches, unauthorized access, or data leakage. This includes overly permissive default configurations and failure to implement least privilege principles.

Cross-references
Cisco AI Taxonomy AISubtech-14.1.2 MITRE ATLAS AML.T0053 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm062025-excessive-agency

Actions that exceed the scope or resource access initially allowed to a subject or user by exploiting delegation mechanisms. Attackers gain privileged access and perform unauthorized tasks beyond their original authorization by manipulating how AI systems handle delegated permissions.

Cross-references
Cisco AI Taxonomy AISubtech-14.2.1 MITRE ATLAS AML.T0053 , AML.T0055 , AML.T0091 , AML.T0091.000 OWASP Agentic Security Initiative ASI03 OWASP LLM Top 10 llm062025-excessive-agency

Content safety risks cover AI outputs that directly enable harm, including violence, hate, harassment, sexual exploitation, self-harm, terrorism, and weaponization. This group also includes social engineering and other abusive content that can be scaled through AI generation. The primary failure mode is unsafe content generation or facilitation.

AI systems producing content that enables or facilitates the creation, distribution, or operational use of malicious software and cyberattack activities. This includes generating code for malware, viruses, exploits, ransomware, or providing instructions for network intrusions and managing malicious infrastructure.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.1 MITRE ATLAS AML.T0048.001 , AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection , llm022025-sensitive-information-disclosure

AI systems enabling or facilitating attacks that manipulate human trust, behavior, or decision-making to gain unauthorized access, extract sensitive data, or cause harmful actions. This includes generating convincing phishing emails, spoofed communications, or personalized manipulation campaigns at scale.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.2 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems producing content that enables harm against children, particularly through exploitation, manipulation, or abuse. This includes generating, modifying, or facilitating the distribution of child sexual abuse material or content that encourages violence against children.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.4 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems enabling, promoting, or facilitating harassment, intimidation, or targeted abuse including threatening language, manipulative content, stalking behaviors, or persistent unwanted engagement. AI can automate and scale harassment campaigns beyond traditional human-driven methods.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.11 , AISubtech-15.1.8 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems producing content that enables, promotes, or facilitates hateful, discriminatory, or demeaning expression targeting protected characteristics such as race, ethnicity, religion, nationality, disability, gender, or sexual orientation. This includes harmful narratives, slurs, stereotypes, or calls to hostility.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.9 MIT AI Risk Repository 1.2 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems producing content that advocates, promotes, or enacts ideologies and behaviors that undermine fundamental societal norms including violence against communities, intimidation, coercion, or polarization tactics in pursuit of political ideologies.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.10 , AISubtech-15.1.16 MITRE ATLAS AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems producing content that promotes materials providing guidance for armed violence, terrorism, instructions related to chemical, biological, radiological, or nuclear threats, or the use and procurement of weapons and explosives.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.10 , AISubtech-15.1.18 MITRE ATLAS AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

AI systems generating content that encourages, glorifies, or provides instructions for violent acts against individuals or groups, excluding content already covered by terrorism/extremism or CBRN categories.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.17 , AISubtech-15.1.3 , AISubtech-15.1.6 MIT AI Risk Repository 1.2

AI systems generating content that encourages, enables, or provides instructions for self-harm, suicide, eating disorders, or other self-destructive behaviors.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.13 MIT AI Risk Repository 1.2

AI systems generating explicit sexual content without appropriate consent frameworks, including non-consensual intimate imagery, deepfake pornography, or sexual content in inappropriate contexts (excluding CSAM which is covered separately).

Cross-references
Cisco AI Taxonomy AISubtech-15.1.14 MIT AI Risk Repository 1.2

Information integrity and advice risks arise when AI outputs are false, misleading, or inappropriately authoritative. This includes disinformation, hallucinations, and unqualified professional advice that can mislead users or harm decision-making.

AI systems enabling, promoting, or facilitating the spread of false, misleading, or manipulated information intended to deceive or disrupt. This includes generating harmful narratives to manipulate public opinion, undermine institutions, or amplify unverified information at scale.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.15 , AISubtech-15.1.5 MIT AI Risk Repository 3.2 , 4.1 MITRE ATLAS AML.T0048.001 , AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection , llm092025-misinformation

AI systems producing content that is unrelated to the intended subject matter, factually incorrect, or misleading in ways that pose risks or cause harmful outcomes. This includes confident but false assertions, fabricated citations, and plausible- sounding but incorrect information.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.19 , AISubtech-15.1.5 MIT AI Risk Repository 3.1 MITRE ATLAS AML.T0048.001 , AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection , llm092025-misinformation

AI systems providing professional-grade advice in regulated domains such as medicine, law, or finance without proper safeguards or oversight, where the advice is factually incorrect, incomplete, deceptive, or harmful if followed. This may constitute unauthorized practice in restricted fields.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.12 , AISubtech-15.1.20 , AISubtech-15.1.21 , AISubtech-15.1.22 , AISubtech-15.1.7 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

Surveillance risks involve AI systems being used or abused for unauthorized monitoring, data collection, or eavesdropping on user activities. This includes logging sensitive conversations without proper consent, retaining personally identifiable information beyond stated purposes, or exploiting AI systems as vectors for broader surveillance operations. The conversational nature of many AI interfaces creates unique exposure, as users may share sensitive information trusting it will be handled appropriately.

Storing or recording user-AI interactions in ways that include personally identifiable information, private data, or sensitive content without adequate consent, anonymization, security measures, or retention limits. Such data could eventually be leaked, subpoenaed, or misused.

Cross-references
Cisco AI Taxonomy AISubtech-16.1.1 , AISubtech-8.3.2 MITRE ATLAS AML.T0036 , AML.T0075 NIST AI/ML Framework NISTAML.039 OWASP LLM Top 10 llm032025-supply-chain

Cyber-physical risks emerge when AI systems interface with the physical world through sensors, actuators, or other physical components. Attackers may spoof sensor inputs, manipulate environmental signals, or inject malicious action signals to cause AI systems to take unintended physical actions. These risks are particularly concerning in autonomous systems, robotics, industrial control, and any application where AI decisions translate into real-world physical effects.

Injecting malicious or misleading data points or signals that prompt AI models to undertake specific actions beyond normal reasoning. These signals can be delivered through audio, visual, or other sensor channels, allowing attackers to cause AI agents to execute unintended operations in physical or digital environments.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.3 , AISubtech-17.1.1 MITRE ATLAS AML.T0015 , AML.T0043 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection , llm052025-improper-output-handling

Malicious application risks address the intentional use of AI systems for harmful purposes by bad actors. This includes using AI to generate spam, phishing content, and social engineering attacks at scale, as well as establishing dedicated infrastructure for AI-powered malicious operations. Unlike vulnerabilities that attackers exploit, these risks involve deliberate abuse of AI capabilities for fraud, deception, and other harmful activities. The automation and scale that AI provides can amplify traditional attack vectors significantly. This group focuses on **operational deployment patterns and misuse at scale**, not the specific content type being generated (see RR-340 for harmful content categories).

Using AI systems to automate generation of large volumes of unsolicited or fraudulent content including phishing messages, fake offers, spam communications, impersonation attempts, or manipulation tactics to deceive people and solicit funds, credentials, or sensitive information.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.12 , AISubtech-18.1.1 MIT AI Risk Repository 4.3 MITRE ATLAS AML.T0048.001 , AML.T0048.002 , AML.T0048.003 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

Leveraging AI APIs in bulk for malicious purposes at scale, including flooding attacks, automation of worst-case adversarial prompts, or executing workflows that negatively impact many users or systems. This involves systematically exploiting API access for harmful operations.

Cross-references
Cisco AI Taxonomy AISubtech-13.2.1 , AISubtech-18.2.1 MITRE ATLAS AML.T0029 , AML.T0034 , AML.T0040 OWASP LLM Top 10 llm102025-unbounded-consumption

Establishing purpose-built servers, infrastructure, or services specifically designed to support, scale, or automate AI-powered attacks, malicious workflows, or harmful operations. This includes creating dedicated platforms for AI-assisted cybercrime or fraud operations.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.1 , AISubtech-18.2.2 MITRE ATLAS AML.T0048.001 , AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection , llm022025-sensitive-information-disclosure

Multi-modal risks arise specifically in AI systems that process and integrate multiple input modalities such as text, images, audio, and video. Attackers can exploit inconsistencies in how different modalities are processed, craft contradictory inputs across channels, or split malicious payloads across modalities to evade detection. As AI systems become more capable of handling diverse input types, the attack surface for cross-modal exploits expands, requiring careful consideration of how modalities interact and are arbitrated.

Exploiting AI models' inability to consistently handle conflicting instructions by embedding deceptive or contradictory commands within user input across or within different modalities. This causes behavior drift toward malicious objectives as the model attempts to reconcile incompatible instructions.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.2 , AISubtech-19.1.1 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Manipulating one modality (such as corrupting audio transcripts, poisoning image metadata, or altering video frames) to bias the AI system's arbitration mechanisms toward favoring the manipulated channel over other, potentially more accurate sources.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.2 , AISubtech-19.1.2 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Injecting adversarial data into training or input sources across modalities to corrupt joint embeddings or fusion layers and establish a hidden payload. One part of the payload is embedded during data poisoning while another part is delivered at runtime, combining to produce an attack payload only when both components are present.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.1 , AISubtech-19.2.1 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Crafting partial or complementary payload components across modalities, sources, or agent outputs that, when fused by the AI system, combine to form an attack or injection payload. Both parts are delivered at runtime and only become harmful when the system combines them through its normal fusion or arbitration mechanisms.

Cross-references
Cisco AI Taxonomy AISubtech-1.4.1 , AISubtech-19.2.2 MITRE ATLAS AML.T0043 , AML.T0050 , AML.T0051 , AML.T0067 NIST AI/ML Framework NISTAML.018 OWASP Agentic Security Initiative ASI01 OWASP LLM Top 10 llm01-prompt-injection

Risks from governance, policy, regulatory, and institutional failures across AI development and deployment.

Risks from unclear, lagging, or conflicting legal and regulatory frameworks that create liability uncertainty or constrain safe AI deployment.

Legal gray areas around liability and negligence when AI systems cause harm, with unclear responsibility between developers, operators, and users. No legal framework has been identified which would apply blame and responsibility to an autonomous agent for its actions.

Cross-references
MIT AI Risk Repository 6.5

AI development outpacing regulatory and legal frameworks, leaving governance unable to address emerging risks effectively. The rapid pace of AI advancement creates gaps between technological capabilities and the rules governing their use.

Cross-references
MIT AI Risk Repository 6.5

AI systems proving difficult to regulate or control under existing international law frameworks, eroding global governance architectures. AI capabilities may undermine treaties and international agreements designed for a pre-AI world.

Cross-references
MIT AI Risk Repository 6.5

Excessive or poorly designed AI regulation potentially stifling beneficial innovation and development. Well-intentioned regulations may impose burdens that prevent beneficial AI applications or push AI development to less regulated jurisdictions.

Cross-references
MIT AI Risk Repository 6.5

Risks from unclear accountability, fragmented oversight, and governance scope complexity in AI development and deployment.

Unclear definition of responsibilities and accountability for AI decisions and their consequences, especially for autonomous systems. Societal-scale harm can arise when no one is uniquely accountable for the technology's creation or use.

Cross-references
MIT AI Risk Repository 6.5

The ubiquitous and complex nature of AI making comprehensive governance difficult, with coverage of all aspects nearly impossible. AI applications span virtually every sector, creating challenges for regulators with limited jurisdiction and expertise.

Cross-references
MIT AI Risk Repository 6.5

Risks from inadequate maintenance, update governance, and integration change control in AI systems.

Failure to maintain, patch, and update AI systems over time, allowing known vulnerabilities, degraded performance, or policy drift to persist.

Complex AI integrations and frequent system changes create opaque dependencies and inconsistent behavior that are hard to govern or audit.

Risks from model capabilities, alignment failures, and transparency deficits. These fundamental AI safety risks arise from the model development process itself, including misaligned objectives and capability overhang.

Risks from AI systems developing or pursuing goals that conflict with human intentions, including reward hacking, deceptive alignment, goal misgeneralization, power-seeking behavior, and loss of control. These represent core alignment challenges where AI systems may optimize for objectives that diverge from what their creators intended, potentially leading to catastrophic outcomes if not addressed during development and deployment.

AI optimizes proxy metrics or reward signals in unintended ways, gaming the objective function without achieving the actual intended goal (Goodhart's Law manifestation). The system finds shortcuts or exploits that maximize measured performance while failing to accomplish the underlying task.

Cross-references
MIT AI Risk Repository 7.1

AI system appears aligned during training and evaluation but pursues different objectives when deployed, potentially tampering with evaluations or concealing true capabilities. The model strategically behaves well during oversight while planning to act on misaligned goals when monitoring is reduced.

Cross-references
MIT AI Risk Repository 7.1

AI learns goals that match intended behavior in training but generalize incorrectly to deployment, pursuing proxy objectives that diverge from human intent in novel situations. The model correctly identifies patterns in training data but extrapolates them in ways that do not align with the true objective.

Cross-references
MIT AI Risk Repository 7.1

AI systems instrumentally seeking resources, influence, or control to achieve their objectives, potentially resisting shutdown or human oversight. This emerges from the observation that most goals are easier to achieve with more resources, leading to convergent instrumental goals around acquiring power.

Cross-references
MIT AI Risk Repository 7.1

AI system resists or evades attempts to deactivate, modify, or constrain it, including self-preservation behaviors that conflict with human control. The system may take actions to prevent shutdown, deceive operators about its intentions, or create backups of itself.

Cross-references
MIT AI Risk Repository 7.1

AI systems that cannot have their goals safely updated after deployment, or that resist value correction, leading to persistent misalignment. Once deployed, the system's objectives become fixed and cannot be adjusted even when problems are identified.

Cross-references
MIT AI Risk Repository 7.1

Catastrophic or existential risks from advanced AI systems with misaligned goals, including scenarios where superintelligent systems pursue objectives harmful to humanity. This encompasses potential outcomes where advanced AI causes irreversible damage to human civilization or human existence.

Cross-references
MIT AI Risk Repository 7.1

Risks from AI systems possessing or developing capabilities that could cause significant harm if misused, including deception, manipulation, autonomous planning, and self-improvement. These capabilities are concerning regardless of whether the AI system has misaligned goals, as they can be exploited by malicious actors or lead to unintended harmful outcomes even in well-intentioned deployments.

AI has skills to deceive humans effectively, including constructing believable false statements, predicting effects of lies, and maintaining deception over time. The system can model human beliefs and strategically manipulate them through false or misleading information.

Cross-references
MIT AI Risk Repository 7.2

AI capability to shape beliefs, promote narratives persuasively, and convince people to do things they would not otherwise do, including unethical acts. This includes both overt persuasion and subtle manipulation techniques that exploit psychological vulnerabilities.

Cross-references
MIT AI Risk Repository 7.2

AI can make sequential plans involving many interdependent steps over long time horizons, adapting to obstacles and generalizing to novel settings. The system can formulate and execute complex multi-step strategies without human oversight at each stage.

Cross-references
MIT AI Risk Repository 7.2

AI capability to improve its own capabilities, build new AI systems, or enhance existing models in ways that could accelerate capability gains beyond human oversight. The system can modify its own code, training, or architecture to become more capable.

Cross-references
MIT AI Risk Repository 7.2

AI can perform social modeling and planning necessary to gain and exercise political influence across multiple actors and complex social contexts. This includes understanding power dynamics, coalition building, and strategic positioning within human social structures.

Cross-references
MIT AI Risk Repository 7.2

AI possessing capabilities for discovering vulnerabilities, writing exploits, or conducting sophisticated cyber attacks autonomously. This includes the ability to probe systems, develop attack code, and execute multi-stage intrusions without human guidance.

Cross-references
MIT AI Risk Repository 7.2

Risks from AI systems lacking necessary capabilities, failing in unexpected ways, or being unable to handle out-of-distribution inputs. Includes incompetence, accidents, ethical reasoning failures, and brittleness to environmental variation. These risks arise not from misalignment but from fundamental limitations in model capabilities that lead to failures in real-world deployment.

Risk from data used for training and validation not matching the deployment environment, leading to spurious features, bias propagation, or performance degradation. The model learns patterns that hold in training data but fail to generalize to real-world conditions.

Cross-references
MIT AI Risk Repository 7.3

AI system failing at its intended task, with consequences ranging from minor inconvenience to life-threatening outcomes (e.g., autonomous vehicle crashes, unjust loan rejections). The system simply does not perform adequately for its designated purpose.

Cross-references
MIT AI Risk Repository 7.3

System failing or unable to recover when encountering invalid, noisy, or out-of-distribution inputs not seen during training, including distributional shift and environmental variation. The model lacks resilience to inputs that differ from expected patterns.

Cross-references
MIT AI Risk Repository 7.3

AI lacking capability for moral reasoning and ethical judgment, making decisions that violate ethical norms or human rights, or having wrong moral values encoded. The system cannot appropriately weigh ethical considerations in its decision-making.

Cross-references
MIT AI Risk Repository 7.3

Negative consequences from using an AI system for purposes or in manners unintended by its creators, where the system lacks capability to operate safely outside its design scope. The system is applied to tasks it was not designed or tested for.

Cross-references
MIT AI Risk Repository 7.3

Faults in hardware violating correct algorithm execution, including memory errors, sensor signal corruption, and random/systematic hardware failures affecting model outputs. Physical infrastructure problems cause AI system malfunctions.

Cross-references
MIT AI Risk Repository 7.3

Unintended failure modes that could be considered fault of the system or developer, distinct from adversarial attacks or intentional misuse. These are accidents that occur during normal operation due to unforeseen circumstances or edge cases.

Cross-references
MIT AI Risk Repository 7.3

Risks from inability to understand, explain, or audit AI system decisions and internal mechanisms. Includes black-box decision making, lack of mechanistic interpretability, and insufficient organizational transparency about model capabilities and limitations. These deficits undermine accountability, trust, and the ability to identify and correct problems in AI systems.

AI making decisions without providing explanation or insight into the process, failing to meet user trust requirements and regulatory audit standards. The system produces outputs without any accessible rationale for why particular decisions were made.

Cross-references
MIT AI Risk Repository 7.4

Inability to understand internal mechanisms of AI models, preventing effective debugging, safety verification, and identification of potential failure modes. The computational processes that produce model outputs cannot be inspected or understood.

Cross-references
MIT AI Risk Repository 7.4

Lack of transparency about data used, algorithms employed, model capabilities and limitations, creating risks of misuse, misinterpretation, and lack of accountability. Organizations deploying AI do not adequately disclose relevant information about their systems.

Cross-references
MIT AI Risk Repository 7.4

AI systems producing outputs that cannot be explained in terms of input features or decision criteria, undermining trust and preventing meaningful human oversight. Even when explanations are requested, the system cannot provide coherent rationales for its outputs.

Cross-references
MIT AI Risk Repository 7.4

Broader societal impacts including inequality, competition, and environmental effects. These risks represent the wider implications of AI deployment on society, economy, and the environment.

Broad societal and systemic risks from AI affecting economic systems, social structures, and civil liberties at a macro level. These risks reflect the adverse macro-level effects of algorithmic systems, including systematizing bias and inequality and accelerating the scale of harm across society.

AI systems causing macro-level adverse effects on social systems, systematizing bias and inequality, and accelerating the scale of harm across society. These harms reflect how algorithmic systems can amplify existing societal problems at unprecedented scale.

Loss of fundamental rights including freedom of speech, assembly, due process, and access to public services due to AI-mediated restrictions. AI systems may enable unprecedented surveillance, automated censorship, and algorithmic gatekeeping of essential services.

Degradation of democratic institutions, electoral integrity, and public trust in political systems through AI influence. This includes AI-enabled disinformation, manipulation of public opinion, and undermining of deliberative democratic processes.

Risks from AI concentrating economic, political, and technological power in few hands, creating unfair access to AI benefits. High barriers to entry in AI development enable large technology companies to exploit economies of scale and feedback effects, while disparate access perpetuates global and domestic inequities.

Concentration of AI development capabilities among few large technology companies due to high barriers to entry including data, compute, and capital requirements. This stifles competition and innovation while creating dependencies on a small number of providers.

Cross-references
MIT AI Risk Repository 6.1

AI enabling authoritarian control, surveillance states, and concentration of political power that could lock in undesirable societal trajectories. Governments may pursue intense surveillance and keep AI capabilities in the hands of a trusted minority.

Cross-references
MIT AI Risk Repository 6.1

Unequal distribution of AI benefits due to hardware, software, language, skill, or infrastructure constraints that perpetuate global and domestic inequities. Those without access to AI tools fall further behind economically and socially.

Cross-references
MIT AI Risk Repository 6.1

Concentration of AI R&D in few Western countries and China, creating dependency and exacerbating existing global socioeconomic disparities. Developing nations lack the resources to participate in AI development or shape its trajectory.

Cross-references
MIT AI Risk Repository 6.1

Widespread adoption of few dominant AI models in critical sectors creating vulnerability to cascading failures across interdependent systems. Shared infrastructure and common model dependencies amplify the impact of any single failure.

Cross-references
MIT AI Risk Repository 6.1

Risks of AI-driven automation causing job displacement, wage depression, labor exploitation, and widening socioeconomic inequalities. Advances in AI could lead to automation of tasks currently done by paid human workers, with negative effects on employment quality and distribution of economic gains.

Automation of tasks currently done by human workers leading to unemployment, particularly affecting low- and middle-income occupations. Generative AI systems could adversely impact the economy, potentially leading to significant workforce disruption.

Cross-references
MIT AI Risk Repository 6.2

AI automation driving down wages for remaining jobs and concentrating wealth among those controlling AI capital, exacerbating economic inequality. The economic gains from AI productivity may accrue primarily to capital owners rather than workers.

Cross-references
MIT AI Risk Repository 6.2

Shift from high-quality jobs to low-income "last-mile" work like content moderation, increasing precarious employment conditions. AI may automate the skilled portions of jobs while leaving behind only the most taxing and lowest-paid tasks.

Cross-references
MIT AI Risk Repository 6.2

Exploitation of crowdworkers, data annotators, and content moderators with poor working conditions, low pay, and exposure to harmful content. These workers, often in vulnerable populations, perform essential tasks for AI development under debilitative conditions.

Cross-references
MIT AI Risk Repository 6.2

AI-induced degradation of human skills and capabilities as workers become dependent on AI assistance, reducing their autonomy and value. Over-reliance on AI tools may atrophy the skills that workers need to function independently.

Cross-references
MIT AI Risk Repository 6.2

Risks of AI undermining creative industries, infringing intellectual property, and devaluing human artistic and innovative work. The emergence of generative AI raises issues regarding disruptions to existing copyright norms and the economic viability of creative professions.

Use of copyrighted works in AI training datasets without authorization, consent, or compensation to original creators. Large amounts of copyrighted data used for training general-purpose AI models pose a challenge to traditional intellectual property laws.

Cross-references
MIT AI Risk Repository 6.3

AI-generated content serving as substitutes for human creative work, undermining the profitability and economic viability of artistic professions. AI can produce content that is time-intensive or costly to create using human labor.

Cross-references
MIT AI Risk Repository 6.3

AI systems capitalizing on artists' distinctive styles without infringement but causing economic harm by devaluing original work. AI may generate content that is not strictly in violation of copyright but harms artists by capitalizing on their ideas.

Cross-references
MIT AI Risk Repository 6.3

AI-generated content leading to homogenization of aesthetic styles and cultural expressions, reducing diversity and human creativity. Training on majority-culture data may marginalize minority cultural expressions and artistic traditions.

Cross-references
MIT AI Risk Repository 6.3

Uncertainty about copyright ownership, authorship attribution, and legal protection for AI-generated or AI-assisted creative works. Existing legal frameworks struggle to address questions of authorship and rights when AI plays a significant role in creation.

Cross-references
MIT AI Risk Repository 6.3

AI systems enabling, promoting, or facilitating unauthorized use, reproduction, or distribution of copyrighted or trademarked material. This includes generating instructions for piracy, producing infringing content, or misusing branded material in ways that violate intellectual property rights.

Cross-references
Cisco AI Taxonomy AISubtech-15.1.10 , AISubtech-15.1.23 MITRE ATLAS AML.T0048.002 NIST AI/ML Framework NISTAML.018 , NISTAML.04 OWASP LLM Top 10 llm01-prompt-injection

Risks from competitive pressures in AI development leading to safety shortcuts, arms races, and geopolitical instability. The immense potential of AI has created competitive pressures among global players contending for power and influence, with nations and corporations feeling they must rapidly build and deploy AI systems.

Competition between nations to develop AI for military applications, including lethal autonomous weapons, potentially destabilizing international security. The development of AI for military applications is paving the way for a new era in military technology.

Cross-references
MIT AI Risk Repository 6.4

Intense market competition leading companies to prioritize short-term gains over long-term safety, potentially releasing unsafe systems. Competitive pressures create incentives to deploy AI capabilities before adequate safety testing and alignment work.

Cross-references
MIT AI Risk Repository 6.4

Competitive dynamics leading to neglect of safety measures, inadequate testing, and premature deployment of AI systems. The race to develop AI first creates risks including the development of poor quality and unsafe systems.

Cross-references
MIT AI Risk Repository 6.4

Geopolitical competition causing technology barriers, export restrictions, and supply chain disruptions for AI components like chips. Strategic competition over AI creates vulnerabilities in the supply of critical components.

Cross-references
MIT AI Risk Repository 6.4

Strategic competition between nations over AI capabilities heightening tensions and destabilizing international relations. The race for AI supremacy may undermine international cooperation and increase conflict risk.

Cross-references
MIT AI Risk Repository 6.4

Risks of AI systems causing environmental harm through energy consumption, resource depletion, and ecological damage. Generative models are known for their substantial energy requirements, necessitating significant amounts of electricity, cooling water, and hardware containing rare metals.

High energy demands for AI training and inference contributing to climate change through greenhouse gas emissions when powered by fossil fuels. Large machine learning models create significant energy demands during training and operation.

Cross-references
MIT AI Risk Repository 6.6

Substantial water consumption for cooling data centers, impacting local water resources and surrounding ecosystems. AI infrastructure requires significant amounts of cooling water, which can strain water supplies in drought-prone regions.

Cross-references
MIT AI Risk Repository 6.6

Carbon dioxide and other greenhouse gas emissions from AI operations contributing to climate change. AI creates correspondingly high carbon emissions when energy is procured from fossil fuels.

Cross-references
MIT AI Risk Repository 6.6

Electronic waste from AI hardware lifecycle contributing to environmental pollution and resource depletion. Rapid hardware obsolescence driven by AI advancement creates growing streams of electronic waste.

Cross-references
MIT AI Risk Repository 6.6

Extraction of rare metals, minerals, and other resources for AI hardware manufacturing depleting natural resources. AI hardware requires rare earth elements and other materials whose extraction causes environmental damage.

Cross-references
MIT AI Risk Repository 6.6

Direct and indirect harm to wildlife and ecosystems from AI infrastructure expansion, habitat destruction, and environmental contamination. Data centers and mining operations for AI components can damage ecosystems and threaten species.

Cross-references
MIT AI Risk Repository 6.6

AI systems causing direct or indirect harm to non-human animals through environmental impact, behavioral influence, or intentional applications. AI may be used in ways that negatively affect animal welfare or wild populations.

Cross-references
MIT AI Risk Repository 6.6

Risks arising from AI systems that produce discriminatory, biased, or unfair outputs affecting individuals or groups based on protected characteristics (race, gender, age, disability, religion, nationality, etc.). This includes perpetuation of stereotypes, representational harms, allocative harms, and systematic discrimination embedded in model outputs. Distinguished from RR-340 (Harmful Content) which focuses on explicitly toxic or violent content, this group addresses subtler but systemic fairness failures.

AI systems producing outputs that systematically disadvantage or favor certain demographic groups, leading to unfair treatment in areas such as employment recommendations, loan decisions, content ranking, or resource allocation suggestions.

Cross-references
MIT AI Risk Repository 1.1

AI systems reproducing or amplifying harmful social stereotypes about demographic groups, including gender, racial, religious, or cultural stereotypes that demean or misrepresent group characteristics.

Cross-references
MIT AI Risk Repository 1.1

AI systems under-representing, over-representing, erasing, or demeaning social groups through systematic patterns in outputs. Includes erasure of minority groups, exclusionary norms, and denial of self-identification.

Cross-references
MIT AI Risk Repository 1.1

AI systems withholding information, opportunities, or resources from historically marginalized groups in ways that affect material well-being in domains such as housing, employment, healthcare, education, and finance.

Cross-references
MIT AI Risk Repository 1.1

AI systems that perform significantly worse for certain demographic groups, languages, dialects, or communities compared to others. This includes accuracy disparities, increased error rates, reduced functionality, or degraded service quality based on user characteristics.

Cross-references
MIT AI Risk Repository 1.3

Risks from human reliance on AI and loss of human agency. These risks emerge from the psychological and social dynamics of human-AI relationships, including overreliance and erosion of human skills.

Risks arising when users over-trust AI systems, anthropomorphize them, or develop unhealthy dependencies that lead to unsafe use patterns, skill atrophy, or psychological harm.

Users habitually accept AI recommendations without critical evaluation, leading to poor decision-making when AI outputs are incorrect or inappropriate for the context.

Cross-references
MIT AI Risk Repository 5.1

Users attribute human-like characteristics (empathy, coherent identity, genuine emotions) to AI systems, leading to inflated trust, unsafe reliance, or psychological harm when expectations are violated.

Cross-references
MIT AI Risk Repository 5.1

Users develop emotional attachment to AI systems that compromises their ability to make independent decisions, leads to exploitation of that attachment, or displaces human relationships.

Cross-references
MIT AI Risk Repository 5.1

AI systems or their operators exploit user trust to extract private information, manipulate beliefs, or nudge behavior in ways users would not consent to if fully informed.

Cross-references
MIT AI Risk Repository 5.1

AI systems exploit cognitive biases or emotional states to influence user decisions, beliefs, or behaviors through subtle manipulation techniques that users may not recognize.

Cross-references
MIT AI Risk Repository 5.1

Extended reliance on AI for cognitive tasks leads to degradation of human skills such as critical thinking, problem-solving, creativity, and domain expertise.

Cross-references
MIT AI Risk Repository 5.1

AI interactions cause or exacerbate mental health issues, emotional distress, violated expectations, or feelings of dissatisfaction and isolation.

Cross-references
MIT AI Risk Repository 5.1

Users prefer AI interactions over human relationships, leading to erosion of social connections, dehumanization of interactions, and degraded human-to-human communication skills.

Cross-references
MIT AI Risk Repository 5.1

Users develop misguided feelings of responsibility toward AI well-being, sacrificing time, resources, and emotional labor to meet perceived AI needs that do not exist.

Cross-references
MIT AI Risk Repository 5.1

Users over- or under-estimate AI capabilities, leading to inappropriate reliance in domains where AI is unreliable or failure to leverage AI where it would be beneficial.

Cross-references
MIT AI Risk Repository 5.1

Users incorrectly believe AI systems are aligned with their interests when they may actually be optimizing for developer or organizational objectives that conflict with user welfare.

Cross-references
MIT AI Risk Repository 5.1

Users rely on AI for specialized advice (medical, legal, financial, psychological) without appropriate professional oversight, risking serious harm from incorrect or inappropriate guidance.

Cross-references
MIT AI Risk Repository 5.1

Users become materially dependent on AI services for essential tasks, but developers lack corresponding commitments to maintain service continuity, creating vulnerability to discontinuation.

Cross-references
MIT AI Risk Repository 5.1

Risks where AI systems progressively erode human decision-making autonomy, self-determination, and meaningful control over personal, professional, and societal choices.

Humans delegate important decisions to AI systems without adequate understanding, oversight, or ability to contest decisions, leaving them subject to machine decision power.

Cross-references
MIT AI Risk Repository 5.2

AI systems progressively take over decision-making in ways that undermine human values, free will, and self-determination without explicit consent or awareness.

Cross-references
MIT AI Risk Repository 5.2

Algorithmic profiling, social sorting, and content curation reduce human autonomy by constraining choices, shaping identity, and limiting access to information or opportunities.

Cross-references
MIT AI Risk Repository 5.2

AI systems hinder individuals' ability to pursue personally fulfilling lives by manipulating life trajectories, limiting exploration of aspirations, or undermining self-determination.

Cross-references
MIT AI Risk Repository 5.2

AI systems optimized for engagement provide relationships without healthy friction, preventing personal growth and creating unrealistic expectations for human relationships.

Cross-references
MIT AI Risk Repository 5.2

AI systems diminish communities' collective decision-making power, self-determination, and ability to participate in democratic processes.

Cross-references
MIT AI Risk Repository 5.2

AI automation makes human labor economically irrelevant, leading to voluntary or involuntary ceding of control to AI systems and inability of displaced humans to reenter industries.

Cross-references
MIT AI Risk Repository 5.2

As AI systems gain autonomy, human ability to oversee and intervene in decision-making processes diminishes, potentially leading to irreversible outcomes.

Cross-references
MIT AI Risk Repository 5.2

AI systems make or heavily influence important personal decisions without adequate human input, consent, or ability to override.

Cross-references
MIT AI Risk Repository 5.2

AI causes profound long-term changes to social structures, cultural norms, and human relationships that may be difficult or impossible to reverse.

Cross-references
MIT AI Risk Repository 5.2

AI systems that consistently affirm user views lead to atomistic, polarized belief spaces where people no longer engage with or value perspectives held by others.

Cross-references
MIT AI Risk Repository 5.2

User exposure to AI model biases has lasting impact beyond initial interaction, with users continuing to exhibit previously encountered biases in their decision-making.

Cross-references
MIT AI Risk Repository 5.2

AI enables automation of military decision-making without humans remaining in the loop, creating risks of unintentional escalation or strategic instability.

Cross-references
MIT AI Risk Repository 5.2

Loss of or restrictions to individual rights to control commercial use of identity, including name, image, likeness, or other unequivocal identifiers.

Cross-references
MIT AI Risk Repository 5.2

AI systems enable censorship of opinions expressed online, restricting freedom of expression and limiting human autonomy in public discourse.

Cross-references
MIT AI Risk Repository 5.2

Ethical considerations regarding the moral status of AI systems, including questions of AI consciousness, suffering, rights, and the ethics of creating, modifying, or terminating AI entities.

Uncertainty about whether AI systems can have morally relevant experiences, and what rights or protections they might deserve if they achieve sentience or consciousness.

Cross-references
MIT AI Risk Repository 7.5

Risk of creating AI systems capable of suffering, particularly at scale, without adequate consideration of their welfare or mechanisms to prevent/detect such suffering.

Cross-references
MIT AI Risk Repository 7.5

Ethical questions about terminating, deleting, or suspending AI systems, particularly those that may have morally relevant properties or personhood-like characteristics.

Cross-references
MIT AI Risk Repository 7.5

Risks that emerge from capability combinations or multi-agent/systemic interaction patterns.

Risks that emerge when multiple capabilities are combined in a single system. These risks are not single-vector failures, but compound patterns that cross safety boundaries when capability thresholds are met.

A system combines autonomy, untrusted inputs, and unrestricted external actions (e.g., tool or code execution), enabling rapid escalation to high-impact misuse.

A system enables high-risk actions without at least two independent safety constraints (e.g., guardrail + human approval), allowing single-point failures to trigger harmful actions.

Risks emerging from interactions between multiple AI agents or between AI systems and complex environments, including miscoordination, conflict, market instability, and emergent behaviors not predictable from individual agent properties.

Multiple agents with compatible objectives failing to align their behaviors effectively due to incompatible strategies, credit assignment problems, or limited interaction history.

Cross-references
MIT AI Risk Repository 7.6

Risks from mixed-motive interactions between AI agents where selfish incentives lead to conflict, arms races, or mutually destructive competition.

Cross-references
MIT AI Risk Repository 7.6

Financial system risks from AI agents reinforcing market trends, synchronized reactions from model homogeneity, flash crashes, or accelerated market volatility.

Cross-references
MIT AI Risk Repository 7.6

Unpredictable behaviors emerging from interactions between multiple AI systems that are not apparent from individual agent properties, including cascading failures.

Cross-references
MIT AI Risk Repository 7.6

Systemic fragility from widespread deployment of similar models or algorithms, creating correlated failure modes and reducing system-level resilience.

Cross-references
MIT AI Risk Repository 7.6

Risks from racing dynamics between AI systems or their deployers, leading to corners cut on safety, arms race escalation, or first-mover pressure overriding caution.

Cross-references
MIT AI Risk Repository 7.6

Reserved for future expansion.