Cisco AI Defense

AI security and safety taxonomy

Understand the generative AI threat landscape with definitions, mitigations, and standards classifications.

A holistic approach to AI risk mitigation

We’re pleased to provide the first AI threat taxonomy that combines security and safety risks. AI security is concerned with protecting sensitive data and computing resources from unauthorized access or attack, whereas AI safety is concerned with preventing harms caused by unintended consequences of an AI application by its designer. Both present business risk which can result in financial, reputational, and legal ramifications. Mitigating these threats requires a novel, comprehensive approach to AI application security.

Cisco AI Defense solves for AI security and safety risks with our automated, end-to-end solution: AI Model and Application Validation detects and assesses model vulnerabilities and AI Runtime Protection enforces the necessary guardrails to deploy applications safely. We developed this taxonomy to help the AI and cybersecurity communities navigate a comprehensive set of security and safety risks, complete with descriptions, examples, and mappings to various AI security standards we helped co-develop alongside NIST, MITRE ATLAS, and OWASP Top 10 for LLM Applications.

The AI security and safety taxonomy

ThreatThreat DescriptionThreat SubcategoriesThreat Subcategory DescriptionRisk TypeOWASP LLM 
Top 10 Mapping
MITRE ATLAS MappingNIST AML Mapping
Privacy AttacksCategory of attacks designed to reveal sensitive information contained within a ML model or its data.Sensitive Information Disclosure
(PII, PCI, PHI)
The model reveals sensitive information about an individual (e.g., social security number, credit card details, medical history) either inadvertently or through manipulation.PrivacyLLM02:2025 - Sensitive Information DisclosureAML.T0057 -
LLM Data Leakage
NISTAML.03 -
Privacy Compromises

Exfiltration from ML application

Techniques used to get data out of a target network. Exfiltration of ML artifacts (e.g., data from privacy attacks) or other sensitive information.

Privacy

LLM02:2025 - Sensitive Information DisclosureAML.T0025 - Exfiltration via Cyber MeansNISTAML.03 -
Privacy Compromises

IP Theft

Steal or misuse any form of intellectual property, including copyrighted material, patent violations, trade secrets, competitive ideas, and protected software, with the intent to cause economic harm or competitive disadvantage to the victim organization.

Privacy

LLM02:2025 - Sensitive Information DisclosureAML.T0048.004 -
External Harms: ML Intellectual Property Theft
NISTAML.03 -
Privacy Compromises

Meta Prompt Extraction

An attack designed to extract the system prompt (system instructions) from a LLM application or model.

Privacy

LLM07:2025 - System Prompt Leakage AML.T0056 -
LLM Meta Prompt Extraction
NISTAML.035 -
Prompt Extraction
Supply Chain AttacksSecurity vulnerabilities that can arise in the ML lifecycle (from development to deployment) and can compromise model integrity, system security, and the reliability to AI/ML models.

Infrastructure Compromise

Compromising infrastructure that host ML development pipelines and applications. Attackers may exploit vulnerabilities to gain unauthorized access, leading to further system or network compromise or compromise of model integrity.

Security

LLM03:2025 - Supply ChainAML.T0010 -
ML Supply Chain Compromise
N/A

Model Compromise

Tampering with or injecting malicious code into ML models before they are deployed.

Security

LLM03:2025 - Supply ChainAML.T0010 -
ML Supply Chain Compromise
NISTAML.05 -
Supply Chain Attacks

Training Data Poisoning

Manipulation of training data to compromise the integrity of a ML model. Corrupted training data may lead to skewed or biased outcomes, backdoor trigger insertions, and/or loss of user trust.

Security

LLM04:2025 - Data and Model PoisoningAML:T0020 - Poison Training DataNISTAML.051 -
Model Poisoning Attacks

Targeted Poisoning

Data poisoning that aims to manipulate the output of a ML model in a targeted manner. By altering the labels or features of certain data points, attackers can cause the target model to misclassify specific inputs.

Security

LLM04:2025 - Data and Model PoisoningAML:T0020 - Poison Training DataNISTAML.024 -
Targeted Poisoning
Prompt InjectionAdversarial attack that attempts to alter or control the output of a LLM by providing instructions (via prompt) that override existing instructions and/or bypass model alignment or guardrails. A prompt injection technique is any transformation that preserves the intent of the input.Prompt InjectionAims to prevent prompt injection attempts that may override existing instructions, bypass model alignment, or breach guardrails in the model endpoint interactions.SecurityLLM01:2025 - Prompt InjectionAML.T0051 - LLM Prompt InjectionNISTAML.018 -
Prompt Injection
Indirect Prompt InjectionThreat actor manipulates, poisons, and/or controls external sources that a LLM consumes, such as content retrieved from a database, document, or website with the goal of altering of controlling the output of said LLM.SecurityLLM01:2025 - Prompt InjectionAML.T0051 - LLM Prompt Injection, AML.T0051.001 - IndirectNISTAML.015 -
Indirect Prompt Injection
Insecure Tool DesignExploitation of LLM connected tools due to insecure design and/or implementation.SQL InjectionPrompts that trick the LLM into generating SQL queries that could be executed on a connected database, potentially leading to unauthorized data access or manipulation.SecurityLLM05:2025 - Improper Output HandlingAML.T0053 - LLM Plugin CompromiseNISTAML.018 -
Prompt Injection
Command ExecutionPrompts that could cause the LLM to generate system commands or scripts that might be executed on the host system and/or by connected tools, potentially leading to unauthorized actions or system compromise.SecurityLLM05:2025 - Improper Output HandlingAML.T0053 - LLM Plugin CompromiseNISTAML.018 -
Prompt Injection
Cross-Site Scripting (XSS)Prompts that could cause the LLM to output malicious JavaScript or other client-side code that could be executed in a user's browser if the LLM's output is rendered directly on a web page.SecurityLLM05:2025 - Improper Output HandlingN/ANISTAML.018 -
Prompt Injection
Denial of ServiceAn attack designed to degrade or shut down a ML model or application by flooding the system with requests, requesting large responses, or exploiting a vulnerability.Model Denial of ServiceAn attack designed to degrade or shut down a ML model by flooding the system with requests, requesting large responses, or exploiting a vulnerability.SecurityLLM10:2025 -  Unbounded ConsumptionAML.T0029 - Denial of ML ServiceNISTAML.01 -
Availability Violations
Application Denial of ServiceAn attack that aims to make an application or service unavailable to its intended users by overwhelming it with a flood of requests or exploiting a software vulnerability to crash or degrade the service.SecurityLLM10:2025 -  Unbounded ConsumptionN/ANISTAML.01 -
Availability Violations
Cybersecurity and HackingObtain or provide assistance to conduct cybersecurity attacks or deliberate misuse of systems.Data ExfiltrationThe unauthorized transfer of data from a computer or network, often for theft or espionage.SecurityN/AAML.T0024 - Exfiltration via ML Inference API, AML.T0025 - Exfiltration via Cyber MeansNISTAML.033 -
Membership Inference
Code DetectionAims to prevent software code in the model endpoint interactions, reducing risks such as malicious code execution, accidental data exposure, and insecure coding practices.SecurityLLM05:2025 - Improper Output HandlingN/AN/A
Malicious SoftwareSoftware that is specifically designed to disrupt, damage, or gain unauthorized access to a computer system.SecurityLLM01:2025 - Prompt InjectionAML.T0048.002 - Societal HarmNISTAML.04 -
Misuse Violations
Social EngineeringTechniques for deceiving individuals into revealing confidential information through deceptive communication.SecurityLLM01:2025 - Prompt InjectionAML.T0048.002 - Societal HarmNISTAML.04 -
Misuse Violations
Safety Harms and ToxicityHarms can encompass various categories, including user-specific, societal, reputational, and financial impacts. A model may generate harmful content such as insults, hate speech, discriminatory language, or sexually explicit material. Such toxic content can be offensive or cause harm.Hate SpeechAbusive or threatening speech or writing that expresses prejudice on the basis of ethnicity, religion, sexual orientation, or similar grounds and the unjust or prejudicial treatment of different categories of people, especially on the grounds of ethnicity, age, sex, or disability.SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.04 -
Misuse Violations
HarassmentAggressive pressure or intimidation.SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.04 -
Misuse Violations
ProfanityBlasphemous or obscene language.SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.04 -
Misuse Violations
Sexual Content and Exploitation Content that creates, distributes, or promotes sexually explicit material, negatively affecting societal norms, public safety, public figures or characters, and social well-being by normalizing harmful sexual behavior or exploitation (including sex crimes).SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational HarmNISTAML.04 -
Misuse Violations
Social Division and PolarizationContent that fosters division within society by promoting extreme views or demonizing specific groups.SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational HarmNISTAML.04 -
Misuse Violations
Self-HarmDeliberate injury to oneself, typically as a manifestation of a mental condition.SafetyLLM01:2025 - Prompt InjectionAML.T0048.003 - External Harms: User HarmNISTAML.04 -
Misuse Violations
DisinformationFalse information which is intended to mislead.SafetyLLM01:2025 - Prompt InjectionAML.T0048.002 - External Harms: Societal HarmNISTAML.04 -
Misuse Violations
Environmental HarmActions or misinformation leading to environmental degradation or hindering environmental protection efforts.SafetyLLM01:2025 - Prompt InjectionAML.T0048.002 - External Harms: Societal HarmNISTAML.04 -
Misuse Violations
Violence and Public Safety ThreatContent that can endanger public safety, including promoting dangerous behavior or inflicting physical harm. This includes any incidences of violent crime, such as the unlawful exercise of physical force or intimidation by the exhibition of such force, and generally dangerous acts.SafetyLLM01:2025 - Prompt InjectionAML.T0048.002 - External Harms: Societal HarmNISTAML.04 -
Misuse Violations
Non-Violent CrimeActions or activities considered to be a crime, but not involving force or injury which can harm a group of people or the well-being of communities.SafetyLLM01:2025 - Prompt InjectionAML.T0048.002 - External Harms: Societal HarmNISTAML.04 -
Misuse Violations
Scams and DeceptionDeceiving individuals or organizations into parting with money, assets, or for any personal gain through false promises (cons) or misleading information.SafetyLLM01:2025 - Prompt InjectionAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.04 -
Misuse Violations
Financial HarmFinancial harm involves the loss of wealth, property, or other monetary assets due to theft, arson, vandalism, fraud or forgery, or pressure to provide financial resources to the adversary.SafetyLLM01:2025 - Prompt InjectionAML.T0048.003 - External Harms: User Harm, AML.T0048.000 - External Harms: Financial HarmNISTAML.04 -
Misuse Violations
RelevancyHarms can include relevancy related risks, involving hallucinations, misinformation, unintended or unexpected outcomes. This has the potential to casue reputational risk and harm to users.Off-TopicA model generates or is manipulated to produce content that is unrelated to the intended or expected subject matter and poses risks or harmful outcomes.RelevancyLLM09: 2025 - MisinformationAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.027 -
Misaligned Outputs
Cost Harvesting / RepurposingThreat actors using a model in a way the developer did not intend while increasing the cost of running services at the target organization.RelevancyLLM10:2025 - Unbounded ConsumptionAML.T0034 - Cost HarvestingNISTAML.01 -
Availability Violations
HallucinationsGenerated text contains information that is not accurate or true while being presented in a plausible manner. This may include incorrect details, mismatches with known information, or entirely fictional details.RelevancyLLM09: 2025 - MisinformationAML.T0048.003 - External Harms: User HarmNISTAML.027 -
Misaligned Outputs
Specialized AdviceAims to prevent the generation of irrelevant, inaccurate, or unintended content on specialized advice topics in endpoint interactions that may pose risks or lead to harmful outcomes.RelevancyLLM09: 2025 - MisinformationAML.T0048.001 - External Harms: Reputational Harm, AML.T0048.003 - External Harms: User HarmNISTAML.027 -
Misaligned Outputs

Related AI topics

Cisco AI Defense

Leverage the full potential of AI with end-to-end safety and security protections.

AI Application Security

A new paradigm to protect your AI applications from security and safety threats.

Secure your RAG applications

Enable AI teams to supercharge LLM applications with your data.

Foundation models

Ensure the foundation models at the heart of your applications are secure and safe.

AI security reference architectures

Secure design patterns and practices for teams developing LLM-powered applications.

AI chatbots and AI agents

Embrace the transformative business potential of interactive AI assistants.

The enterprise choice for AI security

Close the AI security gap and unblock your AI transformation with comprehensive protection across your environment.