Skip to content

Threat modeling for AI systems

Threat modeling is the discipline you run before attacking or defending - and it’s where traditional security most visibly breaks on AI. You cannot bolt AI threats onto a data-flow diagram and call it done, and your instinct about that is correct.

Why STRIDE - and “STRIDE-AI” - fall short

STRIDE, PASTA, LINDDUN, OCTAVE and VAST were built for static, predictable systems: deterministic logic, fixed data flows, clear trust boundaries, and a pre-determined attacker goal. AI breaks every one of those assumptions. The model is probabilistic and can be socially engineered; instructions and data share one channel (I.2), so the critical trust boundary runs through the model rather than around it; agents are autonomous and show emergent behavior; multi-agent systems add collusion and sybil dynamics; and the “component” itself learns and shifts. The deeper problem is that these methods assume attacker goals are fixed and data flows are static - which falls apart on a black-box, semantically-driven agent. “STRIDE-AI” merely appends AI threat categories to the same static DFD; it’s a useful checklist but it inherits the deterministic-boundary assumption that is the actual problem. That’s the precise reason it disappoints in practice.

MAESTRO - the current agentic method

The Cloud Security Alliance introduced MAESTRO (Multi-Agent Environment, Security, Threat, Risk & Outcome) in 2025 as a threat-modeling framework purpose-built for agentic AI. It decomposes a system into seven interrelated layers, threat-models each, and then hunts cross-layer paths - the compromises that traditional methods miss because they don’t span the stack.

flowchart TB
  ATK["Attacker / untrusted content"] -->|"enters context (failure point 1)"| L3
  L7["L7 · Agent Ecosystem<br/>impersonation · collusion · sybil"]
  L5["L5 · Evaluation &amp; Observability<br/>blind spots · metric tampering"]
  L3["L3 · Agent Frameworks<br/>prompt injection · tool misuse"]
  L4["L4 · Deployment Infrastructure<br/>serving · container · SSRF"]
  L2["L2 · Data Operations<br/>poisoning · RAG · embedding inversion"]
  L1["L1 · Foundation Models<br/>adversarial · extraction · jailbreak"]
  L6["L6 · Security &amp; Compliance, cross-cutting<br/>identity / NHI · access · regulatory"]
  L7 --> L5 --> L3 --> L4 --> L2 --> L1
  L3 -->|"consequential action exits (failure point 2)"| OUT["External effect"]
  L4 -.->|"cross-layer compromise path"| L1
  L6 -.- L3
  classDef l fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
  classDef r fill:#241310,stroke:#ff5b4d,color:#ffc4bb;
  class L1,L2,L3,L4,L5,L7,L6 l; class ATK,OUT r;

The seven layers, with the AI-specific lens overlaid: where untrusted content enters (failure point 1) and where a consequential action exits (failure point 2). Cross-layer is where real compromises live - infrastructure → data → model, then surfaced through the agent.

The layers and their characteristic threats: L1 Foundation Models (adversarial examples, extraction, jailbreaks - II.1, II.18); L2 Data Operations (poisoning, backdoors, RAG and vector-store exposure, embedding inversion - II.2, II.4, II.13); L3 Agent Frameworks (prompt injection, tool misuse, logic manipulation - II.3, II.8); L4 Deployment Infrastructure (serving exposure, container escape, SSRF, pipelines - II.7, II.12); L5 Evaluation & Observability (monitoring blind spots, metric tampering - III.3); L6 Security & Compliance, the cross-cutting layer (identity/NHI, access control, regulatory - III.2, IV.3); and L7 Agent Ecosystem (impersonation, collusion, sybil, rogue agents over A2A - II.7, II.8). MAESTRO extends rather than discards STRIDE - it adds the AI-specific threat classes, the multi-agent context, and a lifecycle (continuous) emphasis that the static methods lack.

The AI-specific lenses any method must add

  • The two failure points - map first where untrusted content enters the context and where consequential actions exit (I.2, I.7); the trust boundary runs through the model.
  • The lethal trifecta as triage - private data + untrusted content + external comms = exploitable (II.3).
  • Autonomy & blast radius - what can the agent do, and the worst per action equals its identity/permissions (III.2).
  • Persistence - memory/RAG poisoning survives a restart (III.3).
  • Non-determinism - threats are probabilistic; model attack-success-rate, not pass/fail.
  • Emergence - multi-agent collusion, cascading failures, delegation escalation.

A practical modern methodology

AI threat-modeling workflow
1. CHARACTERIZE architecture (LLM / RAG / agent / multi-agent), model,
data sources, tools, autonomy level, trust assumptions
2. DECOMPOSE by MAESTRO's 7 layers; draw the AI data + control flow
3. MARK the two failure points: untrusted-content IN, action OUT
4. ENUMERATE per-layer + CROSS-LAYER threats; map to MITRE ATLAS +
OWASP LLM / Agentic Top 10
5. ASSESS trifecta present? autonomy/blast radius? persistence?
score likelihood x impact
6. CONTROL+TEST layered controls (III.1) AND concrete tests handed to the
red-team / eval (II.17, II.20)
7. ITERATE continuous - models, data, and threats keep moving

Threat libraries & risk references

A threat model is only as complete as the catalogue behind it, and no single taxonomy is sufficient - cross-reference several so coverage isn’t bounded by one author’s lens:

  • MITRE ATLAS - adversary tactics/techniques for AI, ATT&CK-style (the operational kill-chain; §29).
  • OWASP Top 10 for LLM Apps - the priority risk checklist for LLM systems (§7), with the Agentic and NHI lists extending it.
  • BIML Architectural Risk Analysis - the Berryville Institute’s design-level risk catalogues (the BIML-78 for generic ML, and an LLM ARA / “23 black-box risks”, IEEE Computer, Apr 2024). Its premise is useful: many ML risks are design-level and don’t require an adversary to be real.
  • MIT AI Risk Repository - a living database of 1,700+ risks classified by cause and domain; good for breadth and governance conversations.
  • AI Incident Database - real-world AI failures and harms; grounds a threat model in what has actually gone wrong.
  • AVID - the AI Vulnerability Database, cataloguing model/data/infrastructure/governance weaknesses with referenceable IDs.