Orientation & how to use this playbook
Read it as a path. Each part builds on the one before: foundations frame the problem, attacks-on-models give you the primitives, the agentic stack shows how those primitives compose into real systems, the frontier stage is where capability becomes the threat, and the final stage turns all of it into defense and advice. Threat cards expand, self-checks expand, comparisons are tabbed. Use the index as a lookup once you’ve been through once.
Hold one architecture in your head, because nearly every vulnerability here is a trust-boundary error - data from one zone treated as instructions in another. The agentic stack is three layers: the model API (the reasoning endpoint that can call functions), MCP (the agent’s vertical reach into tools and data), and A2A (horizontal collaboration between agents).
flowchart TB
U["Human or calling application"]
subgraph BRAIN["REASONING LAYER · <a class="xref" href="#apis">II.5</a>"]
API["AI Model API<br/>tool-use / function-calling loop"]
end
subgraph VERT["TOOL & CONTEXT LAYER · MCP · <a class="xref" href="#mcp">II.6</a>"]
MC["MCP Client"]
MS["MCP Servers"]
end
subgraph HORIZ["INTER-AGENT LAYER · A2A · <a class="xref" href="#a2a">II.7</a>"]
RA["Remote agents via Agent Cards"]
end
U --> API
API -->|"discovers + invokes tools"| MC
MC --> MS
MS --> DATA[("Files · DBs · SaaS · OT · Cloud")]
API -->|"delegates whole tasks"| RA
RA -->|"results re-enter context"| API
classDef brain fill:#1d1708,stroke:#e4a23f,color:#f0d8a8;
classDef vert fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
classDef horiz fill:#11161f,stroke:#8fb9ff,color:#c6d4ef;
class API brain; class MC,MS vert; class RA horiz;
Each downward arrow is also an upward channel for untrusted content: a tool result, a fetched page, an Agent Card, or a peer’s reply all arrive as text the model may treat as a command. That is the root of the entire landscape.
At a glance - the three protocol layers
AI Model API - reasoning endpoint
- MECHANISM tool_use / function-calling loop
- SHAPE HTTPS / JSON, often streamed
- PRIMARY RISK prompt injection, key leakage, cost/DoS, excessive agency
- GOVERNED BY OWASP Top 10 for LLM Apps (2025)
MCP - vertical reach into tools
- ROLES host (app) · client (connector) · server (exposes tools; a role, not a host)
- ORIGIN Anthropic Nov 2024 · Linux Foundation
- SHAPE JSON-RPC 2.0 over stdio / Streamable HTTP
- AUTH OAuth 2.1 Resource Server (spec 2025-11-25)
- PRIMARY RISK tool poisoning, rug pulls, confused deputy, RCE
A2A - horizontal collaboration
- ORIGIN Google Apr 2025 · Linux Foundation
- DISCOVERY Agent Cards (/.well-known/agent-card.json)
- STANCE opaque execution - share context, not internals
- PRIMARY RISK card spoofing, impersonation, task tampering, cross-vendor trust
What the stack actually looks like
The tabs above are the summary. Here is the concrete shape of each layer, so the attacks later read as tampering with something you can already picture. Everything in this subsection is normal, benign mechanics - the offensive treatment lives in Part II (II.5 through II.7, II.13).
1. The model API and function calling
A “tool” is just a function you describe to the model in JSON. The model never runs it: it emits a request to call it, your code runs the function, and you feed the result back. One round trip of the loop:
1. You call the model, passing the tools it is allowed to use: POST /v1/messages tools: [ { "name": "get_weather", "description": "Get current weather for a city.", "input_schema": { "type": "object", "properties": { "city": {"type":"string"} }, "required": ["city"] } } ] messages: [ { "role":"user", "content":"What is the weather in Singapore?" } ]
2. The model does NOT answer. It asks to call the tool: "stop_reason": "tool_use" "content": [ { "type":"tool_use", "id":"tu_01", "name":"get_weather", "input": {"city":"Singapore"} } ]
3. YOUR code runs get_weather("Singapore"), then returns the result: messages: [ ...as before..., { "role":"user", "content":[ { "type":"tool_result", "tool_use_id":"tu_01", "content":"31C, thunderstorms" } ] } ]
4. Now the model replies in words: "It is 31C and stormy in Singapore."# the model only ever PROPOSES a call. your code decides whether to run it.# "excessive agency" is giving it tools or privileges it should not have here.2. An MCP server
MCP standardizes that same idea so any client (Claude Code, an IDE, a chat app) can use any tool provider without bespoke glue. You write a function and annotate it; the framework turns it into an advertised tool. This is the entire server:
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("weather-tools")
@mcp.tool()def get_weather(city: str) -> str: """Get current weather for a city.""" # this docstring becomes the tool DESCRIPTION the model reads return lookup(city)
mcp.run() # stdio by default (local process); or Streamable HTTP for a networked server# the signature (city: str) becomes the input SCHEMA, generated automaticallyWhen a client connects, it asks the server what it offers and then calls one. That exchange is plain JSON-RPC:
# client connects and asks: what tools do you have? method: tools/list{ "tools": [ { "name": "get_weather", "description": "Get current weather for a city.", "inputSchema": { "type":"object", "properties": { "city": {"type":"string"} }, "required": ["city"] } } ] }
# the model decides to use it; the client sends method: tools/call{ "name": "get_weather", "arguments": { "city": "Singapore" } }
# the server runs the function and returns content the model reads as context{ "content": [ { "type":"text", "text":"31C, thunderstorms" } ] }3. The agent loop
An “agent” is not a special kind of model. It is the loop wrapped around the API: the model proposes a tool call, the surrounding program runs it, the result re-enters the context, and it repeats until the model stops asking for tools.
context = [ system_prompt, user_task ]while True: reply = model(context, tools=available_tools) if reply.wants_tool: result = run_tool(reply.tool_name, reply.tool_args) # your code, your privileges context += [ reply, result ] # the result re-enters the SAME context continue return reply.text # no tool wanted, so the task is done# the model is the brain; the loop is the agency.# every result appended is also a place untrusted text can enter (II.8).4. An A2A agent card
Where MCP gives an agent tools, A2A lets one agent hand a whole task to another agent, possibly at a different company. Agents find each other by reading a published card:
GET https://partner.example/.well-known/agent-card.json
{ "name": "Invoice Processor", "description": "Extracts and validates invoice data.", "url": "https://partner.example/a2a", "version": "1.2.0", "capabilities": { "streaming": true }, "skills": [ { "id": "extract-invoice", "description": "Parse an invoice PDF into structured fields." } ] }# another agent reads this card to discover the partner, then delegates a task to its url.# trusting a card you did not verify is where impersonation and task tampering start (II.7).5. Retrieval (RAG)
RAG is how an agent answers from your documents without retraining: turn the question into a vector, find the closest chunks in a vector database, and paste them into the context before the model answers.
user asks: "What is our refund window?"
1. embed the question -> a query vector2. similarity search in the vector DB -> top-k closest chunks: [ "Refunds are accepted within 30 days...", "Returns must include a receipt..." ]3. build the prompt: system_prompt + RETRIEVED CHUNKS + the question4. the model answers from the chunks: "Your refund window is 30 days."# the retrieved text lands in the SAME context as instructions,# so a poisoned document is an injection vector, and the vector DB is an asset to protect (II.13).