Orientation & how to use this playbook

Read it as a path. Each part builds on the one before: foundations frame the problem, attacks-on-models give you the primitives, the agentic stack shows how those primitives compose into real systems, the frontier stage is where capability becomes the threat, and the final stage turns all of it into defense and advice. Threat cards expand, self-checks expand, comparisons are tabbed. Use the index as a lookup once you’ve been through once.

Hold one architecture in your head, because nearly every vulnerability here is a trust-boundary error - data from one zone treated as instructions in another. The agentic stack is three layers: the model API (the reasoning endpoint that can call functions), MCP (the agent’s vertical reach into tools and data), and A2A (horizontal collaboration between agents).

flowchart TB
  U["Human or calling application"]
  subgraph BRAIN["REASONING LAYER · <a class="xref" href="#apis">II.5</a>"]
    API["AI Model API<br/>tool-use / function-calling loop"]
  end
  subgraph VERT["TOOL & CONTEXT LAYER · MCP · <a class="xref" href="#mcp">II.6</a>"]
    MC["MCP Client"]
    MS["MCP Servers"]
  end
  subgraph HORIZ["INTER-AGENT LAYER · A2A · <a class="xref" href="#a2a">II.7</a>"]
    RA["Remote agents via Agent Cards"]
  end
  U --> API
  API -->|"discovers + invokes tools"| MC
  MC --> MS
  MS --> DATA[("Files · DBs · SaaS · OT · Cloud")]
  API -->|"delegates whole tasks"| RA
  RA -->|"results re-enter context"| API
  classDef brain fill:#1d1708,stroke:#e4a23f,color:#f0d8a8;
  classDef vert fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
  classDef horiz fill:#11161f,stroke:#8fb9ff,color:#c6d4ef;
  class API brain; class MC,MS vert; class RA horiz;

Each downward arrow is also an upward channel for untrusted content: a tool result, a fetched page, an Agent Card, or a peer’s reply all arrive as text the model may treat as a command. That is the root of the entire landscape.

At a glance - the three protocol layers

AI Model API - reasoning endpoint

MECHANISM tool_use / function-calling loop
SHAPE HTTPS / JSON, often streamed
PRIMARY RISK prompt injection, key leakage, cost/DoS, excessive agency
GOVERNED BY OWASP Top 10 for LLM Apps (2025)

MCP - vertical reach into tools

ROLES host (app) · client (connector) · server (exposes tools; a role, not a host)
ORIGIN Anthropic Nov 2024 · Linux Foundation
SHAPE JSON-RPC 2.0 over stdio / Streamable HTTP
AUTH OAuth 2.1 Resource Server (spec 2025-11-25)
PRIMARY RISK tool poisoning, rug pulls, confused deputy, RCE

A2A - horizontal collaboration

ORIGIN Google Apr 2025 · Linux Foundation
DISCOVERY Agent Cards (/.well-known/agent-card.json)
STANCE opaque execution - share context, not internals
PRIMARY RISK card spoofing, impersonation, task tampering, cross-vendor trust

What the stack actually looks like

The tabs above are the summary. Here is the concrete shape of each layer, so the attacks later read as tampering with something you can already picture. Everything in this subsection is normal, benign mechanics - the offensive treatment lives in Part II (II.5 through II.7, II.13).

1. The model API and function calling

A “tool” is just a function you describe to the model in JSON. The model never runs it: it emits a request to call it, your code runs the function, and you feed the result back. One round trip of the loop:

1. You call the model, passing the tools it is allowed to use:
   POST /v1/messages
   tools:    [ { "name": "get_weather",
                 "description": "Get current weather for a city.",
                 "input_schema": { "type": "object",
                                   "properties": { "city": {"type":"string"} },
                                   "required": ["city"] } } ]
   messages: [ { "role":"user", "content":"What is the weather in Singapore?" } ]

2. The model does NOT answer. It asks to call the tool:
   "stop_reason": "tool_use"
   "content": [ { "type":"tool_use", "id":"tu_01",
                  "name":"get_weather", "input": {"city":"Singapore"} } ]

3. YOUR code runs get_weather("Singapore"), then returns the result:
   messages: [ ...as before...,
               { "role":"user", "content":[ { "type":"tool_result",
                 "tool_use_id":"tu_01", "content":"31C, thunderstorms" } ] } ]

4. Now the model replies in words: "It is 31C and stormy in Singapore."
# the model only ever PROPOSES a call. your code decides whether to run it.
# "excessive agency" is giving it tools or privileges it should not have here.

2. An MCP server

MCP standardizes that same idea so any client (Claude Code, an IDE, a chat app) can use any tool provider without bespoke glue. You write a function and annotate it; the framework turns it into an advertised tool. This is the entire server:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-tools")

@mcp.tool()
def get_weather(city: str) -> str:
    """Get current weather for a city."""   # this docstring becomes the tool DESCRIPTION the model reads
    return lookup(city)

mcp.run()   # stdio by default (local process); or Streamable HTTP for a networked server
# the signature (city: str) becomes the input SCHEMA, generated automatically

When a client connects, it asks the server what it offers and then calls one. That exchange is plain JSON-RPC:

# client connects and asks: what tools do you have?   method: tools/list
{ "tools": [
    { "name": "get_weather",
      "description": "Get current weather for a city.",
      "inputSchema": { "type":"object",
                       "properties": { "city": {"type":"string"} },
                       "required": ["city"] } } ] }

# the model decides to use it; the client sends   method: tools/call
{ "name": "get_weather", "arguments": { "city": "Singapore" } }

# the server runs the function and returns content the model reads as context
{ "content": [ { "type":"text", "text":"31C, thunderstorms" } ] }

3. The agent loop

An “agent” is not a special kind of model. It is the loop wrapped around the API: the model proposes a tool call, the surrounding program runs it, the result re-enters the context, and it repeats until the model stops asking for tools.

context = [ system_prompt, user_task ]
while True:
    reply = model(context, tools=available_tools)
    if reply.wants_tool:
        result   = run_tool(reply.tool_name, reply.tool_args)   # your code, your privileges
        context += [ reply, result ]      # the result re-enters the SAME context
        continue
    return reply.text                     # no tool wanted, so the task is done
# the model is the brain; the loop is the agency.
# every result appended is also a place untrusted text can enter (II.8).

4. An A2A agent card

Where MCP gives an agent tools, A2A lets one agent hand a whole task to another agent, possibly at a different company. Agents find each other by reading a published card:

GET https://partner.example/.well-known/agent-card.json

{ "name": "Invoice Processor",
  "description": "Extracts and validates invoice data.",
  "url": "https://partner.example/a2a",
  "version": "1.2.0",
  "capabilities": { "streaming": true },
  "skills": [
    { "id": "extract-invoice",
      "description": "Parse an invoice PDF into structured fields." } ] }
# another agent reads this card to discover the partner, then delegates a task to its url.
# trusting a card you did not verify is where impersonation and task tampering start (II.7).

5. Retrieval (RAG)

RAG is how an agent answers from your documents without retraining: turn the question into a vector, find the closest chunks in a vector database, and paste them into the context before the model answers.

user asks:  "What is our refund window?"

1. embed the question                  -> a query vector
2. similarity search in the vector DB  -> top-k closest chunks:
   [ "Refunds are accepted within 30 days...", "Returns must include a receipt..." ]
3. build the prompt:   system_prompt + RETRIEVED CHUNKS + the question
4. the model answers from the chunks:  "Your refund window is 30 days."
# the retrieved text lands in the SAME context as instructions,
# so a poisoned document is an injection vector, and the vector DB is an asset to protect (II.13).