
How a model works

Before any of the security material lands, you need a clear picture of the thing being attacked. Strip away the mystique and a modern AI model is two ideas. First, a neural network: many layers of simple numeric connections whose strengths - the weights, also called parameters - are adjusted until the network maps inputs to desired outputs. A frontier model has billions of these numbers. Second, two distinct phases that people constantly conflate:

  • Training - the expensive, infrequent process of feeding data through the network and nudging the weights to reduce error. The output is the model: a file of weights.
  • Inference - running the finished, frozen model to produce an answer for a given input. This happens on every request and changes nothing about the weights.
flowchart LR
  D[("Training data<br/>text · code · images")] --> TR["Training<br/>adjust weights to cut error"]
  TR --> M["Model = a file of weights<br/>billions of parameters"]
  P["Prompt / input"] --> INF["Inference<br/>frozen weights predict"]
  M --> INF
  INF --> O["Output"]
  classDef t fill:#1d1708,stroke:#e4a23f,color:#f0d8a8;
  classDef i fill:#0f1a18,stroke:#5bd1c5,color:#bdeee2;
  class D,TR,M t; class P,INF,O i;

Training (amber) is rare and costly; inference (teal) runs on every request. The security relevance is immediate: data-poisoning attacks attach at training, while injection and extraction attacks attach at inference - the same two-coordinate idea as the lifecycle map in I.8.
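
The split between the two phases can be made concrete with a toy example. The sketch below is illustrative only and assumes a "model" with just two weights fitted by plain gradient descent; the names `train` and `predict`, the learning rate, and the step count are hypothetical choices, not anything a real framework prescribes. It shows the one property that matters here: training changes the weights, inference only reads them.

```python
# Toy "model": y = w * x + b. The two numbers (w, b) are the weights.
# Illustrative sketch only; real models have billions of weights.

def predict(weights, x):
    """Inference: apply frozen weights to an input. Reads weights, never writes them."""
    w, b = weights
    return w * x + b


def train(data, steps=2000, lr=0.05):
    """Training: repeatedly nudge the weights to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y      # how wrong the current weights are
            grad_w += 2 * err * x      # d(err^2)/dw
            grad_b += 2 * err          # d(err^2)/db
        w -= lr * grad_w / n           # step against the gradient
        b -= lr * grad_b / n
    return (w, b)                      # the "model" is just this tuple of weights


# Training data drawn from y = 2x + 1 (an assumption made for this example).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

weights = train(data)                  # training phase: produces the weights
snapshot = weights                     # remember the weights before serving requests

answer = predict(weights, 10.0)        # inference phase: one request
answers = [predict(weights, x) for x in (4.0, 5.0, 6.0)]  # more requests

weights_unchanged = (weights == snapshot)
```

In this sketch, anything that tampers with `data` changes what `train` returns, whereas anything passed to `predict` can only influence a single output. That is the same training-versus-inference distinction the diagram draws.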