May 23, 2026
Your platform team already runs a production-readiness review on every workload that ships to Kubernetes. When the workload is an AI agent, the PRR doesn’t get thrown out — it gets a delta. Most of the items still apply; specific ones need extension when the workload is non-deterministic, calls tools dynamically, and exercises identity at runtime in ways the manifest didn’t predict.
That delta isn’t a flat list of new items. It’s a sequenced dependency chain. The four pillars that secure AI agents in cloud environments — Observe, Posture, Detect, Enforce — each produce the runtime artifact the next one needs. Observation produces the runtime AI-BOM and behavioral data Posture consumes. Posture produces the declared-vs-observed reconciliation Detect consumes. Detect produces the confirmed behavioral patterns Enforce consumes. Work the pillars in parallel and you ship enforcement policies before there’s data to write them against — the failure mode that breaks agents in their first week of production.
This checklist sits above the cloud-primitive layer. The wiring to IRSA on EKS, Workload Identity on GKE, or Defender for Containers on AKS lives in your per-cloud implementation; the four-pillar deltas below apply regardless of which cloud you’re on.
The PRR Delta at a glance:
| Standard K8s PRR item | AI agent delta |
|---|---|
| Resource limits sized for steady-state | Sized for inference bursts and variable token throughput |
| securityContext: runAsNonRoot, readOnlyRootFilesystem | Plus: outbound capability check — the agent’s tool list is the real attack surface |
| ServiceAccount per workload | ServiceAccount per agent, never shared across agent types |
| NetworkPolicy with default-deny egress | Allowlist of tool endpoints, RAG sources, and model providers — not just internal services |
| Observability: metrics, logs, traces | Plus: LLM/MCP call instrumentation routed to the SOC, not only to LangSmith or LangFuse |
| (not in standard PRR) | Observation window — production-equivalent runtime before any enforcement |
| (not in standard PRR) | Runtime AI-BOM — what the agent actually loaded, not what the manifest declared |
Pillar 1 (Observe) carries three deltas. None is optional, and all of them run before the agent receives production traffic.
Deploy the per-Deployment eBPF sensor in observation mode, and do it before the agent ships rather than after an incident exposes the gap. The sensor runs at the kernel layer with low single-digit percentage overhead (1–2.5% CPU, 1% memory) and captures syscall and network activity for every pod in the Deployment. Observation mode means no enforcement, no blocking, no alerts to triage. Just runtime data. Least-privilege isn’t a static configuration goal; it’s a runtime measurement. The only way to know what privileges the agent actually needs is to observe what it exercises, which is what this pillar produces.
Enable runtime AI-BOM generation. The manifest tells you which model the agent is supposed to load, which MCP servers it’s supposed to call, which Python packages got pinned in requirements.txt. The runtime AI-BOM tells you what it actually loaded: which model version got pulled by the runtime resolver, which tool endpoints got registered after startup, which transitive Python dependencies got installed at startup. The delta between manifest and runtime is where most agent supply chain incidents live — and where a runtime-derived AI-BOM as a deployment artifact earns its keep against static manifests.
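To make the manifest-versus-runtime delta concrete, here is a minimal Python sketch of the comparison. The `Component` shape, the three component kinds, and every name and version in the sample data are assumptions made for illustration; a real runtime AI-BOM will have its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    kind: str     # "model", "mcp_server", or "python_package" (illustrative kinds)
    name: str
    version: str


def diff_ai_bom(declared, observed):
    """Compare manifest-declared components with what the runtime observed."""
    want = {(c.kind, c.name): c for c in declared}
    got = {(c.kind, c.name): c for c in observed}

    def by_key(c):
        return (c.kind, c.name)

    return {
        # Loaded at runtime but absent from the manifest.
        "undeclared": sorted((c for k, c in got.items() if k not in want), key=by_key),
        # Declared but never seen loading during the window.
        "never_loaded": sorted((c for k, c in want.items() if k not in got), key=by_key),
        # Same component, different version than the manifest declared.
        "version_drift": sorted(
            (c for k, c in got.items() if k in want and want[k].version != c.version),
            key=by_key,
        ),
    }


# Hypothetical data for illustration only.
declared = {
    Component("model", "support-llm", "1.2"),
    Component("mcp_server", "crm", "v1"),
    Component("python_package", "requests", "2.31.0"),
}
observed = {
    Component("model", "support-llm", "1.3"),
    Component("mcp_server", "crm", "v1"),
    Component("mcp_server", "billing", "v1"),
    Component("python_package", "requests", "2.31.0"),
    Component("python_package", "urllib3", "2.2.1"),
}
report = diff_ai_bom(declared, observed)
for bucket, items in report.items():
    print(bucket, [f"{c.kind}:{c.name}@{c.version}" for c in items])
```

The three buckets map to the three cases the paragraph names: a dynamically registered tool endpoint or transitive package lands in `undeclared`, and a model resolved to a different version lands in `version_drift`.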
Wire LLM and MCP call instrumentation to the SOC’s data plane. Not just to LangSmith or LangFuse. The DevTools-grade observability platforms are fine for application performance debugging; they’re not where the SOC looks during an incident. Route prompt content, tool invocations, and model responses to whatever streaming platform feeds your detection pipeline. The same stream is what we’ve previously broken down for tracing multi-agent attack chains across agent boundaries.
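One way to picture the routing: emit each LLM or MCP event once and fan it out to both destinations. The sketch below is an assumption-heavy illustration. The event fields, the `emit_agent_event` helper, and the in-memory sinks are all hypothetical stand-ins for whatever tracing platform and streaming platform you actually run.

```python
import io
import json
import time


def emit_agent_event(sinks, agent, kind, payload, now=None):
    """Serialize one LLM/MCP event as a JSON line and write it to every sink."""
    event = {
        "ts": time.time() if now is None else now,
        "agent": agent,
        "kind": kind,          # "prompt", "tool_call", or "model_response"
        "payload": payload,
    }
    line = json.dumps(event, sort_keys=True) + "\n"
    for sink in sinks:
        sink.write(line)
    return event


devtools_sink = io.StringIO()  # stands in for the debugging platform
soc_sink = io.StringIO()       # stands in for the SOC's streaming platform
emit_agent_event(
    [devtools_sink, soc_sink],
    agent="support-agent",
    kind="tool_call",
    payload={"tool": "lookup_order", "args": {"order_id": "A-1"}},
    now=0.0,
)
print(soc_sink.getvalue(), end="")
```

The point of the fan-out is that the SOC receives the same record the developers see, rather than a summarized or sampled copy.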
Artifact produced by this pillar: a runtime AI-BOM and a behavioral data stream covering the agent’s first N hours in production-equivalent mode. The next three pillars have no input without it.
Pillar 2 (Posture) carries three deltas, all consuming the data from Pillar 1.
One ServiceAccount per agent, never shared across agents. Standard K8s practice has each workload binding to its own SA, but shared SAs across agents of the same “type” is a common shortcut that erases the audit trail you need when one agent goes off-policy. Per-agent SA means per-agent IAM scope, per-agent audit log, per-agent revocation path.
Capture the Application Profile DNA at the Deployment object. APD is the behavioral envelope of the agent — which syscalls it makes, which network destinations it reaches, which files it touches — captured during the observation window. The artifact lives at the Deployment level (not per-pod) so it survives pod restarts and HPA scaling. Without an APD, “what does this agent normally do” is a question with no answer, which means anomaly detection in Pillar 3 has no baseline to compare against.
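As a rough mental model of a Deployment-level envelope, the sketch below merges per-pod observations into one profile keyed by Deployment and then asks what a later observation does outside it. The `BehaviorProfile` class and its fields are an illustrative assumption, not the actual APD format.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorProfile:
    syscalls: set = field(default_factory=set)
    egress: set = field(default_factory=set)       # (host, port) pairs
    file_paths: set = field(default_factory=set)

    def absorb(self, other):
        """Widen this envelope with another pod's observations."""
        self.syscalls |= other.syscalls
        self.egress |= other.egress
        self.file_paths |= other.file_paths

    def outside(self, observed):
        """Return the part of `observed` that falls outside this envelope."""
        return BehaviorProfile(
            observed.syscalls - self.syscalls,
            observed.egress - self.egress,
            observed.file_paths - self.file_paths,
        )


profiles = {}  # keyed by "namespace/deployment", never by pod name


def record(deployment_key, pod_observation):
    profiles.setdefault(deployment_key, BehaviorProfile()).absorb(pod_observation)


# Two pods of the same Deployment contribute to one baseline (hypothetical data).
record("agents/support-agent",
       BehaviorProfile({"read", "connect"}, {("api.example.internal", 443)}, {"/app/config.yaml"}))
record("agents/support-agent",
       BehaviorProfile({"read", "write"}, {("api.example.internal", 443)}, {"/tmp/cache"}))

baseline = profiles["agents/support-agent"]
later = BehaviorProfile({"read", "execve"}, {("203.0.113.10", 22)}, set())
drift = baseline.outside(later)
print(drift)
```

Because the key is the Deployment rather than the pod, a restarted or newly scaled pod is compared against the same baseline instead of starting from nothing.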
Ship the declared-vs-observed reconciliation report. This is the artifact most teams gesture at and never produce. Format: a structured report attached to the deployment ticket, listing the IAM scope the agent’s ServiceAccount was granted versus the APIs it actually called during the observation window, with the delta highlighted. A common pattern at this point in the deployment: the agent’s SA was granted ~30–50 permissions during dev and uses 3–5 of them in production. Granting the remaining 30+ to production is exactly the over-privilege gap that turns a routine agent into a high-blast-radius incident.
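The core of that report is a set comparison. Here is a minimal sketch, assuming the granted scope and the observed API calls are both available as flat lists of permission strings; the field names in the output and the sample permissions are illustrative assumptions.

```python
def reconcile(granted, observed):
    """Summarize granted IAM scope against the calls seen during observation."""
    granted, observed = set(granted), set(observed)
    used = granted & observed
    return {
        "granted_count": len(granted),
        "used": sorted(used),
        "unused": sorted(granted - observed),          # candidates for removal
        "outside_grant": sorted(observed - granted),   # attempted but never granted
        "utilization": round(len(used) / len(granted), 2) if granted else 0.0,
    }


# Hypothetical scope and observations for illustration only.
granted = [
    "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
    "dynamodb:GetItem", "sqs:SendMessage",
]
observed = ["s3:GetObject", "dynamodb:GetItem"]

report = reconcile(granted, observed)
print(report)
```

The `unused` list is the highlighted delta the paragraph describes. The `outside_grant` list is worth keeping even when empty, because a non-empty value means the agent tried something its ServiceAccount was never meant to do.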
The per-agent ServiceAccount handles the Kubernetes identity scope. For the application-layer authorization scope — what tools the agent can actually invoke — the boundary is the tool gateway, not the system prompt. A prompt instruction telling the agent “don’t call delete_customer” is not a security control; the gateway that refuses to route the call is. Both deltas — Kubernetes identity scope and application-layer authorization scope — land in this pillar.
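The difference between a prompt instruction and a gateway control is easiest to see in code. This is a minimal sketch of a gateway that checks a per-agent allowlist before routing; `ToolGateway`, `ToolNotAllowed`, and the two sample tools are hypothetical names, not any particular product's API.

```python
class ToolNotAllowed(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


class ToolGateway:
    def __init__(self, handlers, allowlist):
        self._handlers = handlers      # tool name -> callable
        self._allowlist = allowlist    # agent id -> set of permitted tool names

    def invoke(self, agent_id, tool, **kwargs):
        # The check happens here, outside the model, so no prompt can talk its way past it.
        if tool not in self._allowlist.get(agent_id, set()):
            raise ToolNotAllowed(f"{agent_id} may not call {tool}")
        return self._handlers[tool](**kwargs)


def get_customer(customer_id):
    return {"id": customer_id, "status": "active"}


def delete_customer(customer_id):
    return {"id": customer_id, "deleted": True}


gateway = ToolGateway(
    handlers={"get_customer": get_customer, "delete_customer": delete_customer},
    allowlist={"support-agent": {"get_customer"}},
)

print(gateway.invoke("support-agent", "get_customer", customer_id="c-1"))
try:
    gateway.invoke("support-agent", "delete_customer", customer_id="c-1")
except ToolNotAllowed as err:
    print("refused:", err)
```

The handler for `delete_customer` exists, but the agent cannot reach it, whatever the model decides to attempt.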
Artifact produced by this pillar: the APD baseline plus the declared-vs-observed reconciliation. These feed the detection rules in Pillar 3 and the CISO’s 7-gate go-live approval.
Pillar 3 (Detect) carries three deltas, all consuming the baselines from Pillar 2.
Configure attack story routing, not alert routing. Alerts fire on single signals: this Pod made an unexpected egress call, this SA touched an unexpected API. Attack stories correlate signals across surfaces — input layer (prompt), tool invocation, identity exercise, and multi-agent boundaries — into a coherent narrative. The SOC tier that triages your alerts doesn’t have the context to reassemble single-signal alerts into the full chain; the platform has to do it. ARMO’s CADR is the layer that handles this correlation across the four detection surfaces in a single attack story.
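To illustrate the difference between an alert and a story, the sketch below groups single signals by agent and time proximity, then keeps only the chains that span more than one detection surface. This is a deliberately simple time-window heuristic written for illustration; it is not a description of how CADR or any other product correlates signals, and the `Signal` fields and thresholds are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    ts: float      # seconds since some epoch
    agent: str
    surface: str   # "input", "tool", "identity", or "multi_agent"
    detail: str


def build_stories(signals, window_s=300.0, min_surfaces=2):
    """Chain each agent's signals that occur within window_s of the previous one."""
    by_agent = defaultdict(list)
    for s in sorted(signals, key=lambda s: s.ts):
        by_agent[s.agent].append(s)

    chains = []
    for sigs in by_agent.values():
        chain = []
        for s in sigs:
            if chain and s.ts - chain[-1].ts > window_s:
                chains.append(chain)
                chain = []
            chain.append(s)
        chains.append(chain)

    # A story needs evidence from several surfaces; single-surface chains stay alerts.
    return [c for c in chains if len({s.surface for s in c}) >= min_surfaces]


signals = [
    Signal(0.0, "support-agent", "input", "prompt injection pattern"),
    Signal(60.0, "support-agent", "tool", "unexpected tool call sequence"),
    Signal(90.0, "support-agent", "identity", "API outside observed envelope"),
    Signal(10000.0, "support-agent", "identity", "API outside observed envelope"),
    Signal(30.0, "billing-agent", "tool", "parameter anomaly"),
]
stories = build_stories(signals)
for story in stories:
    print([f"{s.surface}: {s.detail}" for s in story])
```

Five alerts go in and one story comes out, which is the reduction the SOC tier needs the platform to perform for it.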
Confirm AI-specific signal sources per surface. Input-layer signals (jailbreak attempts, prompt injection patterns), tool-invocation signals (unexpected tool call sequences, parameter anomalies), identity signals (the SA touching APIs outside its observed envelope), and multi-agent signals (one agent’s output triggering another agent’s tool call). Each surface has its own detection content; the gap is usually that the platform has wired one or two surfaces and assumed the rest are covered by the existing CNAPP.
Install the deployment-correlation rule. Most baseline-drift alerts that fire in week two of production aren’t compromise — they’re a model version bump, a system prompt edit, or a new tool registration that the platform team pushed without telling the SOC. Wire the rule so that any APD drift correlated with a Deployment-object change is auto-tagged “expected drift, deployment correlated,” not “potential compromise.” This single rule saves the SOC the alert-fatigue tax that kills agent detection programs in their first month.
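A minimal sketch of that rule follows, assuming drift events and Deployment-object changes are both available as `(timestamp, deployment)` pairs. The one-hour correlation window is an assumed default, not a recommendation from any source; the two tag strings are the ones described above.

```python
def tag_drift(drift_events, deployment_changes, window_s=3600.0):
    """Tag each drift event by whether a change to the same Deployment preceded it."""
    tagged = []
    for ts, deployment in drift_events:
        correlated = any(
            changed == deployment and 0 <= ts - change_ts <= window_s
            for change_ts, changed in deployment_changes
        )
        tag = (
            "expected drift, deployment correlated"
            if correlated
            else "potential compromise"
        )
        tagged.append((ts, deployment, tag))
    return tagged


# Hypothetical timeline: one rollout of support-agent at t=1000s.
changes = [(1000.0, "support-agent")]
drifts = [
    (1200.0, "support-agent"),   # shortly after the rollout
    (9000.0, "support-agent"),   # long after any rollout
    (1100.0, "billing-agent"),   # different Deployment, no rollout
]
for row in tag_drift(drifts, changes):
    print(row)
```

Note that the rule only matches drift that follows a change to the same Deployment. Drift in a neighboring agent during someone else's rollout still surfaces as potential compromise.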
Artifact produced by this pillar: confirmed behavioral patterns that separate legitimate variation from compromise — the input the next pillar consumes.
Pillar 4 (Enforce) carries three deltas, all consuming the confirmed patterns from Pillar 3.
Configure the progressive enforcement workflow. Observe → Audit → Enforce. Each stage’s promotion gate is a manual decision the platform team makes based on the artifacts from the prior pillars. The methodology applies broadly to any Kubernetes workload — we’ve worked through the observe-to-audit-to-enforce promotion path for the general case elsewhere. What changes for AI agents is the length of the observation window: non-deterministic workloads need longer baselines than deterministic services.
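The promotion path can be thought of as a small state machine with two conditions on every transition: the prior pillars' artifacts exist, and a person has signed off. The sketch below assumes a particular mapping of artifacts to gates; that mapping and the artifact names are illustrative choices, not a prescribed list.

```python
from enum import Enum


class Mode(Enum):
    OBSERVE = "observe"
    AUDIT = "audit"
    ENFORCE = "enforce"


# Assumed mapping: stage -> (next stage, artifacts required to leave it).
GATES = {
    Mode.OBSERVE: (Mode.AUDIT, {"runtime_ai_bom", "apd_baseline"}),
    Mode.AUDIT: (Mode.ENFORCE, {"reconciliation_report", "confirmed_patterns"}),
}


def promote(current, artifacts, approved_by=None):
    """Return (mode, reason). The mode only advances with artifacts and a named approver."""
    if current not in GATES:
        return current, "already at final stage"
    next_mode, required = GATES[current]
    missing = sorted(required - set(artifacts))
    if missing:
        return current, "missing artifacts: " + ", ".join(missing)
    if not approved_by:
        return current, "awaiting manual approval"
    return next_mode, f"promoted by {approved_by}"


print(promote(Mode.OBSERVE, {"runtime_ai_bom"}))
print(promote(Mode.OBSERVE, {"runtime_ai_bom", "apd_baseline"}))
print(promote(Mode.OBSERVE, {"runtime_ai_bom", "apd_baseline"}, approved_by="platform-lead"))
```

Keeping the approver as an explicit argument reflects the manual gate: complete artifacts alone never move a workload into enforcement.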
Auto-generate NetworkPolicy from observed egress. The standard pattern is to hand-write NetworkPolicies based on the developer’s best guess of what the agent will reach. The runtime-derived pattern is the inverse: let the sensor record every egress destination during observation and audit modes, then promote that allowlist into the NetworkPolicy. Same loop for seccomp profiles, where the syscall list comes from observed activity, not from a generic template.
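Here is a minimal sketch of the promotion step, assuming the sensor's egress records have already been reduced to `(cidr, port, protocol)` tuples. The function name, the labels, and the addresses (taken from documentation ranges) are hypothetical. The standard `networking.k8s.io/v1` NetworkPolicy matches on IP blocks and selectors rather than hostnames, so destinations observed by name would need resolving first or a CNI-specific policy type.

```python
import json
from collections import defaultdict


def build_egress_policy(name, namespace, pod_labels, observed):
    """Build a NetworkPolicy manifest (as a dict) allowing only observed egress."""
    ports_by_cidr = defaultdict(set)
    for cidr, port, protocol in observed:
        ports_by_cidr[cidr].add((protocol, port))

    egress = [
        {
            "to": [{"ipBlock": {"cidr": cidr}}],
            "ports": [{"protocol": proto, "port": port} for proto, port in sorted(ports)],
        }
        for cidr, ports in sorted(ports_by_cidr.items())
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            # Selecting the pods for Egress denies everything not listed below.
            "policyTypes": ["Egress"],
            "egress": egress,
        },
    }


# Hypothetical observations from the observation and audit windows.
observed = {
    ("203.0.113.10/32", 443, "TCP"),
    ("198.51.100.0/24", 443, "TCP"),
    ("198.51.100.0/24", 5432, "TCP"),
}
policy = build_egress_policy(
    "support-agent-egress", "agents", {"app": "support-agent"}, observed
)
print(json.dumps(policy, indent=2))
```

The output is plain JSON, which `kubectl apply -f` accepts alongside YAML. Review the generated allowlist before promoting it, since anything the agent did not exercise during the window, DNS included, will be blocked.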
Promote the policy with the deployment, not after it. This is the line between observe-then-enforce and observe-then-someday-enforce. The Pillar 4 work is done as part of the same deployment cycle that produced the baselines, not as a separate retroactive hardening project. If the policies aren’t promoted in the same deployment that captured the data, they don’t get written.
Artifact produced by this pillar: per-agent NetworkPolicy and seccomp profile, both promoted from observed behavior. Fix without breaking, by construction.
The four-pillar checklist isn’t a checklist in the flat-list sense. It’s a pipeline. Each pillar’s output is the next pillar’s input. Run them out of order and the inputs aren’t there. Run them in parallel and the policies get written against data that doesn’t exist yet.
The PRR Delta isn’t a parallel universe of new work for the platform team — it’s a recognition that the workload changed, so the PRR has to follow. Most of the items in your existing PRR still apply. The deltas are specific, ordered, and each produces an artifact the next step needs.
The four-pillar checklist produces the runtime artifacts that the CISO’s 7-gate go-live approval consumes — same pipeline, different operational moment. The platform team’s checklist ends where the security team’s gate begins.
If you want to see how runtime-derived AI-BOM, Application Profile DNA, and progressive enforcement policies get produced on a Kubernetes cluster you already run, ARMO’s cloud-native security for AI workloads handles all four pillars on a single eBPF sensor with 90%+ CVE noise reduction via runtime reachability.
What is a runtime AI-BOM and why does the checklist require it?
A runtime AI-BOM is the inventory of every model, MCP server, Python dependency, and external endpoint the agent actually loaded at startup and runtime — not what the manifest or requirements.txt declared. It’s required because the gap between declared and loaded is where most agent supply chain incidents live: a transitive dependency that wasn’t reviewed, a model version resolved by the runtime instead of pinned, an MCP server registered dynamically after startup. We’ve broken down the artifact structure and how to keep it fresh as the agent evolves in our deep dive on runtime-connected AI-BOMs.
How long should the observation window be before flipping to enforcement?
A low-autonomy retrieval agent reaches behavioral stability within days; a high-autonomy multi-step agent making variable tool calls needs weeks. The autonomy classification framework in the CISO’s 7-gate approval checklist gives concrete tier-by-tier ranges. Default rule of thumb: long enough that the APD captures every operating mode the agent has, not just the happy path.
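One hedged way to operationalize "long enough" is to watch for a run of days in which the agent shows no behavior the baseline has not already seen. The sketch below assumes each day's observations are summarized as a set of behavior labels; the three-quiet-day threshold and the label format are arbitrary illustrative choices, not the tier ranges referenced above.

```python
def days_until_stable(daily_behaviors, quiet_days=3):
    """Return the first day on which quiet_days consecutive days added nothing new, else None."""
    seen = set()
    quiet = 0
    for day, behaviors in enumerate(daily_behaviors, start=1):
        new = set(behaviors) - seen
        seen |= set(behaviors)
        quiet = 0 if new else quiet + 1
        if quiet >= quiet_days:
            return day
    return None


# Hypothetical daily summaries of what the agent did.
history = [
    {"tool:search", "egress:api"},
    {"tool:search", "tool:refund"},   # a new operating mode appears on day 2
    {"tool:search"},
    {"egress:api"},
    {"tool:refund"},
]
print(days_until_stable(history))
```

A criterion like this only proves that nothing new appeared; it cannot prove that rare operating modes (month-end jobs, failure paths) were exercised, so those still need to be triggered deliberately during the window.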
Where does per-cloud wiring (IRSA, Workload Identity, Defender for Containers) fit in this checklist?
This checklist is the cloud-agnostic layer — the deltas that apply regardless of which cloud runs your Kubernetes. The per-cloud wiring lives in our EKS, AKS, and GKE implementation guides. Use those to translate the four-pillar deliverables here into your provider’s specific primitives.
Do I throw out my existing Kubernetes PRR?
No. The PRR Delta extends your existing PRR; it doesn’t replace it. Most items in your existing review still apply; specific ones (resource limits, securityContext, ServiceAccount, NetworkPolicy, observability) get extended for AI workloads; two new items (observation window, runtime AI-BOM) get added. Treat this checklist as the PRR addendum your platform team owns when the workload is an AI agent.
How does this checklist relate to the CISO go-live approval?
This is the engineering work the platform team does before the approval meeting. The four pillars produce the runtime artifacts — AI-BOM, APD baseline, declared-vs-observed reconciliation, attack story routing, progressive enforcement policies — that the CISO’s 7-gate go-live approval then evaluates. Same pipeline, different operational moment.