AI Agent Governance: From Policy Framework to Runtime Enforcement
Most enterprise AI agent governance programs publish policies at the bottom three rungs of a...
May 23, 2026
Most threat modeling assumes the attacker has to break something. AI agents change that assumption. An attacker who controls a prompt can make the agent misbehave without breaking anything at all. The prompt can be a customer support ticket the agent reads, a document it retrieves, or a tool response it processes — any input the agent treats as context is an attack surface.
On Kubernetes, that attack surface has physical form. Pods, ServiceAccounts, MCP tool servers, and per-Deployment behavioral baselines are where it lives, and where the catalog you wrote at approval time stops covering you the moment an engineer adds a connection.
This article walks through a four-step methodology that produces a threat model designed for that reality: (1) decompose the agent into the components your Kubernetes cluster produces telemetry for; (2) enumerate reachable capability beyond declared permission; (3) classify each threat as coercion or compromise; and (4) close every threat against a runtime evidence signal. The catalog you build this way is a living document that converges on what your cluster is actually doing, not a one-time exercise.
Start by decomposing the agent into the components your Kubernetes cluster produces telemetry for. The goal of this step is to get off the whiteboard and onto the cluster — an inventory that maps to the specific objects, signal sources, and identities your security stack can actually observe.
For each component, identify three things: the trust source, typically a Deployment manifest, ConfigMap, or Helm values file; the observable signal source, such as kube-audit events for API actions, eBPF for syscalls and process lineage, framework SDK callbacks for tool invocations, IAM event streams for identity exercise, or MCP server logs for protocol-level traffic; and the blast surface, meaning what the component can reach in the cluster if its decision logic is coerced.
Five component categories cover the analytical surface:
| Component | Trust Source | Telemetry Source |
|---|---|---|
| Inference server pod or sidecar | Deployment manifest, model registry reference | eBPF syscalls, container runtime events, kube-audit |
| MCP tool servers and framework SDK | Tool registration, framework configuration | MCP server logs, SDK callbacks, in-process instrumentation |
| RAG store, vector database, memory layer | Data source configuration, ingestion pipeline | Database audit logs, vector store query logs |
| ServiceAccount, KSA, IRSA identity binding | RoleBinding, ClusterRoleBinding, IAM policy | kube-audit events, IAM event streams |
| Runtime substrate (kubelet, CNI, IMDS, API server) | Cluster configuration, NetworkPolicy, admission rules | kube-audit, admission webhook events, CNI logs |
The output of Step 1 is a component inventory keyed to actual cluster telemetry. Declared component lists go stale by Thursday when an engineer adds an MCP connection; a runtime-derived AI-BOM keeps the inventory live against what actually loaded into the agent’s address space. Without this binding, the rest of the framework runs against a fiction — the decomposition you wrote in Confluence, not the agent your cluster is running.
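As a minimal sketch of what "keyed to actual cluster telemetry" can mean in practice, the snippet below reconciles a declared inventory against component names seen at runtime. The `Component` fields, the `reconcile` helper, and the component names are all hypothetical and chosen for illustration; a real inventory would be fed from your own telemetry pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One inventory row: what is trusted, what is observable, what it can reach."""
    name: str
    trust_source: str
    telemetry_sources: tuple
    blast_surface: tuple


def reconcile(declared, observed_names):
    """Compare the declared inventory with component names seen at runtime."""
    declared_names = {c.name for c in declared}
    return {
        # Seen at runtime but never declared: shadow components.
        "shadow": sorted(observed_names - declared_names),
        # Declared but never seen in telemetry: stale entries.
        "stale": sorted(declared_names - observed_names),
    }


# Hypothetical declared inventory (what the approval document lists).
declared = [
    Component("inference-server", "Deployment manifest",
              ("eBPF syscalls", "kube-audit"), ("model registry",)),
    Component("mcp-ticketing", "Tool registration",
              ("MCP server logs", "SDK callbacks"), ("ticket database",)),
]

# Hypothetical runtime observation: an MCP server nobody declared has appeared.
observed = {"inference-server", "mcp-ticketing", "mcp-billing"}

report = reconcile(declared, observed)
print(report)
```

Anything in the `shadow` list is a component the rest of the framework would otherwise never model.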
Scope boundary: model-artifact threats — training data poisoning, weight tampering, supply chain attacks on the model file itself — sit at the artifact layer rather than the Kubernetes runtime layer. The decomposition above intentionally stops where the model lands in the cluster. Treat the artifact layer as a separate workstream covered by model registry controls, signed manifests, and supply chain attestation.
The most consequential threat surface for an AI agent is the gap between what its manifest says it can do and what it can actually reach when an attacker controls the prompt.
Reachable capability has three terms multiplied together: declared permissions (what the IAM grant lists), prompt-reachable code paths (which of those grants the agent’s intent layer can be steered toward), and tool-available actions (what the MCP tool surface and framework SDK actually expose). An agent declared with 47 API permissions may exercise three in normal operation. The remaining 44 are dormant. Every dormant permission is reachable under prompt injection until proven otherwise — and proving otherwise requires runtime evidence, not a developer’s assurance that “we just added them for future-proofing.”
The procedure is mechanical. List the agent’s declared scope from manifests, role bindings, and tool registrations. Run a two-to-four-week observation window in production, capturing the actual API calls, syscalls, and tool invocations the agent makes. Compute the delta: declared minus observed equals dormant. For each dormant permission, score its blast surface — what data, system, or downstream agent does this permission reach if it fires? That blast score determines whether the permission gets revoked, justified in writing, or accepted as known exposure with a dated re-review.
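The delta and the triage decision can be sketched in a few lines. This is an illustration under stated assumptions: the permission names, the 1-to-5 blast scale, and the `revoke_at` and `justify_at` thresholds are invented here, and unknown permissions are deliberately scored as high blast.

```python
def dormant_permissions(declared, observed):
    """Declared minus observed equals dormant."""
    return set(declared) - set(observed)


def triage(dormant, blast_scores, revoke_at=4, justify_at=2):
    """Map each dormant permission to one of the three outcomes.

    Thresholds are assumptions for this sketch. A permission with no
    recorded score is treated as high blast until someone scores it.
    """
    decisions = {}
    for perm in sorted(dormant):
        score = blast_scores.get(perm, revoke_at)
        if score >= revoke_at:
            decisions[perm] = "revoke"
        elif score >= justify_at:
            decisions[perm] = "justify in writing"
        else:
            decisions[perm] = "accept with dated re-review"
    return decisions


# Hypothetical declared scope and observation-window results.
declared = {"secrets:get", "pods:list", "configmaps:get",
            "s3:GetObject", "s3:DeleteObject"}
observed = {"pods:list", "s3:GetObject"}
blast_scores = {"secrets:get": 5, "s3:DeleteObject": 3, "configmaps:get": 1}

dormant = dormant_permissions(declared, observed)
decisions = triage(dormant, blast_scores)
for perm, decision in decisions.items():
    print(f"{perm}: {decision}")
```

Defaulting unscored permissions to the revoke threshold keeps the burden of proof on the grant, which matches the "reachable until proven otherwise" stance.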
Runtime reachability analysis reduces CVE noise by surfacing only the vulnerabilities in code paths that actually execute. The same technique applies here, to permissions instead of packages. The output of Step 2 is the catalog’s “what could happen” column populated with reachable actions, not aspirational ones. Engineering’s “future-proofing” argument fails this gate by definition: future-proofed permissions are dormant attack surface with no closing condition.
Every threat in the catalog falls into one of two structural classes. Compromise means an attacker exploits a vulnerability in the agent — a known CVE in the inference server, a misconfigured token mount, an exposed kubelet API. The attacker breaks something to make the agent misbehave. Coercion means an attacker uses the agent’s legitimate, authorized capability for an attacker-supplied intent, typically by injecting instruction into content the agent treats as context. Nothing is exploited. The agent is doing exactly what it was authorized to do, redirected.
The distinction matters because the evidence diverges. Compromise produces syscall anomalies, unexpected process trees, unauthorized binary execution, and container artifacts. Coercion produces behavioral-envelope drift — the agent calls APIs it had access to but never called, in volumes it hadn’t seen, against targets that pass authorization but fail context. A detection pipeline tuned for compromise misses coercion entirely.
The canonical coercion case is the confused deputy at the MCP boundary. When an AI agent invokes an MCP tool, the tool executes with the agent’s credentials, not the calling user’s. Any user who reaches the agent’s prompt can therefore, in principle, reach anything the agent is authorized to reach, because a successful indirect injection turns the user’s request into the agent’s full-privilege request. Blast radius is computed against the agent’s full privilege set, not the user’s. This is not a vulnerability; it is MCP tool invocation working exactly as designed.
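A small sketch makes the blast-radius point concrete. The grant names and the `blast_radius` helper are hypothetical; the only idea taken from the text is that the effective set is the agent’s, and the interesting number is how far it exceeds the calling user’s.

```python
def blast_radius(agent_grants, user_grants):
    """Blast radius under coercion is the agent's grant set, not the user's."""
    agent_grants = set(agent_grants)
    return {
        # What a successful injection can reach.
        "effective": agent_grants,
        # What the calling user gains beyond their own authorization.
        "beyond_user": agent_grants - set(user_grants),
    }


# Hypothetical grants for a support agent and a low-privilege caller.
agent_grants = {"tickets:read", "tickets:write", "customers:read", "refunds:create"}
user_grants = {"tickets:read"}

radius = blast_radius(agent_grants, user_grants)
print(sorted(radius["beyond_user"]))
```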
| Dimension | Compromise | Coercion |
|---|---|---|
| Attacker action | Exploits vulnerability path | Supplies instruction the agent treats as context |
| Authorization state | Bypassed or escalated | Honored — agent is doing what it was authorized to do |
| Example | Container escape via known CVE, kubelet API exploitation | Prompt-injected SQL via authorized database access |
| Primary evidence | Syscall anomalies, process-tree deviations, unauthorized binaries | Behavioral envelope drift, atypical tool-call sequence and rate |
| Detection layer | eBPF kernel signals, kube-audit, container runtime | Per-Deployment behavioral baseline, framework SDK callbacks |
| Mitigation class | Patch, configuration hardening, image scanning | Scope reduction, behavioral enforcement, context-aware policy |
Compromise-class examples include container escape via a known CVE, ServiceAccount token theft via mounted secret, kubelet API exploitation, and supply chain compromise of a base image. Coercion-class examples include prompt-injected SQL via an agent with authorized database access, RAG poisoning that drives the agent to read sensitive records it has authorized access to, MCP tool chains that exit the boundary your business associate agreement (BAA) defines, and agent-to-agent delegation hijack where a low-privilege agent’s compromised output becomes a high-privilege agent’s input.
The output of Step 3 is a classified catalog with a class tag and an evidence-signal hint per entry. The class tag drives which Kubernetes signal you instrument for in Step 4. Compromise threats wire to syscall and audit signals; coercion threats wire to per-agent behavioral baselines, and CADR’s full-chain attack story correlation ties them back into a single incident when an attack moves between classes.
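One way to represent the class tag and its evidence-signal hint is sketched below. The detection layers are copied from the comparison table; the `ThreatClass` enum, `DETECTION_LAYERS` mapping, and `evidence_hint` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class ThreatClass(Enum):
    COMPROMISE = "compromise"
    COERCION = "coercion"


# Detection layers per class, taken from the comparison table.
DETECTION_LAYERS = {
    ThreatClass.COMPROMISE: ("eBPF kernel signals", "kube-audit", "container runtime"),
    ThreatClass.COERCION: ("per-Deployment behavioral baseline", "framework SDK callbacks"),
}


def evidence_hint(threat, threat_class):
    """Build a catalog entry carrying the class tag and where to instrument."""
    return {
        "threat": threat,
        "class": threat_class.value,
        "instrument": DETECTION_LAYERS[threat_class],
    }


entry = evidence_hint("prompt-injected SQL via authorized database access",
                      ThreatClass.COERCION)
print(entry["class"], "->", ", ".join(entry["instrument"]))
```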
A threat catalog without a runtime evidence signal per entry is a list of fears, not a security control. Step 4 closes every entry against an observable signal or marks it as an open exposure.
The convergence criterion is binary per entry: either the catalog names a Kubernetes signal source that would fire on the threat — a specific kube-audit event, an eBPF syscall pattern, a behavioral envelope deviation, a framework SDK callback — or the threat is open. Open threats need either a new instrumentation pass or an explicit accept-the-risk record with a dated re-review. There is no third state.
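The binary criterion can be expressed as a two-state check plus a next action for open entries. This is a sketch under assumptions: the `CatalogEntry` fields, the signal name, and the wording of the actions are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    threat: str
    signal_source: Optional[str] = None         # named signal that would fire
    risk_accepted_until: Optional[date] = None  # dated re-review, if accepted


def state(entry):
    """Closed if a signal source is named, otherwise open. No third state."""
    return "closed" if entry.signal_source else "open"


def next_action(entry, today):
    """What an open entry needs; closed entries need nothing."""
    if state(entry) == "closed":
        return "none"
    if entry.risk_accepted_until and today <= entry.risk_accepted_until:
        return "accepted, re-review due " + entry.risk_accepted_until.isoformat()
    return "instrument or record a dated risk acceptance"


today = date(2026, 6, 1)  # fixed date so the example is reproducible
entries = [
    CatalogEntry("kubelet API exploitation", signal_source="kube-audit event"),
    CatalogEntry("RAG poisoning", risk_accepted_until=date(2026, 9, 1)),
    CatalogEntry("delegation hijack"),
]
for e in entries:
    print(e.threat, "|", state(e), "|", next_action(e, today))
```

An accepted risk stays in the open state on purpose: acceptance is a record attached to an open entry, not a way of closing it.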
Closure happens in two phases. Day 1 produces the catalog as a working hypothesis — every threat enumerated from Steps 1 through 3, with proposed evidence signals based on the component decomposition and class tag. Day 30 produces the converged catalog — every proposed evidence signal validated against actual runtime data, gaps surfaced where the proposed signal didn’t fire when expected, and new threats added where behavior the catalog didn’t anticipate appeared in the observation window. Application Profile DNA at the Deployment level is what makes this convergence possible. Per-pod baselines never converge for ephemeral AI workloads because pod lifetime is shorter than the observation window. Deployment-level profiles absorb churn while still producing the per-agent behavioral envelope every coercion-class threat needs evidence against.
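The difference between per-pod and Deployment-level baselines can be illustrated with a toy envelope built from observed actions. The event shape, pod names, and helper functions below are assumptions made for the sketch and do not reflect any particular product’s data model.

```python
from collections import defaultdict


def build_envelopes(events, key):
    """Group observed actions into a behavioral envelope per pod or per Deployment."""
    envelopes = defaultdict(set)
    for event in events:
        envelopes[event[key]].add(event["action"])
    return dict(envelopes)


def is_drift(envelope, action):
    """An action outside the envelope counts as drift."""
    return action not in envelope


# Hypothetical telemetry: three short-lived pods from the same Deployment.
events = [
    {"deployment": "support-agent", "pod": "support-agent-a1", "action": "tickets:read"},
    {"deployment": "support-agent", "pod": "support-agent-b2", "action": "tickets:write"},
    {"deployment": "support-agent", "pod": "support-agent-c3", "action": "tickets:read"},
]

per_pod = build_envelopes(events, "pod")
per_deployment = build_envelopes(events, "deployment")

# A freshly scheduled pod has no history of its own, so a per-pod baseline
# flags routine behavior, while the Deployment envelope recognizes it.
fresh_pod_envelope = per_pod.get("support-agent-d4", set())
print(is_drift(fresh_pod_envelope, "tickets:write"))
print(is_drift(per_deployment["support-agent"], "tickets:write"))
```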
The catalog re-opens at five defined events: a new MCP connection added, a model version updated, a tool surface modified, behavioral baseline drift beyond the defined envelope, or an autonomy tier change for the agent. Each event invalidates a specific subset of the catalog’s evidence signals — a new MCP connection invalidates the tool-call envelope, a model update invalidates the prompt-response distribution. The re-open isn’t a full re-do; it’s a scoped reconvergence against the specific surface that changed.
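Scoped reconvergence can be sketched as a lookup from event to invalidated evidence signals. Only the first two mappings come from the text; the other three, along with all identifier names and the sample catalog, are assumptions added to make the example complete.

```python
# Which evidence signals each re-open event invalidates.
REOPEN_INVALIDATES = {
    "mcp_connection_added": {"tool_call_envelope"},             # stated in the text
    "model_version_updated": {"prompt_response_distribution"},  # stated in the text
    # The three mappings below are assumptions for illustration.
    "tool_surface_modified": {"tool_call_envelope"},
    "baseline_drift": {"behavioral_baseline"},
    "autonomy_tier_changed": {"behavioral_baseline", "tool_call_envelope"},
}


def entries_to_reopen(catalog, event):
    """Return only the threats whose evidence signal the event invalidates."""
    invalidated = REOPEN_INVALIDATES[event]
    return [threat for threat, signal in catalog.items() if signal in invalidated]


# Hypothetical catalog: threat -> evidence signal it is closed against.
catalog = {
    "prompt-injected SQL": "tool_call_envelope",
    "container escape via known CVE": "ebpf_syscall_pattern",
    "RAG poisoning": "prompt_response_distribution",
}

reopened = entries_to_reopen(catalog, "mcp_connection_added")
print(reopened)
```

Entries closed against unaffected signals, such as the syscall pattern here, stay closed.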
The output is an instrumented threat model — every entry tied to a Kubernetes signal source, every closure event logged, every re-open triggered by a defined operational event. The catalog is software now, not a Confluence page.
The output of these four steps maps directly into the Observe → Posture → Detect → Enforce implementation methodology covered in the parent framework guide.
The Step 1 component inventory feeds Observe — every component identified is something to discover, instrument, and emit telemetry from. The Step 2 reachability gap feeds Posture — the dormant permission set is exactly what declared-versus-observed reconciliation surfaces. The Step 3 compromise column feeds the syscall and audit signal side of Detect; the Step 3 coercion column feeds the behavioral envelope side. The Step 4 closure feeds Enforce — per-agent policy generation from observed baseline rather than guesswork from a manifest.
Without the catalog the four steps produce, the four-pillar implementation runs against a fiction. Observe collects everything, Posture compares declared-to-declared, Detect chases generic container alerts, Enforce stalls. With the catalog, every pillar has a defined input.
AI agent threat modeling for Kubernetes is structurally different from threat modeling anything else you have put in production, and the difference is that the threat surface is runtime-defined. The agent’s tool surface evolves. Its prompt-reachable paths shift with every model update. Its dormant permissions wait silently for the right input. A catalog built once at approval time cannot track any of this, and a catalog that does not track it stops protecting the agent the day after it ships.
The four steps above produce the kind of catalog that does. Decomposition into Kubernetes-runtime primitives binds every entry to a telemetry source the cluster actually emits. Reachable-capability enumeration replaces declared scope with the cross-product of what the agent could be talked into doing. The coercion-versus-compromise classification gives each entry the structural tag that determines which evidence signal will detect it. And runtime closure converts the document from a one-time exercise into a living catalog that re-opens at every operational event that changes the threat surface.
Walk these four steps against the highest-autonomy agent currently running in your cluster.
Skip nothing. Run Step 1 against the existing deployment — you will surface shadow components engineering didn’t catalog and tool surfaces that have grown since approval. Start the Step 2 observation window now; expect two to four weeks before behavioral closure. Run Steps 3 and 4 continuously thereafter, with re-opens triggered at every operational event the catalog defines.
Inventory and reachability work is immediate — days of effort, scoped per agent. Behavioral closure runs two to four weeks per Deployment for routine workloads, longer for agents with rare-but-legitimate work patterns like monthly reports or quarterly batch processing. Per-pod baselines never converge because pod lifetime is shorter than the convergence period; baselines must be maintained at the Deployment level.
Security owns the catalog. Platform engineering owns the closure signal feed — the kube-audit subscription, the eBPF sensor deployment, the framework SDK telemetry pipeline. Both own the re-open gate at every MCP connection, tool addition, or model update. A catalog owned by one team without the other is a catalog that stops closing the first time engineering ships faster than security reviews.
Step 1 decomposition includes orchestrator and delegation edges as first-class components. Step 4 closure uses framework SDK telemetry — LangGraph state transitions, CrewAI delegation events, AutoGen speaker selections, MCP protocol messages. Single-agent baselines structurally cannot see cross-agent threats; the contagion lives in the spaces between agents, where one agent’s compromised output becomes another agent’s input.