How to Threat Model AI Agents in Kubernetes: A Practical Framework

May 23, 2026

Yossi Ben Naim
VP of Product Management

Key takeaways

  • Why does threat modeling for AI agents need to be different from threat modeling for other Kubernetes workloads? Because the attacker doesn't need to break anything. An attacker who controls any input the agent reads as context (a customer support ticket, a retrieved document, a tool response) can make the agent misbehave without exploiting a single vulnerability. The agent runs its authorized code against its authorized targets while carrying out the attacker's intent. Traditional threat modeling, which assumes something has to be exploited, has no category for this.
  • What does a practical threat modeling framework for AI agents on Kubernetes actually look like? Four steps. Decompose the agent into the components your Kubernetes cluster produces telemetry for. Enumerate reachable capability instead of declared permission — an agent declared with 47 API permissions and exercising three has 44 dormant ones reachable under prompt injection. Classify each threat as coercion or compromise. Close every entry against a specific runtime signal that would fire if the threat materialized.
  • What changes about how the threat model gets maintained? The catalog stops being a document and starts being a living artifact. Day 1 produces a working hypothesis; Day 30 produces a converged catalog after two to four weeks of per-Deployment behavioral observation. After that, the catalog re-opens at every operational event that shifts the threat surface — a new MCP connection, a model update, a tool change. A threat model that doesn't re-converge is itself an exposure.

Most threat modeling assumes the attacker has to break something. AI agents change that assumption. An attacker who controls a prompt can make the agent misbehave without breaking anything at all. The prompt can be a customer support ticket the agent reads, a document it retrieves, or a tool response it processes — any input the agent treats as context is an attack surface.

On Kubernetes, that attack surface has physical form. It lives in Pods, ServiceAccounts, MCP tool servers, and per-Deployment behavioral baselines, and the catalog you wrote at approval time stops covering it the moment an engineer adds a connection.

This article walks through a four-step methodology that produces a threat model designed for that reality: Decompose the agent into the components your Kubernetes cluster produces telemetry for. Enumerate reachable capability beyond declared permission. Classify each threat as coercion or compromise. Close every threat against a runtime evidence signal. The catalog you build this way is a living document that converges on what your cluster is actually doing — not a one-time exercise.

Step 1: Decompose Into Kubernetes-Runtime Primitives

Start by decomposing the agent into the components your Kubernetes cluster produces telemetry for. The goal of this step is to get off the whiteboard and onto the cluster — an inventory that maps to the specific objects, signal sources, and identities your security stack can actually observe.

Five component categories cover the analytical surface:

  • The inference server pod or sidecar hosting the LLM call
  • The MCP tool servers and framework SDK routing tool invocations
  • The RAG store, vector database, and memory layer the agent reads from
  • The ServiceAccount, KSA, and IRSA binding that provides agent identity
  • The runtime substrate of kubelet, CNI, IMDS, and the cluster API server

For each component, identify three things. The trust source — typically a Deployment manifest, ConfigMap, or Helm values file. The observable signal source — kube-audit events for API actions, eBPF for syscalls and process lineage, framework SDK callbacks for tool invocations, IAM event streams for identity exercise, MCP server logs for protocol-level traffic. And the blast surface — what this component can reach in the cluster if its decision logic is coerced.

| Component | Trust Source | Telemetry Source |
| --- | --- | --- |
| Inference server pod or sidecar | Deployment manifest, model registry reference | eBPF syscalls, container runtime events, kube-audit |
| MCP tool servers and framework SDK | Tool registration, framework configuration | MCP server logs, SDK callbacks, in-process instrumentation |
| RAG store, vector database, memory layer | Data source configuration, ingestion pipeline | Database audit logs, vector store query logs |
| ServiceAccount, KSA, IRSA identity binding | RoleBinding, ClusterRoleBinding, IAM policy | kube-audit events, IAM event streams |
| Runtime substrate (kubelet, CNI, IMDS, API server) | Cluster configuration, NetworkPolicy, admission rules | kube-audit, admission webhook events, CNI logs |

The output of Step 1 is a component inventory keyed to actual cluster telemetry. Declared component lists go stale the first time an engineer adds an MCP connection; a runtime-derived AI bill of materials (AI-BOM) keeps the inventory live against what actually loaded into the agent’s address space. Without this binding, the rest of the framework runs against a fiction: the decomposition you wrote in Confluence, not the agent your cluster is running.
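As a rough illustration of what such an inventory could look like in code, the sketch below records the three attributes per component and flags any component with no telemetry bound to it. The `Component` type, its field names, and the sample entries are hypothetical, not taken from any specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One row of the Step 1 inventory (field names are illustrative)."""
    name: str
    trust_source: str                                          # where the declared config lives
    signal_sources: list[str] = field(default_factory=list)   # telemetry that observes it
    blast_surface: list[str] = field(default_factory=list)    # what it can reach if coerced


def unobserved(inventory: list[Component]) -> list[str]:
    """Return names of components that have no telemetry source bound to them."""
    return [c.name for c in inventory if not c.signal_sources]


# Hypothetical sample entries.
inventory = [
    Component(
        name="inference-server",
        trust_source="Deployment manifest",
        signal_sources=["eBPF syscalls", "kube-audit"],
        blast_surface=["model registry"],
    ),
    Component(
        name="mcp-tool-server",
        trust_source="tool registration",
        signal_sources=[],  # nothing wired yet, so this component is a gap
        blast_surface=["orders database", "ticketing API"],
    ),
]

gaps = unobserved(inventory)
print(gaps)
```

A component that shows up in `gaps` is one the rest of the framework cannot reason about until a signal source is attached.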

Scope boundary: model-artifact threats — training data poisoning, weight tampering, supply chain attacks on the model file itself — sit at the artifact layer rather than the Kubernetes runtime layer. The decomposition above intentionally stops where the model lands in the cluster. Treat the artifact layer as a separate workstream covered by model registry controls, signed manifests, and supply chain attestation.

Step 2: Enumerate Reachable Capability, Not Declared Permission

The most consequential threat surface for an AI agent is the gap between what its manifest says it can do and what it can actually reach when an attacker controls the prompt.

Reachable capability combines three terms, and a permission counts as reachable only when all three hold: declared permissions (what the IAM grant lists), prompt-reachable code paths (which of those grants the agent’s intent layer can be steered toward), and tool-available actions (what the MCP tool surface and framework SDK actually expose). An agent declared with 47 API permissions may exercise three in normal operation. The remaining 44 are dormant. Every dormant permission is reachable under prompt injection until proven otherwise, and proving otherwise requires runtime evidence, not a developer’s assurance that “we just added them for future-proofing.”
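One way to make the three terms concrete is to treat each as a set and take their overlap. The sketch below is a minimal model under that assumption; the permission names are placeholders, and `proven_unreachable` stands in for whatever runtime evidence you accept as proof.

```python
def reachable_capability(declared, tool_available, proven_unreachable=frozenset()):
    """Permissions that are granted, steerable, and exposed through a tool."""
    # Per the text: every declared permission is treated as prompt-reachable
    # until runtime evidence proves otherwise.
    prompt_reachable = declared - proven_unreachable
    return declared & prompt_reachable & tool_available


# Placeholder identifiers; real ones would come from IAM grants and RBAC rules.
declared = {f"perm-{i}" for i in range(1, 48)}                      # 47 declared permissions
tool_available = {f"perm-{i}" for i in range(1, 21)} | {"perm-99"}  # perm-99 is exposed but not granted
proven_unreachable = {"perm-20"}                                    # backed by runtime evidence

reachable = reachable_capability(declared, tool_available, proven_unreachable)
print(len(reachable))
```

In this toy data the tool surface exposes 20 of the 47 grants and one is proven unreachable, so 19 remain in scope for the catalog.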

The procedure is mechanical. List the agent’s declared scope from manifests, role bindings, and tool registrations. Run a two-to-four-week observation window in production, capturing the actual API calls, syscalls, and tool invocations the agent makes. Compute the delta: declared minus observed equals dormant. For each dormant permission, score its blast surface — what data, system, or downstream agent does this permission reach if it fires? That blast score determines whether the permission gets revoked, justified in writing, or accepted as known exposure with a dated re-review.
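The delta and triage step can be sketched in a few lines. The score scale, the thresholds, and the mapping from score to action below are assumptions chosen for illustration; the text only says that the blast score drives the decision.

```python
from dataclasses import dataclass


@dataclass
class Disposition:
    permission: str
    blast_score: int
    action: str


def triage(declared, observed, blast_scores, revoke_at=7, justify_at=4):
    """Compute dormant = declared - observed and assign an action per permission."""
    results = []
    for perm in sorted(declared - observed):
        # Assumption: an unscored permission gets the worst-case score of 10.
        score = blast_scores.get(perm, 10)
        if score >= revoke_at:
            action = "revoke"
        elif score >= justify_at:
            action = "justify in writing"
        else:
            action = "accept with dated re-review"
        results.append(Disposition(perm, score, action))
    return results


# Hypothetical permission names and scores.
declared = {"s3:GetObject", "s3:DeleteObject", "secrets:get", "pods:list"}
observed = {"s3:GetObject"}
blast_scores = {"s3:DeleteObject": 9, "pods:list": 2}

for d in triage(declared, observed, blast_scores):
    print(d.permission, d.blast_score, d.action)
```

Defaulting unscored permissions to the worst case keeps the burden of proof where the text puts it: on evidence, not on assurance.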

The same runtime reachability technique that reduces CVE noise by surfacing only vulnerabilities in code paths that actually execute applies here, to permissions instead of packages. The output of Step 2 is the catalog’s “what could happen” column populated with reachable actions, not aspirational ones. Engineering’s “future-proofing” argument fails this gate by definition — future-proofed permissions are dormant attack surface with no closing condition.

Step 3: Classify Each Threat: Coercion or Compromise

Every threat in the catalog falls into one of two structural classes. Compromise means an attacker exploits a vulnerability in the agent — a known CVE in the inference server, a misconfigured token mount, an exposed kubelet API. The attacker breaks something to make the agent misbehave. Coercion means an attacker uses the agent’s legitimate, authorized capability for an attacker-supplied intent, typically by injecting instruction into content the agent treats as context. Nothing is exploited. The agent is doing exactly what it was authorized to do, redirected.

The distinction matters because the evidence diverges. Compromise produces syscall anomalies, unexpected process trees, unauthorized binary execution, and container artifacts. Coercion produces behavioral-envelope drift — the agent calls APIs it had access to but never called, in volumes it hadn’t seen, against targets that pass authorization but fail context. A detection pipeline tuned for compromise misses coercion entirely.
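A minimal sketch of an envelope check follows, assuming the baseline is stored as a per-API hourly ceiling. The API names and numbers are invented, and the third coercion signal (targets that pass authorization but fail context) is not modeled here.

```python
from collections import Counter


def envelope_drift(baseline_max_per_hour, calls_this_hour):
    """Flag APIs never seen in the baseline and APIs called above their baseline rate."""
    counts = Counter(calls_this_hour)
    findings = []
    for api, n in sorted(counts.items()):
        if api not in baseline_max_per_hour:
            findings.append((api, "never seen in baseline"))
        elif n > baseline_max_per_hour[api]:
            findings.append((api, f"rate {n} exceeds baseline {baseline_max_per_hour[api]}"))
    return findings


# Hypothetical baseline and one hour of observed calls.
baseline = {"tickets.read": 50, "orders.query": 10}
calls = ["tickets.read"] * 20 + ["orders.query"] * 40 + ["customers.export"]

findings = envelope_drift(baseline, calls)
for api, reason in findings:
    print(api, "->", reason)
```

Every call in this example would pass an authorization check, which is why a compromise-tuned pipeline would stay silent on it.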

The canonical coercion case is the confused deputy at the MCP boundary. When an AI agent invokes an MCP tool, the tool typically executes with the agent’s credentials, not the calling user’s. Any user who reaches the agent’s prompt can, in principle, reach anything the agent is authorized to reach, because a successful indirect injection turns the user’s request into the agent’s full-privilege request. Blast radius is computed against the agent’s full privilege set, not the user’s. This is not a vulnerability; it is the delegation model working as designed.
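The blast-radius point can be shown with plain set arithmetic. This is only an illustration with hypothetical scope names: the exposure is whatever the agent can reach that the calling user could not reach directly.

```python
def confused_deputy_exposure(agent_scope, user_scope):
    """Capabilities a user gains by steering the agent instead of acting directly."""
    return agent_scope - user_scope


# Hypothetical scopes for a support agent and an end user.
agent_scope = {"tickets:read", "orders:read", "customers:read", "refunds:write"}
user_scope = {"tickets:read"}

exposure = confused_deputy_exposure(agent_scope, user_scope)
print(sorted(exposure))
```

If the exposure set is non-empty, the threat entry should be scored against `agent_scope`, not `user_scope`.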

| Dimension | Compromise | Coercion |
| --- | --- | --- |
| Attacker action | Exploits vulnerability path | Supplies instruction the agent treats as context |
| Authorization state | Bypassed or escalated | Honored: the agent is doing what it was authorized to do |
| Example | Container escape via known CVE, kubelet API exploitation | Prompt-injected SQL via authorized database access |
| Primary evidence | Syscall anomalies, process-tree deviations, unauthorized binaries | Behavioral envelope drift, atypical tool-call sequence and rate |
| Detection layer | eBPF kernel signals, kube-audit, container runtime | Per-Deployment behavioral baseline, framework SDK callbacks |
| Mitigation class | Patch, configuration hardening, image scanning | Scope reduction, behavioral enforcement, context-aware policy |

Compromise-class examples include container escape via a known CVE, ServiceAccount token theft via mounted secret, kubelet API exploitation, and supply chain compromise of a base image. Coercion-class examples include prompt-injected SQL via an agent with authorized database access, RAG poisoning that drives the agent to read sensitive records it has authorized access to, MCP tool chains that exit the boundary your BAA defines, and agent-to-agent delegation hijack where a low-privilege agent’s compromised output becomes a high-privilege agent’s input.

The output of Step 3 is a classified catalog with a class tag and an evidence-signal hint per entry. The class tag drives which Kubernetes signal you instrument for in Step 4. Compromise threats wire to syscall and audit signals; coercion threats wire to per-agent behavioral baselines. When an attack moves between classes, CADR’s full-chain attack story correlation ties the signals back into a single incident.
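A classified entry could be represented as below. This is a sketch, not a schema from any product: the `ThreatEntry` type is hypothetical, and the single boolean used to pick the class is a deliberate simplification of the judgment a reviewer would make.

```python
from dataclasses import dataclass
from enum import Enum


class ThreatClass(Enum):
    COMPROMISE = "compromise"
    COERCION = "coercion"


# Signal families per class, taken from the detection-layer row of the comparison.
SIGNAL_FAMILIES = {
    ThreatClass.COMPROMISE: ["eBPF kernel signals", "kube-audit", "container runtime"],
    ThreatClass.COERCION: ["per-Deployment behavioral baseline", "framework SDK callbacks"],
}


@dataclass
class ThreatEntry:
    description: str
    exploits_vulnerability: bool  # simplification: True means something gets broken

    @property
    def threat_class(self) -> ThreatClass:
        return ThreatClass.COMPROMISE if self.exploits_vulnerability else ThreatClass.COERCION

    @property
    def evidence_hint(self) -> list[str]:
        return SIGNAL_FAMILIES[self.threat_class]


escape = ThreatEntry("container escape via known CVE", exploits_vulnerability=True)
injected_sql = ThreatEntry("prompt-injected SQL via authorized access", exploits_vulnerability=False)

print(escape.threat_class.value, escape.evidence_hint)
print(injected_sql.threat_class.value, injected_sql.evidence_hint)
```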

Step 4: Close Each Threat Against Runtime Evidence

A threat catalog without a runtime evidence signal per entry is a list of fears, not a security control. Step 4 closes every entry against an observable signal or marks it as an open exposure.

The convergence criterion is binary per entry: either the catalog names a Kubernetes signal source that would fire on the threat — a specific kube-audit event, an eBPF syscall pattern, a behavioral envelope deviation, a framework SDK callback — or the threat is open. Open threats need either a new instrumentation pass or an explicit accept-the-risk record with a dated re-review. There is no third state.
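The binary criterion translates into a small status function. The sketch below assumes a hypothetical `CatalogEntry` record; note that an accepted risk still reports as open, only with a date attached, and that an expired acceptance falls back to needing action.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    threat: str
    signal_source: Optional[str] = None        # named signal that would fire on the threat
    risk_accepted_until: Optional[date] = None  # dated re-review for an accepted risk


def status(entry: CatalogEntry, today: date) -> str:
    """Closed if a signal source is named; otherwise open, with or without acceptance."""
    if entry.signal_source:
        return "closed"
    if entry.risk_accepted_until and entry.risk_accepted_until >= today:
        return "open: risk accepted until " + entry.risk_accepted_until.isoformat()
    return "open: needs instrumentation or a dated risk acceptance"


today = date(2026, 5, 23)
entries = [
    CatalogEntry("container escape", signal_source="eBPF syscall pattern"),
    CatalogEntry("RAG poisoning", risk_accepted_until=date(2026, 8, 1)),
    CatalogEntry("delegation hijack"),
]
for e in entries:
    print(e.threat, "->", status(e, today))
```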

Closure happens in two phases. Day 1 produces the catalog as a working hypothesis: every threat enumerated from Steps 1 through 3, with proposed evidence signals based on the component decomposition and class tag. Day 30 produces the converged catalog: every proposed evidence signal validated against actual runtime data, gaps surfaced where the proposed signal didn’t fire when expected, and new threats added where behavior the catalog didn’t anticipate appeared in the observation window.

Application Profile DNA at the Deployment level is what makes this convergence possible. Per-pod baselines never converge for ephemeral AI workloads because pod lifetime is shorter than the observation window. Deployment-level profiles absorb churn while still producing the per-agent behavioral envelope every coercion-class threat needs evidence against.
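The difference between per-pod and per-Deployment baselines shows up even in a toy aggregation. The sketch below assumes each observed event carries a Deployment name, a pod name, and a tool name; the event shape and all names are hypothetical.

```python
from collections import defaultdict


def profiles(events, key_index):
    """Group observed tool names by the chosen key (0 = Deployment, 1 = pod)."""
    grouped = defaultdict(set)
    for event in events:
        grouped[event[key_index]].add(event[2])
    return dict(grouped)


# Three short-lived pods from one Deployment, each seen making a single call.
events = [
    ("support-agent", "support-agent-7c9f-a1", "tickets.read"),
    ("support-agent", "support-agent-7c9f-b2", "orders.query"),
    ("support-agent", "support-agent-7c9f-c3", "kb.search"),
]

by_deployment = profiles(events, 0)
by_pod = profiles(events, 1)
print(by_deployment)
print(by_pod)
```

Each pod-level profile holds a single fragment, while the Deployment-level profile accumulates the full envelope across pod churn.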

The catalog re-opens at five defined events: a new MCP connection added, a model version updated, a tool surface modified, behavioral baseline drift beyond the defined envelope, or an autonomy tier change for the agent. Each event invalidates a specific subset of the catalog’s evidence signals — a new MCP connection invalidates the tool-call envelope, a model update invalidates the prompt-response distribution. The re-open isn’t a full re-do; it’s a scoped reconvergence against the specific surface that changed.
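A scoped re-open can be sketched as a lookup from event to invalidated signal types. Only the first two mappings below come from the text; the other three are assumptions added to complete the example, and the catalog contents are hypothetical.

```python
INVALIDATES = {
    "mcp_connection_added": {"tool-call envelope"},              # from the text
    "model_version_updated": {"prompt-response distribution"},   # from the text
    "tool_surface_modified": {"tool-call envelope"},             # assumption
    "baseline_drift": {"behavioral envelope"},                   # assumption
    "autonomy_tier_changed": {"tool-call envelope", "behavioral envelope"},  # assumption
}


def entries_to_reopen(catalog, event):
    """Return only the threats whose evidence signal the event invalidates."""
    invalid = INVALIDATES[event]
    return [threat for threat, signal in catalog.items() if signal in invalid]


# Hypothetical catalog: threat -> type of evidence signal that closes it.
catalog = {
    "prompt-injected SQL": "tool-call envelope",
    "jailbreak via retrieved document": "prompt-response distribution",
    "container escape": "eBPF syscall pattern",
}

print(entries_to_reopen(catalog, "mcp_connection_added"))
print(entries_to_reopen(catalog, "model_version_updated"))
```

Entries closed against signals the event does not touch, such as the container escape here, stay closed.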

The output is an instrumented threat model — every entry tied to a Kubernetes signal source, every closure event logged, every re-open triggered by a defined operational event. The catalog is software now, not a Confluence page.

Hand the Catalog Off to the Four-Pillar Framework

The output of these four steps maps directly into the Observe → Posture → Detect → Enforce implementation methodology covered in the parent framework guide.

The Step 1 component inventory feeds Observe — every component identified is something to discover, instrument, and emit telemetry from. The Step 2 reachability gap feeds Posture — the dormant permission set is exactly what declared-versus-observed reconciliation surfaces. The Step 3 compromise column feeds the syscall and audit signal side of Detect; the Step 3 coercion column feeds the behavioral envelope side. The Step 4 closure feeds Enforce — per-agent policy generation from observed baseline rather than guesswork from a manifest.

Without the catalog the four steps produce, the four-pillar implementation runs against a fiction. Observe collects everything, Posture compares declared-to-declared, Detect chases generic container alerts, Enforce stalls. With the catalog, every pillar has a defined input.

Threat Modeling AI Agents Is a Runtime Discipline

AI agent threat modeling for Kubernetes is structurally different from threat modeling anything else you have put in production, and the difference is that the threat surface is runtime-defined. The agent’s tool surface evolves. Its prompt-reachable paths shift with every model update. Its dormant permissions wait silently for the right input. A catalog built once at approval time cannot track any of this, and a catalog that does not track it stops protecting the agent the day after it ships.

The four steps above produce the kind of catalog that does. Decomposition into Kubernetes-runtime primitives binds every entry to a telemetry source the cluster actually emits. Reachable-capability enumeration replaces declared scope with what the agent could actually be talked into doing. The coercion-versus-compromise classification gives each entry the structural tag that determines which evidence signal will detect it. And runtime closure converts the document from a one-time exercise into a living catalog that re-opens at every operational event that changes the threat surface.

Walk these four steps against the highest-autonomy agent currently running in your cluster. Book a demo to see how cloud-native security for AI workloads closes the loop in production.

FAQ

How do I run this on an agent that’s already in production?

Skip nothing. Run Step 1 against the existing deployment — you will surface shadow components engineering didn’t catalog and tool surfaces that have grown since approval. Start the Step 2 observation window now; expect two to four weeks before behavioral closure. Run Steps 3 and 4 continuously thereafter, with re-opens triggered at every operational event the catalog defines.

How long until the threat model closes?

Inventory and declared-scope work can start immediately and takes days of effort, scoped per agent. Behavioral closure runs two to four weeks per Deployment for routine workloads, longer for agents with rare-but-legitimate work patterns like monthly reports or quarterly batch processing. Per-pod baselines never converge because pod lifetime is shorter than the convergence period; baselines must be maintained at the Deployment level.

Who owns the threat model after Day 1?

Security owns the catalog. Platform engineering owns the closure signal feed — the kube-audit subscription, the eBPF sensor deployment, the framework SDK telemetry pipeline. Both own the re-open gate at every MCP connection, tool addition, or model update. A catalog owned by one team without the other is a catalog that stops closing the first time engineering ships faster than security reviews.

What about multi-agent orchestration?

Step 1 decomposition includes orchestrator and delegation edges as first-class components. Step 4 closure uses framework SDK telemetry — LangGraph state transitions, CrewAI delegation events, AutoGen speaker selections, MCP protocol messages. Single-agent baselines structurally cannot see cross-agent threats; the contagion lives in the spaces between agents, where one agent’s compromised output becomes another agent’s input.
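Treating delegation edges as first-class components means they can be walked like a graph. The sketch below uses a breadth-first search to list which higher-privilege agents ultimately consume a given agent's output; the agent names, edges, and numeric privilege levels are all hypothetical.

```python
from collections import deque


def reachable_higher_privilege(edges, privilege, source):
    """Agents downstream of `source` whose privilege level exceeds the source's."""
    seen = {source}
    queue = deque([source])
    escalations = []
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
            if privilege[nxt] > privilege[source]:
                escalations.append(nxt)
    return escalations


# edges[a] lists the agents that take agent a's output as input.
edges = {"summarizer": ["planner"], "planner": ["executor"], "executor": []}
privilege = {"summarizer": 1, "planner": 2, "executor": 3}

print(reachable_higher_privilege(edges, privilege, "summarizer"))
```

Each agent returned marks a delegation path where a low-privilege agent's output can become a higher-privilege agent's input, which is the cross-agent surface a single-agent baseline cannot see.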
