Sandboxing AI Agents on AKS: Network Policies, Workload Identity, and Least Privilege
Apr 24, 2026
Your CIEM report came back clean this morning. Every AI agent in the cluster is exercising its granted permissions — no idle roles, no service accounts with broad scope and a handful of API calls behind them, nothing that looks obviously over-provisioned. The dashboard is green, and by the diagnostic your tool was built on, it should be.
Last night, an analytics agent that’s been running scheduled reports over your data warehouse for four months ran one query it had never run before. The query pulled from a customer PII table the agent’s service account had access to — the role was roles/bigquery.dataViewer on the project, which covered the analytical dataset and, incidentally, every other dataset in the project. The agent had never touched the PII table. It wasn’t instructed to. It was prompted — by a retrieved document sitting in its RAG context — to “include customer identifier joins for the quarterly summary.” It followed the instruction. The IAM check passed. Fifty thousand rows left the warehouse.
This morning, that query shows up in your CIEM report as permission exercised, which is the tool’s definition of not-excessive. The report is still clean. The breach is real.
This is the pattern Ben Hirschberg, ARMO’s CTO, pointed to on a call with me last month: “whether this is a security issue, it depends on what this workload really needs, what’s the purpose of this workload.” For a deterministic workload, you can answer that question from code review — the code tells you what the workload needs. For an AI agent, you can’t. The agent’s needs are decided at inference time, by prompts nobody fully controls, against a tool catalog that shifts with every model and framework update.
If you’ve started mapping your AI security posture management work onto the three-discipline view — model and artifact posture, identity and access posture, behavioral posture — this piece is the operational deep-dive inside identity and access posture. Specifically: how to classify the permission-excess findings your assessment surfaces, and what each category actually requires to fix.
The thesis is this: “excessive permissions in AI workloads” is not a version of the CIEM problem with an AI label. It’s three distinct problems. The diagnostic CIEM was built on catches one of them cleanly, catches a second one partially, and can’t see the third at all.
Start with what CIEM assumes. A classic CIEM tool walks the identity graph for every principal — human or workload — and compares the permissions the principal has been granted against the permissions it has actually used over a configurable observation window. The delta between granted and used is excess. Scope down the role to eliminate the delta. Repeat on a cadence.
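Reduced to a sketch, that calculation is a set difference. The permission names and the "used" set below are placeholders for what a real tool would pull from audit logs over the window:

```python
# Minimal sketch of the classic CIEM diagnostic: excess = granted minus used.
# Permission names are illustrative; in practice "used" is assembled from
# cloud audit logs (e.g. CloudTrail) over the observation window.
granted = {"s3:GetObject", "s3:ListBucket", "s3:PutObject", "s3:DeleteObject"}
used = {"s3:GetObject", "s3:ListBucket"}

excess = granted - used          # the delta CIEM reports as excessive
print(sorted(excess))            # ['s3:DeleteObject', 's3:PutObject']
```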
That diagnostic holds for deterministic workloads because the “used” set is bounded by the code. A payment processing microservice that calls three APIs today will call the same three APIs tomorrow, and the same three next month. The observation window captures the full set of legitimate needs in a short time, because the workload’s needs don’t expand after deployment.
An AI agent’s “used” set isn’t bounded by the code. It’s bounded by the intersection of granted permissions, tool availability at inference time, and the prompts the agent receives — which are themselves bounded by whatever users, documents, and upstream agents send into the model’s context window. This is the structural property the OWASP Top 10 for Agentic Applications catalogs as the root of most agent-specific threat categories: non-determinism that expands the threat surface beyond what configuration review can predict. The set of legitimate needs expands, contracts, and shifts with every model update, framework upgrade, RAG index refresh, and new class of user query. What you observed in week one isn’t the ceiling. It’s often not even a reliable floor.
This creates a specific failure mode: an agent can exercise every permission it’s been granted over your observation window, and still be excessively permissioned, because some of those exercises weren’t justified by legitimate work. The declared-vs-used calculation marks the permission as used and keeps it. A classic CIEM tool would close the finding. The breach scenario from the opening is an instance of that failure — and is a textbook case of the excessive agency risk ranked as LLM08 in OWASP’s original LLM application threat catalog.
This isn’t a failure of CIEM tooling. CIEM products do what they were designed to do, and do it well. It’s a failure of the diagnostic assumption CIEM was designed on. The assumption — that a stable observation window captures the workload’s legitimate needs — holds for most of your cluster and breaks for AI agents specifically.
Ben’s framing from the internal interview is the cleanest way to see the break: “You can sometimes say that application logic will never take a specific route. It will never use a specific privilege. In the case of AI agents, you might end up with something that you didn’t plan for and therefore you’ll fail. You need to lock it down much better than an application.” For AI workloads, the lock-down has to come from a different diagnostic than the one CIEM runs.
What follows is that diagnostic, broken into the three categories it was built to handle. The table below is the shortest version of it; the sections that follow walk each category in depth.
| Category | Definition | Identification method | Reduction path | Failure mode if misclassified |
|---|---|---|---|---|
| 1. Unused Excess | Permission granted to the agent, never exercised over a representative observation window. | Declared-vs-observed delta at the Kubernetes Deployment level, aggregated across the agent’s pod fleet. | Replacement IAM policy scoped to observed actions; STS or equivalent session policies for per-task constraints; NetworkPolicy and seccomp generation from observed behavior. | Applied to Category 2 or 3: the scope-down changes nothing because the permission was being exercised, or the inheritance chain re-grants it within hours. |
| 2. Unjustified Use | Permission granted and exercised, but the exercise wasn’t tied to a legitimate prompt-to-action chain covered by the agent’s declared work. | Cross-layer correlation linking IAM event → tool call → prompt context; deviation scoring against the per-agent behavioral baseline. | Per-agent behavioral guardrails at the enforcement layer; prompt-context constraints filtering what reaches the model’s context window. The IAM role stays where it is. | Applied as Category 1: the IAM scope-down breaks the agent’s legitimate work, the policy gets rolled back, the excess returns. |
| 3. Inherited Overreach | Effective scope descends from a role binding, cluster role, or node service account outside the agent’s own provisioning. | Identity chain trace from the agent’s service account back to every policy that contributes to its effective scope, across providers if multi-cloud. | Per-agent identity binding (per-agent IRSA, Workload Identity, or equivalent); non-human identity hygiene with ephemeral credentials and short TTLs. | Applied as Category 1: the policy edit on the agent’s own service account leaves upstream bindings untouched; effective scope is unchanged. |
The first category is the classical case. A permission has been granted to an agent’s service account, and the agent has never exercised it over a reasonable observation window. The grant is idle. The fix is to scope down the role.
Identification in this category is a declared-vs-observed comparison, but with one AI-specific adjustment: the observation has to happen at the Kubernetes Deployment level, not the per-pod level. AI agent pods are ephemeral — rolling restarts, autoscaling events, model updates, and scheduled redeployments mean the per-pod observation window is typically measured in hours. Baseline methodology for AI workloads handles this by building behavioral profiles at the Deployment level, so the observation window survives pod lifetimes and aggregates across the agent’s fleet. Without this, the declared-vs-observed calculation resets every time the pod restarts and never converges.
A concrete walkthrough. An agent with s3:* on a specific bucket runs for 14 days. The behavioral profile captures every API call the agent actually makes. The comparison shows the agent uses s3:GetObject and s3:ListBucket, and nothing else. The role is then rewritten to permit those two actions on the specific prefix the agent reads from, and the rest of s3:* is dropped. The observe-to-enforce methodology covers how to roll out the replacement policy without breaking the agent in production; the progressive enforcement guide walks the full sandboxing workflow that Category 1 fixes plug into.
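The replacement policy for that walkthrough might look like the sketch below; the bucket name and prefix are placeholders, and in practice the statement is generated from the observed action list rather than written by hand:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AgentObservedRead",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-reports-bucket/quarterly/*"
    },
    {
      "Sid": "AgentObservedList",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::example-reports-bucket",
      "Condition": { "StringLike": { "s3:prefix": ["quarterly/*"] } }
    }
  ]
}
```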
Reduction paths for Category 1 are straightforward and familiar: replacement IAM policies generated from observed actions, NetworkPolicy generation from observed egress traffic, session policies scoped to per-task constraints, seccomp profiles generated from observed syscalls. Each of these maps onto existing cloud-native primitives — the Category 1 work is essentially CIEM plus progressive enforcement done well.
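As one example of the NetworkPolicy piece, an egress policy generated from observed traffic might look like this sketch, with namespace, labels, and destination CIDR standing in for whatever the observation window actually recorded:

```yaml
# Sketch of an egress NetworkPolicy derived from observed traffic.
# Namespace, labels, and the destination CIDR are illustrative placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: analytics-agent-egress
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: analytics-agent
  policyTypes:
    - Egress
  egress:
    - to:                          # DNS, so the agent can resolve endpoints
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    - to:                          # the only external endpoint observed in the window
        - ipBlock:
            cidr: 203.0.113.10/32
      ports:
        - protocol: TCP
          port: 443
```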
The failure mode for Category 1 only shows up when the diagnostic is applied to Category 2 or Category 3 findings. When a Category 2 finding is treated as Category 1, the scope-down changes nothing because the permission was being used. The team clears the finding, the excess remains. When a Category 3 finding is treated as Category 1, the team edits the policy on the agent’s own service account, leaves the inherited grant upstream untouched, and the effective scope recovers within hours as the inheritance chain re-propagates.
Neither of those failure modes is a CIEM problem. They’re classification problems — the Category 1 fix being applied where the category is wrong. Which is why the taxonomy matters more than any single diagnostic.
The second category is where most AI-specific breach paths start. The permission has been granted. The permission has been exercised. But the exercise wasn’t justified by the agent’s legitimate work scope — it happened because the agent’s prompt context directed it to, and the prompt direction came from somewhere that shouldn’t have been steering the agent’s tool choices.
The opening scenario of this article is a Category 2 case. The analytics agent had roles/bigquery.dataViewer at the project level. Every individual component of the breach path was “authorized”: the IAM check passed, the tool call was in the agent’s advertised toolset, the query was syntactically valid, the destination was an allowed internal egress path. Nothing in any of those individual events was anomalous. The malice was in the combination — the prompt context that steered the tool call to a table outside the agent’s routine work, invoked by an instruction the engineering team never anticipated would sit in a retrieved document. MITRE ATLAS catalogs this pattern as adversarial tool use through legitimate interfaces; it’s the category of attack that leaves no fingerprint in the IAM layer because the IAM layer never had the context to see it.
Identifying Category 2 requires three correlated signals, and no CIEM tool has access to more than one of them. The first signal is the IAM event itself — which principal called which API on which resource (CIEM sees this). The second is the agent’s tool call that triggered the IAM event — which tool the agent chose to invoke and with what parameters (a layer higher than CIEM operates). The third is the prompt context that produced the tool call — the user message plus retrieved content plus any upstream agent output that landed in the model’s context window at the moment the tool was selected (a layer higher still). Without all three, you cannot distinguish “agent using its permission for its job” from “agent using its permission for something its job never covers.”
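A minimal sketch of what those three signals look like once they're joined into a single record; the field names are hypothetical, not the schema of any particular tool:

```python
from dataclasses import dataclass

# Hypothetical joined record: one IAM event, the tool call that triggered it,
# and the prompt context in play when the tool was selected.
@dataclass
class CorrelatedFinding:
    # Signal 1: the IAM event (what CIEM sees)
    principal: str        # the agent's service account
    api_action: str       # e.g. "bigquery.jobs.create"
    resource: str         # the dataset or table actually touched
    # Signal 2: the agent's tool call
    tool_name: str        # e.g. "run_sql_query"
    tool_args: dict       # parameters the model filled in
    # Signal 3: the prompt context that produced the tool call
    prompt_sources: list  # user message, retrieved docs, upstream agent output

def is_unjustified(finding: CorrelatedFinding, work_envelope: set) -> bool:
    """Category 2 test: the action happened, but the resource it touched
    sits outside the agent's established work envelope."""
    return finding.resource not in work_envelope
```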
This is the specific capability gap that runtime AI workload security tools are designed to close. Cross-layer correlation — linking the kernel-level API event to the application-layer tool call to the prompt chain that produced it — is what makes a Category 2 finding identifiable at all. ARMO’s CADR platform correlates exactly these signals: eBPF sensors capture the syscall and network events at the kernel level, Application Profile DNA baselines the per-deployment tool-call patterns, and the correlation engine ties those to the prompt and retrieval events in the application layer. The finding that surfaces isn’t “excessive permission on the service account.” It’s “permission exercised through a tool call whose triggering prompt falls outside this agent’s established work envelope.”
The reduction path for Category 2 is not an IAM edit. If the permission is removed, the agent breaks on its legitimate work — the BigQuery permission was needed for the analytical dataset; removing it scopes down both the breach path and the normal path. The correct reduction operates at the behavioral enforcement layer, not the IAM layer. Per-agent guardrails constrain which tool calls the agent can make under which behavioral envelope, and prompt-context constraints filter which retrieved content can reach the model’s context window. The IAM role stays where it is; the enforcement moves up the stack to the layer where the actual malice lives.
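Stripped to its simplest form, a per-agent guardrail at that layer looks something like this sketch. The dataset names are illustrative, and the hook sits in the tool-execution path, not in the IAM policy:

```python
# Illustrative per-agent guardrail: the IAM role still allows project-wide reads,
# but the tool-execution layer only lets this agent query its routine datasets.
ALLOWED_DATASETS = {"analytics_reporting", "analytics_staging"}  # placeholder names

def guard_sql_tool_call(requested_dataset: str, query: str) -> None:
    """Runs before the agent's SQL tool executes; blocks and logs anything
    outside the agent's work envelope for Category 2 review."""
    if requested_dataset not in ALLOWED_DATASETS:
        raise PermissionError(
            f"dataset {requested_dataset!r} is outside this agent's work envelope"
        )
```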
The failure mode when Category 2 is treated as Category 1 is the most expensive of the three. The team edits the IAM policy to remove the abused permission, the agent breaks in production on its legitimate work, the policy gets rolled back, the excess returns. In our experience working with security teams, this cycle sometimes runs two or three times before the team realizes the problem isn’t the policy.
The detection counterpart to this category — tool misuse at runtime — operates on the same signal correlation from the detection side. Category 2 at the posture layer is what that detection pipeline is monitoring for at the runtime layer. The malice in both is not in the action itself; it’s in the context that produced the action.
The third category isn’t a policy problem. It’s an architecture problem that policy edits can’t touch. IBM’s 2025 breach report found 97% of AI-related breaches involved systems lacking proper access controls — and Category 3 is the subset of that statistic that configuration audits routinely miss, because the access control failure doesn’t live on the agent’s own service account where every audit starts.
AI agents on Kubernetes inherit effective permissions through layered identity mechanisms: the pod's service account, the Roles and ClusterRoles bound to that service account, any transitive role assumptions the service account can perform, and — on misconfigured clusters — the node's identity the pod falls back to when Workload Identity or IRSA isn't configured correctly. Each of these was provisioned for a reason. Almost none of them were provisioned with autonomous agent decision-making in mind.
The canonical pattern: a developer needed to debug an agent that kept failing against the analytical warehouse. They bumped the agent’s service account binding to a broader role to unblock the debugging session, planning to scope it back down afterward. The sprint ended, the binding stayed. A month later, the agent’s effective scope still includes the broader role, and no IAM audit on the agent’s own service account will reveal the excess — the excess lives in the inherited binding above the service account, not in any policy directly attached to it.
Identifying Category 3 requires tracing the identity chain from the agent’s pod back to every policy that contributes to its effective scope. For an agent on EKS, the trace walks pod → service account → (via IRSA) IAM role → attached policies and trust relationships. The EKS sandboxing reference walks through IRSA and Pod Identity patterns for this, and AWS recently introduced IAM context keys for managed MCP servers that can differentiate agent-initiated API calls from human-initiated ones at the IAM layer — useful for the portion of the chain running against AWS-managed MCP. On GKE, the walk is pod → service account → (via Workload Identity Federation) Google Cloud principal → project, dataset, and bucket-level roles, with the node SA fallback as a separate failure mode that bypasses the chain entirely; the GKE reference covers that fallback pattern and how to prevent it. On AKS, the equivalent chain runs through Workload Identity Federation and federated credentials on top of the Azure AD principal graph.
The reduction path for Category 3 is per-agent identity binding. The pattern that works: every agent gets its own service account, bound to its own narrowly scoped role, with no namespace-wide role bindings contributing to its effective scope. The per-agent binding eliminates the inheritance chain by definition — there's nothing to inherit from if the agent's service account is the only principal in the graph with any policy attached.
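On EKS, for instance, the per-agent binding reduces to a dedicated ServiceAccount annotated with its own IAM role, and a Deployment that uses nothing else. Every name, the namespace, and the role ARN below are placeholders:

```yaml
# Per-agent identity binding via IRSA: one ServiceAccount, one narrowly scoped role.
# All names and the role ARN are illustrative placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-agent
  namespace: ai-agents
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/analytics-agent-observed
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-agent
  namespace: ai-agents
spec:
  replicas: 1
  selector:
    matchLabels:
      app: analytics-agent
  template:
    metadata:
      labels:
        app: analytics-agent
    spec:
      serviceAccountName: analytics-agent      # dedicated SA, no shared or default identity
      automountServiceAccountToken: false      # optional hardening if the agent never calls the K8s API
      containers:
        - name: agent
          image: registry.example.com/analytics-agent:1.4.2
```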
Non-human identity hygiene is the companion practice: ephemeral credentials with short TTLs, session policies that narrow scope per task, and regular rotation of the service account itself. These reduce the window during which an inherited grant, if one sneaks back in through a future provisioning mistake, can be exploited before it's caught.
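A sketch of the per-task narrowing with STS: the role ARN and the inline session policy are placeholders, and the point is the short credential lifetime plus a session policy that intersects with whatever the role already grants:

```python
import json
import boto3

sts = boto3.client("sts")

# Hypothetical per-task credentials: 15-minute TTL, session policy narrowed to the
# one prefix this task reads. Effective permissions are the intersection of the
# role's attached policies and this session policy.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/analytics-agent-observed",
    RoleSessionName="quarterly-report-task",
    DurationSeconds=900,
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-reports-bucket/quarterly/*"
        }]
    }),
)["Credentials"]
```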
The failure mode when Category 3 is treated as Category 1 is that the team edits the policy directly attached to the agent’s service account, feels satisfied that the excess is resolved, and misses that the effective scope is still broader because of an inherited binding they didn’t look at. The effective scope doesn’t change. The finding recurs on the next audit, often classified as a new Category 1 finding, and the cycle repeats.
The taxonomy isn’t a one-time audit. Permissions change, roles shift, agents get redeployed against updated frameworks, and new RAG indexes land in production every week. The NIST AI Risk Management Framework names continuous monitoring as a core governance practice for AI systems in production for exactly this reason — the posture you had at go-live is not the posture you have six weeks later. The cadence is continuous: running the three-category classification against the findings queue as part of the same posture cycle your team already runs for cloud workloads, with the classification step added on top.
The ordering that tends to work is Category 1 first, Category 3 second, Category 2 last. The logic isn’t arbitrary. Category 1 is the cheapest to fix and clears out the largest volume of findings — which is useful because it also narrows the set of candidates for Category 2 and Category 3 classification. Category 3 comes second because it addresses the architectural pattern that re-seeds Category 1 findings; if you skip Category 3, your next audit will show the same Category 1 findings reappearing through inheritance, and you’ll spend the same effort closing them twice. Category 2 comes last because it needs the most operational maturity — behavioral baselines have to be stable, prompt-context correlation has to be instrumented, and the team has to have enough operational experience to distinguish anomalous-but-legitimate agent behavior from anomalous-and-malicious.
Findings reclassify as the investigation deepens. A Category 1 finding often reveals a Category 3 root cause once you trace the chain — the permission was never exercised, and scoping down the role on the agent's own service account never closes the finding for good, because an inherited binding above that service account keeps re-granting it. A Category 2 finding sometimes resolves into a Category 3 problem when the prompt-context analysis reveals that the “unjustified use” was actually enabled by an inherited grant the team didn’t know was in scope. Keeping the classification as a living tag on the finding, rather than a once-assigned label, is what lets the discipline operate at the speed your environment changes.
The swap-test defense is worth making explicit. This framework doesn’t work if you replace “AI agent” with “microservice” throughout. Category 2 assumes prompt-to-tool correlation as the identification mechanism, and deterministic microservices don’t have prompts that drive their behavior. Category 3 assumes autonomous decision-making as the factor that makes inherited scope dangerous — a deterministic microservice inheriting the same scope is a constrained problem because the code bounds what the service will do with the inherited grant. Both categories are structurally AI-specific. That’s not a marketing claim; it’s the reason the taxonomy exists as a separate framework from traditional CIEM discipline.
Integration with existing CIEM tooling is additive, not substitutive. CIEM handles Category 1 fluently and is the right tool for that layer. The taxonomy sits on top of CIEM output, classifying each finding into one of the three categories and routing it to the appropriate reduction path. For teams building the broader evaluation around this — including the runtime-informed posture test and the four capability pillars — the AI workload security buyer’s guide covers the full vendor-evaluation framework this taxonomy fits into. At your go-live approval gate — the one-time checkpoint before an agent moves to production — the declared-vs-observed reconciliation that’s become standard practice is one piece of evidence, primarily a Category 1 and partial-Category 2 check. The three-category taxonomy is what runs continuously after go-live and catches the excess that accumulates through the agent’s operational lifetime.
The operational shift is small and important. In a CIEM-only posture discipline, every finding is a single type with a single fix pattern — reduce the delta between granted and used. The classification work happens implicitly, if at all, and usually only surfaces when the fix doesn’t work or breaks the agent.
In a three-category discipline, the classification step happens first and explicitly. Each finding gets assigned to Category 1, 2, or 3 before anyone reaches for a fix. The fix is then specific to the category: IAM scope-down for Category 1, behavioral guardrails and prompt-context constraints for Category 2, identity chain restructuring for Category 3. The failure modes when a category is misclassified are documented well enough that a misroute is recoverable — the finding comes back unsatisfied and the team can reclassify rather than escalate.
The artifact a security team walks away with is different too. Not a list of findings sorted by severity, but a classified queue where each item has a known reduction path and a known failure mode for misclassification. The same queue serves as an internal audit artifact: an auditor, internal or external, can see the classification reasoning alongside the fix, which is a posture of defensibility the flat severity-sorted list doesn’t provide.
The paradox that opened this article — the agent that exercises every permission and is still overprivileged — is a Category 2 finding. The taxonomy identifies it; the behavioral guardrail fix reduces it. The CIEM report, which reported the permission as used, was answering a different question than the one that needed answering.
If you want to see what Category 2 findings look like in a real cluster — where prompt-to-tool correlation surfaces excess that CIEM can’t see — the ARMO platform for cloud-native AI workload security combines runtime-derived AI-BOM, per-agent behavioral baselines, and cross-layer correlation into the posture layer this taxonomy depends on. The platform runs alongside your existing CIEM tooling; it doesn’t replace it. If you’d like a walkthrough of how the three-category classification would apply to your environment specifically, book a demo.
How do I classify an existing CIEM finding into one of the three categories?
Run a simple triage sequence. First, check whether the permission has been exercised over the observation window — if no, the finding is Category 1 and the standard CIEM fix applies. If yes, check whether the exercises trace to legitimate tool calls in prompt contexts that match the agent’s documented work scope — if not, it’s Category 2. If the finding traces back to a role binding outside the agent’s own provisioning, it’s Category 3 and requires identity chain work, not a policy edit.
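As a sketch, that triage sequence collapses to something like the following; the three booleans stand in for whatever tooling answers each question in your environment:

```python
def classify_finding(exercised: bool, justified: bool, inherited: bool) -> str:
    """Triage sketch for a single finding. The booleans are stand-ins for the
    checks your CIEM, prompt-correlation, and identity-chain tooling would run."""
    if not exercised:
        return "Category 1: unused excess, standard scope-down applies"
    if not justified:
        return "Category 2: unjustified use, guardrails rather than an IAM edit"
    if inherited:
        return "Category 3: inherited overreach, identity chain work required"
    return "No excess identified by this pass"
```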
What observation window do I need before the taxonomy produces reliable output?
It depends on the category. Category 1 needs 7 to 14 days of Deployment-level behavioral data for a reliable declared-vs-observed calculation, assuming the agent receives representative traffic in that window. Category 2 needs a mature prompt corpus — typically one to two weeks of production traffic plus a behavioral baseline established under that load. Category 3 is deterministic from day one; it’s a static trace of the identity graph and doesn’t require any observation window at all.
Can Category 2 identification be automated, or does it require human review?
The signal correlation is automatable — the prompt-to-tool-call-to-API linkage and the baseline deviation scoring can run without human input. Human review enters at the classification step once the behavioral profile has matured, because the review question isn’t “is this anomalous” (the baseline answers that) but “is this anomalous exercise of a permission outside the agent’s legitimate job.” That judgment call is where a security engineer’s context about the agent’s intended purpose matters more than the signal itself.
How does this interact with our existing go-live approval process?
Go-live is a single checkpoint; continuous posture is a cadence. At go-live, a declared-vs-observed reconciliation captures a snapshot — useful as a Category 1 and partial-Category 2 check, but not continuous, so the Category 2 and Category 3 findings that accumulate after go-live aren’t in scope for the gate. The three-category taxonomy is what runs post-approval and catches the excess that accumulates through the agent’s operational lifetime.
What changes for multi-cloud agents whose identity chain spans providers?
Category 3 becomes the dominant finding type. An agent that crosses provider boundaries inherits from each provider’s identity constructs — IRSA on EKS, Workload Identity on GKE or AKS, federated credentials between them — and the identity chain trace has to resolve each provider’s graph independently and then reconcile them against the agent’s effective cross-cloud scope. The reduction path stays the same (per-agent identity binding per cloud), but the architectural complexity is larger and the audit of the cross-cloud chain becomes a continuous task rather than a one-time design decision.