
Why Editing IAM Policies Won’t Fix Your AI Agent Identity Problem

May 7, 2026

Ben Hirschberg
CTO & Co-founder

Key takeaways

  • Why is AI agent identity an architectural decision and not a credentials decision? The binding pattern you pick determines blast radius before any kernel or behavioral control fires. Service Accounts, IAM Roles, and Workload Identity Federation are sandboxing-architecture choices made at deployment time, not credential rotation work.
  • What breaks when standard least-privilege meets non-deterministic workloads? Least-privilege assumes you can enumerate what an identity needs at design time. AI agent action sets are bounded by prompts, tools, and inference outcomes — none of which are fixed before the agent ships.
  • What is the shared-identity collapse pattern? When teams reuse an existing role across multiple agent types because creating a new one is friction, the blast radius of every agent becomes the union of every agent's capabilities. Six months later this surfaces as an audit finding that policy edits cannot fix.

Editing IAM policies cannot fix the most common architectural mistake in shipping AI agents on Kubernetes. It happens in thirty seconds: a platform engineer reuses an existing ServiceAccount with an IRSA annotation for Bedrock access because creating a new one takes thirty minutes plus a Terraform pull request. The new agent ships under the existing identity.

Six months later, a posture finding lands on the agent’s bound IAM role: excessive permissions, fifty-plus actions granted, four actually exercised. The team starts editing the policy, then notices that the analytics agent — running under the same role — needs three of the actions about to be removed. The reduction stalls; the role stays; the finding recurs at the next audit.

AI agents break the assumption every standard identity model rests on — that a workload’s access pattern is knowable at design time — which is why the binding decision made in thirty seconds becomes the architecture broken in month six. Identity scope on AI workloads is a sandboxing decision, and the right answer comes from one of four binding patterns, with a sub-agent credential question that has no default answer in any cloud provider’s documentation.

Why least-privilege fails when access patterns can’t be enumerated

The principle of least privilege, the trust policy attached to every IAM role, the RBAC binding inside every Kubernetes namespace — they all rest on a single assumption: the workload’s access pattern is knowable at design time. You enumerate the actions the workload performs, you grant exactly those, you deny everything else.

That assumption holds for deterministic workloads. A web service makes the same API calls every time it processes the same request shape. A batch job runs the same query against the same warehouse on the same cron schedule. The access pattern is bounded by the code.

It does not hold for AI agents. The same agent, same image, will call different APIs in different sequences depending on the prompt and the inference outcome — the action set is bounded by prompts × tools × inference, none of which are fixed at design time. ARMO CTO Ben Hirschberg has put it bluntly in the published analysis of cloud-native security for AI workloads: with AI agents you might end up with something you didn’t plan for, and therefore you’ll fail.

The architectural consequence: “least privilege” stops meaning “the privileges this workload needs” and starts meaning “the privileges this workload’s identity-binding pattern allows.”

How shared-identity collapse turns one role into a six-month audit problem

When provisioning a new agent’s identity is friction and reusing an existing one is not, teams reuse. The new fraud-detection agent inherits the analytics agent’s ServiceAccount because both run in the same Deployment; the customer-support agent that ships next quarter joins the same binding for the same reason. After three sprints, one IAM role serves four agent types.

The architectural consequence is mechanical. The blast radius of every agent in that role equals the union of every agent’s capabilities — the fraud-detection agent inherits the analytics agent’s write access to the customer-segment table, the payment-validation agent’s permission to call the settlement API, and the customer-support agent’s read scope on PII it never legitimately touches. A prompt injection against any one agent can exercise any capability granted to any of them.
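The union arithmetic can be made concrete with a toy sketch; the agent names and permission strings here are illustrative, not taken from any real policy:

```python
# Each agent type's legitimately needed capabilities (illustrative).
agents = {
    "analytics":          {"s3:GetObject", "dynamodb:PutItem"},
    "fraud-detection":    {"sagemaker:InvokeEndpoint"},
    "payment-validation": {"execute-api:Invoke"},
    "customer-support":   {"dynamodb:GetItem"},
}

# When all four run under one role, every agent's effective blast radius
# is the union of everything any of them needs. A prompt injection against
# the customer-support agent can now exercise all of it, not just the one
# action that agent legitimately uses.
shared_role_scope = set().union(*agents.values())
```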

This is the design-time decision that produces Inherited Overreach findings months later. ARMO’s published AI-SPM methodology separates permission excess into three categories: granted-but-never-exercised, exercised-but-not-for-legitimate-work, and the hardest one — excess that comes from a binding above the agent, not from the policy directly attached. Shared-identity collapse produces the third category by definition: the excess does not live on the agent’s own ServiceAccount, so the audit that starts there finds nothing.

Editing the role policy does not undo the architecture. Runtime-derived AI-BOM is what makes the collapse visible — it maps each agent’s observed identity usage to the declared bindings. Without that, the audit reports an excess on the role and stops there.

Four binding patterns: ServiceAccount-only, pod-federated, per-agent, per-session

Four binding patterns cover the production landscape. Three diagnostic questions route to the right one: does the agent need cloud APIs? does it share work patterns with other agents in the same Deployment? does its access scope need to vary per task within a single deployment?
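The three questions route mechanically. A minimal sketch of that routing in Python (the function name, parameter names, and pattern labels are illustrative, not part of any tool's API):

```python
def route_binding(needs_cloud_api: bool,
                  multiple_agent_types: bool,
                  scope_varies_per_task: bool) -> str:
    """Route the three diagnostic questions to a binding pattern A-D."""
    if not needs_cloud_api:
        return "A"  # cluster ServiceAccount only: no cloud principal at all
    if scope_varies_per_task:
        return "D"  # per-session ephemeral credential on top of Pattern C
    if multiple_agent_types:
        return "C"  # per-agent federated identity: one role per agent type
    return "B"      # pod-level federated identity: one agent type, one role
```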

Each pattern comes with a decision rule, the cloud-specific implementation details, and a failure mode that surfaces if the wrong pattern is chosen.

Pattern A — Cluster ServiceAccount only

A vanilla Kubernetes ServiceAccount with no IRSA annotation, no Workload Identity Federation binding, no Azure federated credential. The pod authenticates to the Kubernetes API server with a bound projected token; it has no cloud identity.

Decision rule: the agent operates entirely in-cluster — talks to other Kubernetes services, queries an in-cluster vector database, never reaches a cloud API.

Failure mode: if the agent eventually needs cloud access and the binding is not updated, the cloud SDK falls back to the node’s identity — the EC2 instance role on EKS, the Compute Engine default service account on GKE Standard (often with project-Editor scope), the kubelet identity on AKS. One missed binding. The agent now runs with the entire node pool’s blast radius.

Pattern B — Pod-level federated identity

One ServiceAccount per pod, bound to a single cloud principal through OIDC token exchange. On EKS this is IRSA or EKS Pod Identity; on GKE, Workload Identity Federation with a KSA principal binding; on AKS, federated credentials on a Microsoft Entra app registration. Short-lived tokens, no static credentials in the cluster.
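A sketch of the EKS variant (IRSA): a single annotation binds the ServiceAccount to one IAM role, and every pod mounting this SA exchanges its projected token for that role's credentials. The names and account ID are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: analytics-agent          # illustrative agent name
  namespace: agents
  annotations:
    # IRSA: OIDC token exchange against this one role.
    # Every agent type that reuses this SA inherits this role's full scope.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/analytics-agent
```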

Decision rule: the agent needs cloud APIs, and the pod runs exactly one agent type with one work envelope. Per-cloud walkthroughs cover the configuration specifics for EKS and GKE.

Failure mode: shared-identity collapse. Multiple agent types reuse the binding because creating a new one is friction.

Pattern C — Per-agent federated identity

One IAM role per agent type, with the trust policy scoped through the OIDC sub claim — typically system:serviceaccount:<namespace>:<agent-name>. Each agent gets its own ServiceAccount and its own narrowly-scoped role. Neither role inherits the other’s capabilities. On EKS, this means separate IRSA roles with sub-claim conditions; on GKE, separate IAM grants on the KSA principal; on AKS, separate federated credentials per agent’s ServiceAccount.
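On EKS, the per-agent scoping lives in the role's trust policy. A sketch of the sub-claim condition, where the account ID, OIDC provider ID, namespace, and agent name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:agents:fraud-detection-agent"
      }
    }
  }]
}
```

Only the token projected for that exact ServiceAccount can assume the role; the analytics agent's token fails the condition even if both run in the same namespace.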

Decision rule: multiple agent types run in the cluster, each with a distinct work envelope. This is the default for anything beyond a single-purpose deployment.

Why this is defensible at scale, not just policy proliferation: per-agent IAM scoping needs per-agent behavioral baseline behind it — otherwise every new agent restarts the policy-authoring problem from scratch. ARMO’s Application Profile DNA captures the per-agent scope envelope at the Deployment level from observed runtime behavior: deploy each agent under a broad observation-mode role, then tighten to what the agent actually used. This is observe-to-enforce applied at the IAM layer — per-agent guardrails at the kernel layer use the same principle one layer below.

Failure mode: operational drag without behavioral baselines. Teams without runtime observability end up authoring per-agent roles by hand and revert to Pattern B within two sprints.

Pattern D — Per-session ephemeral credential

Pattern C is the parent; the agent uses its per-agent identity to issue itself a narrower credential per task. STS session policies on AWS, scoped sub-token issuance via OAuth 2.0 token exchange (RFC 8693) elsewhere. Each invocation runs with a credential scoped to the resources that invocation needs, and the credential expires when the invocation completes.
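A minimal Python sketch of the AWS variant, assuming an agent role already bound via Pattern C. The helper name and the action set are illustrative; the returned document is what would be passed as the `Policy` parameter to an STS `AssumeRole` call:

```python
import json

def task_session_policy(resource_arns: list) -> str:
    """Build an inline STS session policy scoping one invocation to
    specific resources. The effective permissions of the session are the
    intersection of this policy and the agent's per-agent role, so the
    session can only ever narrow, never widen, the agent's scope."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],   # illustrative per-task action set
            "Resource": resource_arns,
        }]
    })
```

With boto3, the agent would pass this via `sts.assume_role(RoleArn=..., RoleSessionName=..., Policy=task_session_policy([...]), DurationSeconds=900)`, so the credential expires shortly after the invocation completes.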

Decision rule: the agent’s access pattern shifts per task in a way that materially changes blast radius — one prompt drives a database query, another drives an external API call, another spawns a sub-process needing filesystem access.

Failure mode: operational complexity without proportional risk reduction. Without strong instrumentation, the credential narrows in theory but the agent’s behavior may not.

Trade-off matrix across the four patterns

| Pattern | Blast radius | Operational cost | Observability granularity | Regulatory traceability |
| --- | --- | --- | --- | --- |
| A — Cluster ServiceAccount only | In-cluster only — escalates to node identity if mis-bound | Low — vanilla SA | None at the cloud-IAM layer | None — no cloud principal to attribute against |
| B — Pod-level federated | Equals the role’s union scope across all agents in the pod | Low — one binding per pod | Per-pod, not per-agent | Per-pod attribution; ambiguous when agents share |
| C — Per-agent federated | Equals the agent’s own narrowly-scoped role | Medium — one binding per agent type, automatable | Per-agent — each agent maps to its own principal | Clean per-agent attribution in audit logs |
| D — Per-session ephemeral | Equals the task’s specific resource set, expires per session | High — runtime credential issuance + verification | Per-task — each invocation traceable separately | Per-task attribution with token-exchange chain |

Pattern C is the right default for anything beyond a single-purpose deployment. Pattern D is the right answer when access shifts per task and instrumentation can verify the scoping.

Sub-agent credentials: three propagation patterns when one agent spawns another

The four-pattern decision tree solves the binding question for a single agent. It does not solve what arrives the moment the parent invokes a CrewAI delegation, an MCP tool server, or an AutoGen group-chat handoff: what identity does the child run under? Three propagation patterns are in production use, each differing in where the credential comes from and what the audit trail attributes the action to.

Propagation A — Token reuse

Token reuse is the path of least resistance and rarely the right answer in production. The child inherits the parent’s environment, including the parent’s bound credentials, runs with the parent’s identity and full capability set, and the audit trail attributes everything the child does to the parent.

A compromised tool server invoked by the parent now operates with the parent’s full IAM scope — and the audit log shows the parent doing things the parent never directly did.

Propagation B — Fresh per-child binding

Fresh per-child binding gives best isolation at the highest ops cost — the child gets its own ServiceAccount and IAM role at spawn time, which means identity provisioning has to happen at runtime. The audit trail attributes the child’s actions to the child’s identity, with the parent in orchestration metadata only. Pick this pattern when the child needs distinct regulatory traceability: sub-agents handling distinct regulated data classes, different patient records, or different customer tenants.

Propagation C — Scoped sub-token issuance

Scoped sub-token issuance is the right answer in the common case — the middle ground between token reuse and fresh per-child binding. The parent issues a narrowed token to the child via OAuth 2.0 token exchange (RFC 8693), STS with session policies, or the cloud-native equivalent. The child runs with a strict subset of the parent’s capabilities, scoped to the specific delegated task. The audit log attributes the child’s actions to a derived identity that traces back to the parent through the exchange record.
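The RFC 8693 exchange request itself is a small, fixed shape. A sketch of the parameters the parent would POST to the token endpoint; the function name is illustrative, and the scope and audience values are placeholders for the delegated task:

```python
def exchange_request(parent_token: str, task_scope: str, audience: str) -> dict:
    """Build an RFC 8693 token-exchange request body: the parent trades
    its own token for a narrowed one scoped to the delegated task."""
    return {
        # Fixed grant-type URN defined by RFC 8693.
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": parent_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "scope": task_scope,   # strict subset of the parent's scopes
        "audience": audience,  # the tool server / sub-agent being invoked
    }
```

The authorization server records the exchange, which is what preserves the attribution chain: the child's derived token traces back to the parent through that record.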

Multi-agent contagion travels through propagation gaps — when token reuse is the default, every prompt injection against any agent in the orchestration becomes a privilege-escalation primitive against the entire system.

Picking the binding before the agent ships

The decision rule that beats path-of-least-resistance: default to per-agent federated identity (Pattern C). Reach for pod-level shared (B) only when sharing is justified; never default to cluster-ServiceAccount-only (A) for any agent that touches a cloud API; reach for per-session ephemeral (D) only when access shifts per task and instrumentation can verify the scoping holds.

For sub-agents: token reuse is the wrong answer in production. Pick fresh per-child binding when attribution matters; pick scoped sub-token issuance otherwise.

The binding-pattern decision compounds: the audit at month six and the incident at minute one are both downstream of the choice made before the agent shipped.

To see how ARMO produces per-agent runtime enforcement on Kubernetes — Application Profile DNA at the Deployment level, eBPF-based behavioral baselining at 1–2.5% CPU overhead, and observe-to-enforce at the IAM layer — book a demo or see ARMO’s platform for cloud-native AI workload security.

Frequently asked questions

Should every AI agent get its own ServiceAccount?

Yes for any agent that touches a cloud API or has a distinct work envelope. The per-agent ServiceAccount is the prerequisite for per-agent IAM scope, and it is meaningful only with per-agent behavioral baseline behind it.

Is per-session ephemeral credential always the safest pattern?

It produces the narrowest blast radius at the highest operational cost. Reach for it when access shifts per task and instrumentation can verify the scoping holds — without that, the credential narrows in theory but the agent’s behavior may not.

How do I handle credentials when an agent spawns a sub-agent?

Three propagation patterns. Token reuse loses attribution and is rarely correct in production. Fresh per-child binding gives clean attribution at the cost of runtime identity provisioning. Scoped sub-token issuance — parent issues a narrowed token via OAuth 2.0 token exchange or its cloud equivalent — preserves the attribution chain and is the right answer in the common case.

What’s the right starting binding pattern at deployment time?

Per-agent federated identity (Pattern C) with a broad observation-mode role, then tighten from observed behavior. Never start with cluster-ServiceAccount-only or pod-shared. Observe-to-enforce at the IAM layer produces a defensible per-agent scope from evidence rather than authoring.

Does Workload Identity Federation alone solve the AI-agent identity problem?

WIF solves credential issuance — short-lived tokens, no static secrets, OIDC-based trust. The binding-pattern decision sits one layer above WIF and determines blast radius. WIF can issue tokens to a cluster-wide ServiceAccount serving four agent types just as easily as to one ServiceAccount per agent — the mechanism is correct in both cases; the architecture decides whether you generate Inherited Overreach findings later.
