How to Sandbox AI Agents on EKS: Where Each AWS Control Stops and What Fills the Gap


Apr 1, 2026

Ben Hirschberg
CTO & Co-founder

Key takeaways

  • What makes sandboxing AI agents on EKS different from sandboxing traditional workloads? Every EKS primitive—IRSA, SecurityGroupPolicy, NetworkPolicy, seccomp—assumes the workload will behave the same way tomorrow as it did today. AI agents violate that assumption because their behavior changes with every prompt.
  • Why can’t I write effective seccomp profiles for AI agents before deploying them? Because AI agents—especially those with code generation capabilities—invoke syscalls that depend on runtime prompts. A LangChain agent that generates and executes Python code will make different syscalls depending on what the user asks.
  • What’s the most dangerous gap in EKS-native sandboxing for AI agents? None of the native controls can detect misuse of granted permissions. An agent that exfiltrates data through a legitimate API call it’s authorized to make looks identical to normal operation in CloudTrail, VPC Flow Logs, and SecurityGroupPolicy enforcement. The gap is behavioral—knowing what “normal” looks like for each specific agent—and closing it requires runtime behavioral enforcement at the kernel level.

EKS gives you more sandboxing primitives for AI agent workloads than any other managed Kubernetes platform: IRSA for identity scoping, EKS Pod Identity for simplified role mapping, SecurityGroupPolicy for pod-level VPC segmentation, native NetworkPolicy enforcement through the VPC CNI’s eBPF engine, seccomp profiles for syscall restriction, and VPC endpoints for private service access. For a traditional microservice with deterministic behavior, these controls compose into a strong least-privilege stack.

For AI agents, every one of them has a blind spot in the same place. They enforce rules based on what you declared at deploy time, not what the agent decides at runtime. An IRSA role scoped to three S3 buckets cannot tell whether the agent is reading customer data for a legitimate support query or exfiltrating it after a prompt injection—the excessive agency risk catalogued in the OWASP Top 10 for LLM Applications. A NetworkPolicy that permits egress to your vector database endpoint cannot tell whether the agent is running a normal RAG retrieval or dumping its entire context window. A seccomp profile that allows the connect syscall cannot distinguish a legitimate API call from an unauthorized outbound connection.

This guide is the EKS-specific configuration reference for the progressive enforcement methodology covered in the complete sandboxing guide. That guide explains the four-stage approach—discovery, observation, selective enforcement, full least privilege. 

The EKS framework implementation guide shows how the Observe → Posture → Detect → Enforce methodology maps to EKS primitives. This article goes one level deeper: how to actually configure each control for AI agent workloads, and the exact point where each one goes blind because agent behavior is non-deterministic.

Three control layers, each with EKS-specific configuration patterns. Three corresponding failure modes, all rooted in the same architectural gap. And a progressive deployment sequence that rolls the full stack out without breaking production.

Identity Controls: IRSA, Pod Identity, and Session-Level Constraints

IRSA vs. EKS Pod Identity for Agent Workloads

Both mechanisms map a Kubernetes service account to an IAM role so your agent pod gets temporary AWS credentials without long-lived secrets. The differences are operational. IRSA requires an OIDC provider tied to your EKS cluster, trust policies that reference the provider ARN and service account, and manual management of role bindings. It works across EKS, EKS Anywhere, and self-managed clusters. EKS Pod Identity eliminates the OIDC provider entirely—role-to-service-account mappings are managed through the EKS API using a DaemonSet-based credential agent. Setup is simpler, cross-account role assumption is built in, and you avoid the IAM trust policy size limits that IRSA hits at scale. The trade-off: Pod Identity is EKS-only and requires the Pod Identity Agent add-on.

For AI agent sandboxing, the choice between them matters less than the workflow you build around either one.

The Observation-Mode Role Pattern

Start with a role that permits the broad API set your agent might need: bedrock:InvokeModel for inference, s3:GetObject for RAG data, dynamodb:Query for session state, secretsmanager:GetSecretValue for credential access. This is deliberately broader than you want long-term—the point is to avoid breaking the agent during the observation period.
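An observation-mode policy along these lines is a reasonable starting point—a sketch, deliberately broad and meant to be replaced after the observation period. The wildcard resource is intentional here; the least-privilege replacement narrows both actions and resource ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ObservationModeBroadGrant",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "s3:GetObject",
        "dynamodb:Query",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": "*"
    }
  ]
}
```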

Over one to two weeks, CloudTrail captures every API call the agent makes. Filter by the agent’s role ARN and look at AssumeRoleWithWebIdentity events for IRSA or AssumeRoleForPodIdentity events for Pod Identity to see how often the agent assumes its role. Then examine the downstream API calls: which services, which specific resources (S3 bucket ARNs, DynamoDB table names), which actions. If your role permits 47 APIs and the agent calls three, that 47-to-3 ratio is your posture gap—and the basis for your replacement least-privilege policy. AWS recently introduced IAM context keys for managed MCP servers that can differentiate agent-initiated API calls from human-initiated ones, but these only apply to AWS-managed MCP servers—self-deployed agents on EKS need the behavioral profiling approach described here.
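The posture-gap calculation itself is mechanical once CloudTrail events are parsed. A minimal sketch—the event dicts below are a simplified stand-in for parsed CloudTrail records (not the full schema), and the role ARN is a placeholder:

```python
# Sketch: compute the posture gap between the actions a role permits and
# the actions CloudTrail shows the agent actually calling.

ROLE_ARN = "arn:aws:iam::123456789012:role/agent-observe"  # placeholder

def posture_gap(permitted_actions, events, role_arn):
    """Return (used, unused) action sets for one role ARN."""
    used = {
        # "s3.amazonaws.com" + "GetObject" -> "s3:GetObject"
        f"{e['eventSource'].split('.')[0]}:{e['eventName']}"
        for e in events
        if e.get("roleArn") == role_arn
    }
    return used, set(permitted_actions) - used

permitted = {"bedrock:InvokeModel", "s3:GetObject",
             "dynamodb:Query", "secretsmanager:GetSecretValue"}
events = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "roleArn": ROLE_ARN},
    {"eventSource": "bedrock.amazonaws.com", "eventName": "InvokeModel",
     "roleArn": ROLE_ARN},
]
used, unused = posture_gap(permitted, events, ROLE_ARN)
```

The `unused` set is the draft deny list: actions the role grants that the agent never exercised during observation.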

Session Policies for Per-Task Constraints

For agents that perform different tasks depending on the prompt—querying a database for one request, calling an external API for another—STS session policies can narrow permissions per invocation. The underlying IRSA or Pod Identity role stays broad enough for all possible tasks, but each STS session is scoped to the specific task at hand. This is an underused pattern for AI agents: the orchestration layer requests a session with a policy that only permits the APIs needed for the current task, and the session expires when the task completes. Short-lived, task-scoped credentials limit the window during which stolen or misused credentials are useful.
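The pattern can be sketched as a small policy builder in the orchestration layer. Everything here is illustrative: `TASK_ACTIONS` is a hypothetical mapping from the orchestrator's task types to minimal IAM actions, and the bucket ARN is a placeholder:

```python
import json

# Hypothetical task-to-actions mapping maintained by the orchestrator.
# The underlying role stays broad; each session is the intersection of
# the role policy and the inline session policy built here.
TASK_ACTIONS = {
    "rag_lookup": ["s3:GetObject"],
    "session_state": ["dynamodb:Query"],
}

def session_policy_for(task, resources):
    """Build an inline STS session policy scoped to one task."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": TASK_ACTIONS[task],
            "Resource": resources,
        }],
    }

# The orchestrator would pass this JSON as the Policy parameter of
# sts.assume_role(...), yielding short-lived, task-scoped credentials.
policy_json = json.dumps(
    session_policy_for("rag_lookup", ["arn:aws:s3:::agent-rag-data/*"])
)
```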

Where Identity Controls Go Blind

IAM answers one question: is this identity allowed to call this API on this resource? It does not answer: is this call normal for this agent right now?

The specific failure mode: an agent that uses its permitted credentials to call an API it’s authorized for, but in a pattern that constitutes data exfiltration. Fifty thousand s3:GetObject calls when the baseline is five per hour. A bedrock:InvokeModel request with an unusually large payload after ingesting a suspicious prompt. IAM policies pass every one of these calls. CloudTrail records them as normal, authorized events. The calls are individually legitimate but collectively constitute a breach—the kind of AI-mediated data exfiltration that only cross-layer signal correlation can detect.
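A volume-based check against a per-agent baseline catches exactly this class of misuse. A minimal sketch—the baseline numbers and the 10x threshold are illustrative; in practice both come from the observation period, not from guesswork:

```python
from collections import Counter

# Illustrative per-agent baseline: normal hourly call volumes observed
# during the profiling window.
BASELINE_PER_HOUR = {"s3:GetObject": 5, "bedrock:InvokeModel": 20}

def anomalous_actions(hourly_events, threshold=10.0):
    """Return actions whose hourly volume exceeds threshold x baseline."""
    counts = Counter(hourly_events)
    return {
        action: count
        for action, count in counts.items()
        if count > threshold * BASELINE_PER_HOUR.get(action, 1)
    }

# 50,000 GetObject calls in one hour against a baseline of 5/hour
# trips the check; 15 InvokeModel calls against a baseline of 20 does not.
events = ["s3:GetObject"] * 50_000 + ["bedrock:InvokeModel"] * 15
```

Every one of those 50,000 calls is individually authorized; only the aggregate pattern reveals the breach.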

Closing this gap requires behavioral baselines built from observed runtime activity—not just which APIs are called, but the frequencies, payload sizes, and access volumes that define “normal” for each specific agent. ARMO’s Application Profile DNA captures exactly this per-agent behavioral profile, correlating CloudTrail API events with kernel-level runtime data to build a complete picture of what each agent actually does. When the pattern deviates, the CADR platform fires a detection with the full attack story—from triggering prompt through anomalous API sequence to data impact—rather than a pile of disconnected CloudTrail entries.

Network Controls: SecurityGroupPolicy, NetworkPolicy, and VPC Endpoints

EKS gives you two distinct network control mechanisms that operate at different layers and apply simultaneously. Understanding how they interact—and where each one stops—is critical for AI agent workloads that reach a wider and more dynamic set of endpoints than traditional microservices. The AWS EKS networking best practices cover the general configuration patterns; what follows is how those patterns apply—and where they break down—for AI agents specifically.

Security Groups for Pods (SecurityGroupPolicy CRD)

By default, every pod on an EKS node shares the node’s security groups. The VPC CNI’s SecurityGroupPolicy CRD changes this: you assign specific AWS security groups to individual pods based on label selectors. When a matching pod launches, the VPC Resource Controller provisions a dedicated branch ENI and attaches your specified security groups.

For AI agent workloads, this means you can create a security group that only allows HTTPS egress to bedrock-runtime.us-east-1.amazonaws.com and your internal vector database, then apply it specifically to your Bedrock-calling agent pods. Other pods on the same node keep their broader security groups. This is VPC-level micro-segmentation at the pod level—using the same security group rules your network team already understands.
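A sketch of the CRD for this setup—the security group ID, namespace, and labels are placeholders; the referenced group is assumed to already allow only the intended HTTPS egress:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: bedrock-agent-sgp
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: bedrock-agent
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0   # HTTPS egress to Bedrock + vector DB only
```

When a pod matching `app: bedrock-agent` launches in `ai-agents`, the VPC Resource Controller attaches the branch ENI with this security group; other pods on the node are unaffected.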

One configuration detail that matters for AI agents: the POD_SECURITY_GROUP_ENFORCING_MODE setting on the VPC CNI. In strict mode (the default), only the branch ENI’s security groups apply to the pod’s traffic. In standard mode, security groups from both the primary ENI and the branch ENI apply—traffic must comply with both. For agents that need node-level baseline rules plus pod-specific restrictions, standard mode is typically the right choice. Security Groups for Pods also requires EC2-backed nodes—Fargate pods cannot get dedicated security groups, the same limitation that affects EKS runtime monitoring.
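Switching to standard mode is an environment-variable change on the VPC CNI DaemonSet. One way to apply it (a cluster-admin operation; verify against the current VPC CNI documentation for your version):

```shell
# Apply node-level AND branch-ENI security groups to pod traffic
kubectl set env daemonset aws-node -n kube-system \
  POD_SECURITY_GROUP_ENFORCING_MODE=standard
```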

Kubernetes NetworkPolicy via VPC CNI

The VPC CNI supports native Kubernetes NetworkPolicy enforcement using eBPF since version 1.14—no third-party CNI required. NetworkPolicies control pod-to-pod and pod-to-external traffic at L3/L4. The standard pattern for security-sensitive workloads is default-deny egress with explicit allows for known destinations.

For AI agents, this pattern hits a practical wall. An agent that calls Bedrock for inference, queries a Pinecone instance for vector search, reaches an external SaaS API through an MCP tool integration, and connects to an internal microservice for business logic needs explicit egress rules for all four destinations. And which endpoints it calls can change based on the prompt—a new tool integration or a different RAG source means the NetworkPolicy needs updating. Static deny-all with explicit allows works perfectly for deterministic workloads. For agents, it requires continuous refinement based on observed traffic patterns.
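A sketch of the default-deny-egress pattern for an agent pod—selectors and the CIDR are placeholders, and a real policy needs one explicit rule per observed destination:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app: bedrock-agent
  policyTypes:
    - Egress           # everything not explicitly allowed below is denied
  egress:
    - to:              # kube-dns, so the agent can resolve endpoints
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to:              # vector database endpoint (placeholder CIDR)
        - ipBlock:
            cidr: 203.0.113.10/32
      ports:
        - protocol: TCP
          port: 443
```

Every new tool integration or RAG source means another `egress` entry—which is exactly why this policy has to be regenerated from observed traffic rather than maintained by hand.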

How They Layer Together

This is the question neither AWS documentation nor existing security guides answer clearly: when both SecurityGroupPolicy and NetworkPolicy apply to the same AI agent pod, what controls what?

| Mechanism | Operates At | Best For (AI Agents) | Limitation |
| --- | --- | --- | --- |
| SecurityGroupPolicy | VPC / ENI level | Access to AWS services (Bedrock, S3, RDS) and VPC resources | Cannot control pod-to-pod traffic within the cluster |
| NetworkPolicy (VPC CNI) | L3/L4 via eBPF | Pod-to-pod segmentation, egress to non-AWS endpoints (vector DBs, MCP servers) | Cannot reference AWS security groups or service-level constructs |
| VPC Endpoints | VPC routing layer | Keeping AWS service traffic (Bedrock, SageMaker, S3) off the public internet | No per-pod granularity; applies to all traffic in the subnet |

The practical pattern for AI agent workloads: use SecurityGroupPolicy to control which AWS services each agent can reach, NetworkPolicy to control cluster-internal traffic and egress to non-AWS endpoints, and VPC endpoints to keep AWS service traffic private. All three apply simultaneously—an agent’s traffic must satisfy all applicable rules.

Where Network Controls Go Blind

Both mechanisms enforce rules about where traffic can go. Neither inspects what’s being sent or why. An agent exfiltrating customer data to an allowed S3 bucket via a permitted VPC endpoint looks identical to a normal data write in every security group evaluation, every NetworkPolicy check, and every VPC Flow Log record. Flow Logs show that a connection happened. They cannot show that the agent was manipulated by a prompt injection attack into sending data it would never touch during normal operation.

ARMO’s runtime sensor closes this gap by correlating each network connection with the process and tool invocation that initiated it. Instead of seeing an anonymous outbound connection to api.pinecone.io:443, the platform attributes it to a specific RAG retrieval triggered by a specific tool call in a specific agent. That attribution transforms a list of IP addresses into an enforceable, evidence-based network policy. ARMO generates Kubernetes NetworkPolicy resources directly from observed traffic patterns—so the policy reflects what the agent actually does, updated as behavior evolves.

Process Controls: Seccomp Profiles for AI Agent Workloads

Why RuntimeDefault Isn’t Enough for AI Agents

EKS nodes running containerd can apply the RuntimeDefault seccomp profile to every workload once the kubelet's SeccompDefault setting is enabled—stable since Kubernetes 1.25—and individual pods can opt in through their security context. RuntimeDefault blocks a solid set of dangerous syscalls—reboot, kexec_load, mount—but permits hundreds of others that deterministic workloads routinely need. The EKS runtime security best practices recommend seccomp for all production workloads; for AI agents, the question is how restrictive your profile can get.

For AI agents with code generation capabilities—what ARMO CTO Ben Hirschberg identifies as the most dangerous AI capability because it means running code that no human reviewed—RuntimeDefault is too permissive. An agent that generates and executes Python code could use execve for process creation, socket and connect for network access, and openat for arbitrary file reads—all permitted by RuntimeDefault, all potentially dangerous when driven by untrusted prompts rather than reviewed code. The MITRE ATLAS framework catalogs these execution-based attack techniques specifically because AI agents turn permitted system capabilities into unpredictable attack surfaces.

The Observe-Then-Profile Approach

You cannot write a seccomp profile for an AI agent from a deployment manifest. The manifest tells you what container image runs. It does not tell you what syscalls the agent will make, because those depend on runtime prompts.

The correct workflow on EKS: deploy the agent with RuntimeDefault seccomp, observe actual syscall behavior over a representative period using kernel-level instrumentation, generate a custom Localhost profile from observed syscalls, deploy the profile to worker nodes at /var/lib/kubelet/seccomp/ (distributed via a DaemonSet or the Security Profiles Operator), apply it to the agent pod’s security context, and run in audit mode first (SCMP_ACT_LOG for unmatched syscalls) before graduating to enforcement (SCMP_ACT_ERRNO or SCMP_ACT_KILL_PROCESS). This mirrors the observe-to-enforce workflow at the process control layer—visibility first, enforcement based on evidence.
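An audit-mode Localhost profile along these lines—the syscall allowlist here is illustrative only; in practice it is generated from the observation period, not written by hand:

```json
{
  "defaultAction": "SCMP_ACT_LOG",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "execve",
                "clone", "socket", "connect", "futex", "mmap"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With `defaultAction: SCMP_ACT_LOG`, unmatched syscalls are logged rather than blocked; graduating to enforcement means swapping the default action to `SCMP_ACT_ERRNO` (or `SCMP_ACT_KILL_PROCESS`). The file lives under `/var/lib/kubelet/seccomp/` on each node and is referenced from the pod's security context via `seccompProfile: {type: Localhost, localhostProfile: <path relative to that directory>}`.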

The AI-Specific Seccomp Challenge

A LangChain agent that generates Python code will invoke different syscalls depending on the prompt. Monday’s traffic might trigger execve, clone, and socket for a data analysis task. Tuesday’s traffic might add openat for file I/O and additional connect calls for a new API endpoint. The observation period needs to be long enough to capture the agent’s full behavioral range across different prompt types and tool invocations. For most agents, seven to fourteen days captures a representative sample. Agents with particularly varied tool sets—those integrating multiple MCP servers or running diverse analytical tasks—may need longer.

This is why manual syscall enumeration does not scale for AI agent workloads. The Kubernetes Agent Sandbox CRD provides code execution isolation through gVisor or Kata Containers—controlling where untrusted code runs. Seccomp profiles complement this by controlling what syscalls the agent process can make. But both are static controls. Automated behavioral profiling—observing syscalls at the kernel level across the full range of agent operation, then generating the profile from that evidence—is the only approach that produces profiles tight enough to be useful and accurate enough to avoid breaking production.

Where Process Controls Go Blind

Seccomp restricts which syscalls the agent can make. It cannot distinguish why the agent is making them. An openat call to read a legitimate configuration file is the same syscall as an openat call to read /var/run/secrets/kubernetes.io/serviceaccount/token for credential theft. A connect to your authorized Bedrock endpoint is the same syscall as a connect to an attacker-controlled server—if both are in the seccomp allowlist, the profile cannot tell them apart.

Context-aware enforcement requires correlating the syscall with the agent’s application-layer behavior: what tool invocation triggered the call, what prompt preceded it, and whether the pattern matches the agent’s established baseline. ARMO generates seccomp profiles from observed agent behavior—capturing the full syscall range across representative operation periods—and deploys them in audit mode for validation before graduating to enforcement. Because the profile is derived from evidence rather than guesswork, the risk of breaking production drops dramatically compared to manually authored profiles.

Layering All Three Controls: A Progressive Deployment Sequence

Rolling out identity, network, and process controls simultaneously without breaking production requires sequencing. 

Week 1: Deploy with observation-mode controls. Broad IRSA or Pod Identity role that permits the full API set your agents might need. Default-allow NetworkPolicies (or no NetworkPolicies, which is the EKS default). RuntimeDefault seccomp. All telemetry flowing: CloudTrail capturing API calls, VPC Flow Logs recording egress, and runtime sensors (deployed as a DaemonSet on managed node groups) capturing syscalls, process trees, and network connections at the kernel level. ARMO’s runtime AI-BOM discovers agents within hours—identifying frameworks, models, tool integrations, and dependencies based on actual runtime behavior rather than manifests.

Week 2: Generate baselines and draft controls. Profile API usage from CloudTrail to draft the replacement IAM policy. Map egress destinations from Flow Logs and runtime sensor data to draft the SecurityGroupPolicy and NetworkPolicy set. Capture syscall behavior to draft the custom seccomp profile. Application Profile DNA builds the per-agent behavioral baselines that inform each control.

Week 3: Deploy controls in audit mode. Apply the replacement IAM policy in a parallel role for shadow testing. Deploy NetworkPolicies in logging mode using the VPC CNI’s policy enforcement logs to CloudWatch. Apply the seccomp profile in SCMP_ACT_LOG mode. Compare expected versus actual. Tune policies where the baseline missed legitimate edge-case behavior.

Week 4: Graduate to enforcement. Swap to the least-privilege IAM role. Switch NetworkPolicies to enforcement. Graduate seccomp to blocking mode. Maintain runtime monitoring for behavioral drift as models update and prompts change—enforcement is continuous, not one-time.

ARMO automates this progression end to end. The CADR platform, built on Kubescape, generates seccomp profiles, NetworkPolicy resources, and identity constraints from baseline data, deploys them in audit mode, monitors for false positives, and graduates to enforcement—all from a single control plane. Evidence-based least privilege at 1–2.5% CPU and 1% memory overhead, with zero code changes and no sidecars. For teams evaluating how this fits into their broader AI workload security stack, the buyer’s guide provides the complete four-pillar vendor evaluation framework, and the finance-specific evaluation guide maps native tool coverage against third-party runtime platforms.

Where Each Control Goes Blind: The Behavioral Gap

Every EKS-native sandboxing control shares the same structural limitation when applied to AI agent workloads. The table below summarizes where each one stops and what fills the gap.

| Control | What It Enforces | What It Can See | AI Agent Blind Spot | What Fills the Gap |
| --- | --- | --- | --- | --- |
| IRSA / Pod Identity | Which AWS APIs the agent can call | API call authorization (pass/deny) | Cannot detect misuse of permitted APIs (normal call volume vs. exfiltration volume) | Behavioral baselines on API call patterns per agent |
| SecurityGroupPolicy | Which VPC destinations the pod can reach | Connection allowed/denied at security group level | Cannot inspect payload or attribute connections to tool invocations | Runtime correlation of network connections to application-layer behavior |
| NetworkPolicy (VPC CNI) | Pod-to-pod and pod-to-CIDR traffic at L3/L4 | Connection allowed/denied per policy rule | Cannot adapt to dynamic agent egress without continuous updating from runtime data | Auto-generated policies from observed traffic patterns |
| Seccomp (RuntimeDefault) | Which syscalls the process can make | Syscall allowed/blocked | Cannot distinguish legitimate vs. malicious use of the same syscall | Custom profiles generated from observed agent syscall behavior |

The pattern is consistent across all four controls: each one enforces static rules about what the agent is allowed to do, but none can assess whether what the agent is actually doing is normal for that specific agent. That assessment requires runtime behavioral intelligence—the layer that transforms static controls into adaptive, evidence-based enforcement. To see how this works in practice across your EKS environment, watch a demo of the ARMO platform.

Frequently Asked Questions

How do Security Groups for Pods interact with Kubernetes NetworkPolicies on EKS?

Both apply simultaneously to the same pod. SecurityGroupPolicy operates at the VPC/ENI level and is best for controlling access to AWS services like Bedrock, RDS, and S3. NetworkPolicy via the VPC CNI operates at L3/L4 through eBPF and is best for pod-to-pod traffic and egress to non-AWS endpoints. For AI agents, use SecurityGroupPolicy for AWS service boundaries and NetworkPolicy for cluster-internal segmentation and non-AWS external egress.

Can I use EKS Fargate for AI agent workloads that need sandboxing?

Fargate does not support DaemonSets, which means no runtime sensors for behavioral monitoring (ARMO or GuardDuty EKS Runtime Monitoring), no custom seccomp profile distribution via DaemonSet, and no Security Groups for Pods. If your AI agents need runtime behavioral monitoring or process-level enforcement, run them on EKS managed node groups or EKS Auto Mode.

How long should I observe AI agent behavior before generating enforcement policies?

Most agents produce usable behavioral baselines within seven to fourteen days. Agents with code generation capabilities or wide tool sets may need longer to capture the full range of legitimate syscall and network behavior. 

Should I use the VPC CNI’s native NetworkPolicy support or a third-party CNI like Cilium?

The VPC CNI’s native NetworkPolicy support, eBPF-based and available since version 1.14, eliminates the need for a third-party CNI for standard L3/L4 policy enforcement. Cilium adds L7 visibility and more advanced policy capabilities. For AI agent workloads, the choice depends on whether your runtime security platform already provides the application-layer context that Cilium would add—ARMO’s CADR platform, for example, provides that layer independently of the CNI.

What happens if an agent’s behavior changes after enforcement policies are set?

Behavioral drift is expected—models update, prompts change, new tool integrations ship. The enforcement workflow treats policies as living documents. Drift detection flags when an agent’s behavior deviates from its baseline so you can review the new behavior, validate it as legitimate, and update the baseline and policies accordingly. This continuous enforcement loop is a core principle of the progressive enforcement methodology.
