EKS gives you more sandboxing primitives for AI agent workloads than any other managed Kubernetes platform: IRSA for identity scoping, EKS Pod Identity for simplified role mapping, SecurityGroupPolicy for pod-level VPC segmentation, native NetworkPolicy enforcement through the VPC CNI’s eBPF engine, seccomp profiles for syscall restriction, and VPC endpoints for private service access. For a traditional microservice with deterministic behavior, these controls compose into a strong least-privilege stack.
For AI agents, every one of these controls has a blind spot in the same place: they enforce rules based on what you declared at deploy time, not what the agent decides at runtime. An IRSA role scoped to three S3 buckets cannot tell whether the agent is reading customer data for a legitimate support query or exfiltrating it after a prompt injection—the kind of excessive agency risk that ranks among the top LLM application threats. A NetworkPolicy that permits egress to your vector database endpoint cannot tell whether the agent is running a normal RAG retrieval or dumping its entire context window. A seccomp profile that allows the connect syscall cannot distinguish a legitimate API call from an unauthorized outbound connection.
This guide is the EKS-specific configuration reference for the progressive enforcement methodology covered in the complete sandboxing guide. That guide explains the four-stage approach—discovery, observation, selective enforcement, full least privilege.
The EKS framework implementation guide shows how the Observe → Posture → Detect → Enforce methodology maps to EKS primitives. This article goes one level deeper: how to actually configure each control for AI agent workloads, and the exact point where each one goes blind because agent behavior is non-deterministic.
Three control layers, each with EKS-specific configuration patterns. Three corresponding failure modes, all rooted in the same architectural gap. And a progressive deployment sequence that rolls the full stack out without breaking production.
IRSA and EKS Pod Identity both map a Kubernetes service account to an IAM role so your agent pod gets temporary AWS credentials without long-lived secrets. The differences are operational. IRSA requires an OIDC provider tied to your EKS cluster, trust policies that reference the provider ARN and service account, and manual management of role bindings. It works across EKS, EKS Anywhere, and self-managed clusters. EKS Pod Identity eliminates the OIDC provider entirely—role-to-service-account mappings are managed through the EKS API using a DaemonSet-based credential agent. Setup is simpler, cross-account role assumption is built in, and you avoid the IAM trust policy size limits that IRSA hits at scale. The trade-off: Pod Identity is EKS-only and requires the Pod Identity Agent add-on.
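For reference, the IRSA trust policy described above looks like the following sketch. The account ID, OIDC provider ID, namespace (agents), and service account name (support-agent) are placeholders, flagged here in the lead-in since JSON carries no inline comments:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890:aud": "sts.amazonaws.com",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE1234567890:sub": "system:serviceaccount:agents:support-agent"
        }
      }
    }
  ]
}
```

With Pod Identity, this document disappears entirely: the role instead trusts the pods.eks.amazonaws.com service principal for sts:AssumeRole and sts:TagSession, and the service-account mapping is a single CreatePodIdentityAssociation API call.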
For AI agent sandboxing, the choice between them matters less than the workflow you build around either one.
Start with a role that permits the broad API set your agent might need: bedrock:InvokeModel for inference, s3:GetObject for RAG data, dynamodb:Query for session state, secretsmanager:GetSecretValue for credential access. This is deliberately broader than you want long-term—the point is to avoid breaking the agent during the observation period.
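A sketch of that observation-phase policy follows; the resource ARNs are placeholders for your own buckets, tables, and secrets, and even in this deliberately broad phase it is worth scoping resources rather than granting `*`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AgentObservationPhase",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "s3:GetObject",
        "dynamodb:Query",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/*",
        "arn:aws:s3:::rag-data-bucket/*",
        "arn:aws:dynamodb:us-east-1:111122223333:table/agent-sessions",
        "arn:aws:secretsmanager:us-east-1:111122223333:secret:agent/*"
      ]
    }
  ]
}
```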
Over one to two weeks, CloudTrail captures every API call the agent makes. Filter by the agent’s role ARN and look at AssumeRoleWithWebIdentity events for IRSA or AssumeRoleForPodIdentity events for Pod Identity to see how often the agent assumes its role. Then examine the downstream API calls: which services, which specific resources (S3 bucket ARNs, DynamoDB table names), which actions. If your role permits 47 APIs and the agent calls three, that 47-to-3 ratio is your posture gap—and the basis for your replacement least-privilege policy. AWS recently introduced IAM context keys for managed MCP servers that can differentiate agent-initiated API calls from human-initiated ones, but these only apply to AWS-managed MCP servers—self-deployed agents on EKS need the behavioral profiling approach described here.
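For orientation, here is a CloudTrail record trimmed to the fields that matter for this profiling, with illustrative values. The role ARN you filter on lives at userIdentity.sessionContext.sessionIssuer.arn; note that object-level S3 calls like this one only appear if data events are enabled on the trail:

```json
{
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "userIdentity": {
    "type": "AssumedRole",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "arn": "arn:aws:iam::111122223333:role/support-agent-role"
      }
    }
  },
  "requestParameters": {
    "bucketName": "rag-data-bucket",
    "key": "docs/returns-policy.md"
  }
}
```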
For agents that perform different tasks depending on the prompt—querying a database for one request, calling an external API for another—STS session policies can narrow permissions per invocation. The underlying IRSA or Pod Identity role stays broad enough for all possible tasks, but each STS session is scoped to the specific task at hand. This is an underused pattern for AI agents: the orchestration layer requests a session with a policy that only permits the APIs needed for the current task, and the session expires when the task completes. Short-lived, task-scoped credentials limit the window during which stolen or misused credentials are useful.
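As a sketch of the mechanics: both sts:AssumeRole and sts:AssumeRoleWithWebIdentity accept an inline session policy through their Policy parameter, so the orchestration layer can pass something like this for a task that only needs session-state reads (the table ARN is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TaskScopedSession",
      "Effect": "Allow",
      "Action": ["dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/agent-sessions"
    }
  ]
}
```

A session policy can only narrow the underlying role's permissions, never extend them; the effective permissions are the intersection of the two, which is what makes this safe to drive from the orchestration layer.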
IAM answers one question: is this identity allowed to call this API on this resource? It does not answer: is this call normal for this agent right now?
The specific failure mode: an agent that uses its permitted credentials to call an API it’s authorized for, but in a pattern that constitutes data exfiltration. Fifty thousand s3:GetObject calls when the baseline is five per hour. A bedrock:InvokeModel request with an unusually large payload after ingesting a suspicious prompt. IAM policies pass every one of these calls. CloudTrail records them as normal, authorized events. The calls are individually legitimate but collectively constitute a breach—the kind of AI-mediated data exfiltration that only cross-layer signal correlation can detect.
Closing this gap requires behavioral baselines built from observed runtime activity—not just which APIs are called, but the frequencies, payload sizes, and access volumes that define “normal” for each specific agent. ARMO’s Application Profile DNA captures exactly this per-agent behavioral profile, correlating CloudTrail API events with kernel-level runtime data to build a complete picture of what each agent actually does. When the pattern deviates, the CADR platform fires a detection with the full attack story—from triggering prompt through anomalous API sequence to data impact—rather than a pile of disconnected CloudTrail entries.
EKS gives you two distinct network control mechanisms that operate at different layers and apply simultaneously. Understanding how they interact—and where each one stops—is critical for AI agent workloads that reach a wider and more dynamic set of endpoints than traditional microservices. The AWS EKS networking best practices cover the general configuration patterns; what follows is how those patterns apply—and where they break down—for AI agents specifically.
By default, every pod on an EKS node shares the node’s security groups. The VPC CNI’s SecurityGroupPolicy CRD changes this: you assign specific AWS security groups to individual pods based on label selectors. When a matching pod launches, the VPC Resource Controller provisions a dedicated branch ENI and attaches your specified security groups.
For AI agent workloads, this means you can create a security group that only allows HTTPS egress to bedrock-runtime.us-east-1.amazonaws.com and your internal vector database, then apply it specifically to your Bedrock-calling agent pods. Other pods on the same node keep their broader security groups. This is VPC-level micro-segmentation at the pod level—using the same security group rules your network team already understands.
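A minimal SecurityGroupPolicy along those lines; the namespace, label, and security group ID are placeholders:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: bedrock-agent-sgp
  namespace: agents             # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: bedrock-agent        # only pods with this label get the branch ENI
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0    # SG allowing HTTPS egress to Bedrock and the vector DB only
```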
One configuration detail that matters for AI agents: the POD_SECURITY_GROUP_ENFORCING_MODE setting on the VPC CNI. In strict mode (the default), only the branch ENI’s security groups apply to the pod’s traffic. In standard mode, security groups from both the primary ENI and the branch ENI apply—traffic must comply with both. For agents that need node-level baseline rules plus pod-specific restrictions, standard mode is typically the right choice. Security Groups for Pods also requires EC2-backed nodes—Fargate pods cannot get dedicated security groups, the same limitation that affects EKS runtime monitoring.
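Enforcing mode is configured on the VPC CNI itself. A sketch of the relevant environment variables, assuming you edit the aws-node DaemonSet in kube-system directly; with the managed VPC CNI add-on, pass the same values through the add-on's configuration instead:

```yaml
# Container env fragment from the aws-node DaemonSet
env:
  - name: ENABLE_POD_ENI                      # prerequisite for Security Groups for Pods
    value: "true"
  - name: POD_SECURITY_GROUP_ENFORCING_MODE   # "strict" (default) or "standard"
    value: "standard"
```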
The VPC CNI supports native Kubernetes NetworkPolicy enforcement using eBPF since version 1.14—no third-party CNI required. NetworkPolicies control pod-to-pod and pod-to-external traffic at L3/L4. The standard pattern for security-sensitive workloads is default-deny egress with explicit allows for known destinations.
For AI agents, this pattern hits a practical wall. An agent that calls Bedrock for inference, queries a Pinecone instance for vector search, reaches an external SaaS API through an MCP tool integration, and connects to an internal microservice for business logic needs explicit egress rules for all four destinations. And which endpoints it calls can change based on the prompt—a new tool integration or a different RAG source means the NetworkPolicy needs updating. Static deny-all with explicit allows works perfectly for deterministic workloads. For agents, it requires continuous refinement based on observed traffic patterns.
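A sketch of that default-deny egress pattern for a single agent; the namespace, labels, ports, and CIDR are illustrative, and in practice the external allows come from your observation data:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bedrock-agent-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: bedrock-agent
  policyTypes:
    - Egress                      # everything not explicitly allowed below is denied
  egress:
    - to:                         # DNS resolution via kube-dns
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
    - to:                         # internal business-logic service (placeholder)
        - podSelector:
            matchLabels:
              app: orders-service
      ports:
        - protocol: TCP
          port: 8080
    - to:                         # HTTPS egress pinned by CIDR; NetworkPolicy is L3/L4,
        - ipBlock:                # so SaaS endpoints must be expressed as IP ranges
            cidr: 203.0.113.0/24
      ports:
        - protocol: TCP
          port: 443
```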
This is the question neither AWS documentation nor existing security guides answer clearly: when both SecurityGroupPolicy and NetworkPolicy apply to the same AI agent pod, what controls what?
| Mechanism | Operates At | Best For (AI Agents) | Limitation |
| --- | --- | --- | --- |
| SecurityGroupPolicy | VPC / ENI level | Access to AWS services (Bedrock, S3, RDS) and VPC resources | Cannot control pod-to-pod traffic within the cluster |
| NetworkPolicy (VPC CNI) | L3/L4 via eBPF | Pod-to-pod segmentation, egress to non-AWS endpoints (vector DBs, MCP servers) | Cannot reference AWS security groups or service-level constructs |
| VPC Endpoints | VPC routing layer | Keeping AWS service traffic (Bedrock, SageMaker, S3) off the public internet | No per-pod granularity; applies to all traffic in the subnet |
The practical pattern for AI agent workloads: use SecurityGroupPolicy to control which AWS services each agent can reach, NetworkPolicy to control cluster-internal traffic and egress to non-AWS endpoints, and VPC endpoints to keep AWS service traffic private. All three apply simultaneously—an agent’s traffic must satisfy all applicable rules.
Both mechanisms enforce rules about where traffic can go. Neither inspects what’s being sent or why. An agent exfiltrating customer data to an allowed S3 bucket via a permitted VPC endpoint looks identical to a normal data write in every security group evaluation, every NetworkPolicy check, and every VPC Flow Log record. Flow Logs show that a connection happened. They cannot show that the agent was manipulated by a prompt injection attack into sending data it would never touch during normal operation.
ARMO’s runtime sensor closes this gap by correlating each network connection with the process and tool invocation that initiated it. Instead of seeing an anonymous outbound connection to api.pinecone.io:443, the platform attributes it to a specific RAG retrieval triggered by a specific tool call in a specific agent. That attribution transforms a list of IP addresses into an enforceable, evidence-based network policy. ARMO generates Kubernetes NetworkPolicy resources directly from observed traffic patterns—so the policy reflects what the agent actually does, updated as behavior evolves.
EKS managed node groups running containerd can apply the RuntimeDefault seccomp profile to every pod on a node through the kubelet's seccompDefault setting, available in beta since Kubernetes 1.25. RuntimeDefault blocks a solid set of dangerous syscalls—reboot, kexec_load, mount—but permits hundreds of others that deterministic workloads routinely need. The EKS runtime security best practices recommend seccomp for all production workloads; for AI agents, the question is how restrictive your profile can get.
For AI agents with code generation capabilities—what ARMO CTO Ben Hirschberg identifies as the most dangerous AI capability because it means running code that no human reviewed—RuntimeDefault is too permissive. An agent that generates and executes Python code could use execve for process creation, socket and connect for network access, and openat for arbitrary file reads—all permitted by RuntimeDefault, all potentially dangerous when driven by untrusted prompts rather than reviewed code. The MITRE ATLAS framework catalogs these execution-based attack techniques specifically because AI agents turn permitted system capabilities into unpredictable attack surfaces.
You cannot write a seccomp profile for an AI agent from a deployment manifest. The manifest tells you what container image runs. It does not tell you what syscalls the agent will make, because those depend on runtime prompts.
The correct workflow on EKS: deploy the agent with RuntimeDefault seccomp, observe actual syscall behavior over a representative period using kernel-level instrumentation, generate a custom Localhost profile from observed syscalls, deploy the profile to worker nodes at /var/lib/kubelet/seccomp/ (distributed via a DaemonSet or the Security Profiles Operator), apply it to the agent pod’s security context, and run in audit mode first (SCMP_ACT_LOG for unmatched syscalls) before graduating to enforcement (SCMP_ACT_ERRNO or SCMP_ACT_KILL_PROCESS). This mirrors the observe-to-enforce workflow at the process control layer—visibility first, enforcement based on evidence.
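A sketch of what a generated Localhost profile might look like, starting in audit mode; the syscall list is illustrative and deliberately abbreviated, since a real profile for a Python-based agent typically allows several dozen syscalls:

```json
{
  "defaultAction": "SCMP_ACT_LOG",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": [
        "execve", "clone", "wait4",
        "openat", "read", "write", "close", "fstat",
        "socket", "connect", "sendto", "recvfrom",
        "mmap", "mprotect", "brk", "futex", "exit_group"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

With the file distributed to /var/lib/kubelet/seccomp/profiles/agent.json on each node, the pod references it relative to the seccomp root:

```yaml
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/agent.json   # relative to /var/lib/kubelet/seccomp/
```

Flipping defaultAction from SCMP_ACT_LOG to SCMP_ACT_ERRNO graduates the same profile from audit to enforcement.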
A LangChain agent that generates Python code will invoke different syscalls depending on the prompt. Monday’s traffic might trigger execve, clone, and socket for a data analysis task. Tuesday’s traffic might add openat for file I/O and additional connect calls for a new API endpoint. The observation period needs to be long enough to capture the agent’s full behavioral range across different prompt types and tool invocations. For most agents, seven to fourteen days captures a representative sample. Agents with particularly varied tool sets—those integrating multiple MCP servers or running diverse analytical tasks—may need longer.
This is why manual syscall enumeration does not scale for AI agent workloads. The Kubernetes Agent Sandbox CRD provides code execution isolation through gVisor or Kata Containers—controlling where untrusted code runs. Seccomp profiles complement this by controlling what syscalls the agent process can make. But both are static controls. Automated behavioral profiling—observing syscalls at the kernel level across the full range of agent operation, then generating the profile from that evidence—is the only approach that produces profiles tight enough to be useful and accurate enough to avoid breaking production.
Seccomp restricts which syscalls the agent can make. It cannot distinguish why the agent is making them. An openat call to read a legitimate configuration file is the same syscall as an openat call to read /var/run/secrets/kubernetes.io/serviceaccount/token for credential theft. A connect to your authorized Bedrock endpoint is the same syscall as a connect to an attacker-controlled server—if both are in the seccomp allowlist, the profile cannot tell them apart.
Context-aware enforcement requires correlating the syscall with the agent’s application-layer behavior: what tool invocation triggered the call, what prompt preceded it, and whether the pattern matches the agent’s established baseline. ARMO generates seccomp profiles from observed agent behavior—capturing the full syscall range across representative operation periods—and deploys them in audit mode for validation before graduating to enforcement. Because the profile is derived from evidence rather than guesswork, the risk of breaking production drops dramatically compared to manually authored profiles.
Layering All Three Controls: A Progressive Deployment Sequence
Rolling out identity, network, and process controls simultaneously without breaking production requires sequencing.
Week 1: Deploy with observation-mode controls. Broad IRSA or Pod Identity role that permits the full API set your agents might need. Default-allow NetworkPolicies (or no NetworkPolicies, which is the EKS default). RuntimeDefault seccomp. All telemetry flowing: CloudTrail capturing API calls, VPC Flow Logs recording egress, and runtime sensors (deployed as a DaemonSet on managed node groups) capturing syscalls, process trees, and network connections at the kernel level. ARMO’s runtime AI-BOM discovers agents within hours—identifying frameworks, models, tool integrations, and dependencies based on actual runtime behavior rather than manifests.
Week 2: Generate baselines and draft controls. Profile API usage from CloudTrail to draft the replacement IAM policy. Map egress destinations from Flow Logs and runtime sensor data to draft the SecurityGroupPolicy and NetworkPolicy set. Capture syscall behavior to draft the custom seccomp profile. Application Profile DNA builds the per-agent behavioral baselines that inform each control.
Week 3: Deploy controls in audit mode. Apply the replacement IAM policy in a parallel role for shadow testing. Deploy NetworkPolicies in logging mode using the VPC CNI’s policy enforcement logs to CloudWatch. Apply the seccomp profile in SCMP_ACT_LOG mode. Compare expected versus actual. Tune policies where the baseline missed legitimate edge-case behavior.
Week 4: Graduate to enforcement. Swap to the least-privilege IAM role. Switch NetworkPolicies to enforcement. Graduate seccomp to blocking mode. Maintain runtime monitoring for behavioral drift as models update and prompts change—enforcement is continuous, not one-time.
ARMO automates this progression end to end. The CADR platform, built on Kubescape, generates seccomp profiles, NetworkPolicy resources, and identity constraints from baseline data, deploys them in audit mode, monitors for false positives, and graduates to enforcement—all from a single control plane. Evidence-based least privilege at 1–2.5% CPU and 1% memory overhead, with zero code changes and no sidecars. For teams evaluating how this fits into their broader AI workload security stack, the buyer’s guide provides the complete four-pillar vendor evaluation framework, and the finance-specific evaluation guide maps native tool coverage against third-party runtime platforms.
Every EKS-native sandboxing control shares the same structural limitation when applied to AI agent workloads. The table below summarizes where each one stops and what fills the gap.
| Control | What It Enforces | What It Can See | AI Agent Blind Spot | What Fills the Gap |
| --- | --- | --- | --- | --- |
| IRSA / Pod Identity | Which AWS APIs the agent can call | API call authorization (pass/deny) | Cannot detect misuse of permitted APIs (normal call volume vs. exfiltration volume) | Behavioral baselines on API call patterns per agent |
| SecurityGroupPolicy | Which VPC destinations the pod can reach | Connection allowed/denied at security group level | Cannot inspect payload or attribute connections to tool invocations | Runtime correlation of network connections to application-layer behavior |
| NetworkPolicy (VPC CNI) | Pod-to-pod and pod-to-CIDR traffic at L3/L4 | Connection allowed/denied per policy rule | Cannot adapt to dynamic agent egress without continuous updating from runtime data | Auto-generated policies from observed traffic patterns |
| Seccomp (RuntimeDefault) | Which syscalls the process can make | Syscall allowed/blocked | Cannot distinguish legitimate vs. malicious use of the same syscall | Custom profiles generated from observed agent syscall behavior |
The pattern is consistent across all four controls: each one enforces static rules about what the agent is allowed to do, but none can assess whether what the agent is actually doing is normal for that specific agent. That assessment requires runtime behavioral intelligence—the layer that transforms static controls into adaptive, evidence-based enforcement. To see how this works in practice across your EKS environment, watch a demo of the ARMO platform.
Frequently Asked Questions

How do SecurityGroupPolicy and NetworkPolicy differ, and which should I use for AI agent pods?
Both apply simultaneously to the same pod. SecurityGroupPolicy operates at the VPC/ENI level and is best for controlling access to AWS services like Bedrock, RDS, and S3. NetworkPolicy via the VPC CNI operates at L3/L4 through eBPF and is best for pod-to-pod traffic and egress to non-AWS endpoints. For AI agents, use SecurityGroupPolicy for AWS service boundaries and NetworkPolicy for cluster-internal segmentation and non-AWS external egress.
Can I run AI agent workloads on Fargate with these controls?
Fargate does not support DaemonSets, which means no runtime sensors for behavioral monitoring (ARMO or GuardDuty EKS Runtime Monitoring), no custom seccomp profile distribution via DaemonSet, and no Security Groups for Pods. If your AI agents need runtime behavioral monitoring or process-level enforcement, run them on EKS managed node groups or EKS Auto Mode.
How long should the observation period be before enforcing?
Most agents produce usable behavioral baselines within seven to fourteen days. Agents with code generation capabilities or wide tool sets may need longer to capture the full range of legitimate syscall and network behavior.
Should I use the VPC CNI’s native NetworkPolicy support or a third-party CNI like Cilium?
The VPC CNI’s native NetworkPolicy support, eBPF-based and available since version 1.14, eliminates the need for a third-party CNI for standard L3/L4 policy enforcement. Cilium adds L7 visibility and more advanced policy capabilities. For AI agent workloads, the choice depends on whether your runtime security platform already provides the application-layer context that Cilium would add—ARMO’s CADR platform, for example, provides that layer independently of the CNI.
What happens when an agent's behavior changes after enforcement?
Behavioral drift is expected—models update, prompts change, new tool integrations ship. The enforcement workflow treats policies as living documents. Drift detection flags when an agent's behavior deviates from its baseline so you can review the new behavior, validate it as legitimate, and update the baseline and policies accordingly. This continuous enforcement loop is a core principle of the progressive enforcement methodology.