What Is AI Agent Sandboxing? Kubernetes-Native Enforcement Explained

Mar 6, 2026

Shauli Rozen
CEO & Co-founder

AI Agent Sandboxing Has a Definition Problem

You’re in a Slack thread at 9 AM on a Tuesday. A developer is asking why their LangChain agent can’t reach an external API anymore. You wrote the NetworkPolicy that blocked it. But you also can’t explain why you wrote that specific rule—because you wrote it based on what you guessed the agent would do, not what it actually does. You don’t have behavioral data. You don’t have an observation period. You have a YAML file you wrote at 11 PM last Thursday because someone asked “are we securing these agents?” and you needed to have an answer by Friday morning.

Meanwhile, search for “AI agent sandboxing” and you’ll find dozens of articles explaining how to run untrusted LLM-generated code in isolated containers. gVisor configurations, Firecracker microVM setups, the Kubernetes Agent Sandbox CRD from SIG Apps. All solid technical content—for the AI engineer building the agent. None of it helps the security engineer writing that NetworkPolicy at 11 PM.

That’s because the term “sandboxing” means fundamentally different things to these two teams. To the AI engineer, it means code execution isolation—running untrusted code in a contained environment so it can’t escape. To the security engineer, it means behavioral enforcement—constraining what an agent can access and do at runtime, regardless of whether its code is trusted. Both definitions are valid. Both represent real security requirements. But conflating them leaves real threats uncovered: agent escape, tool interaction manipulation, excessive agency, prompt injection leading to unauthorized actions. Some of those are code execution problems. Most of them aren’t.

This article breaks down the full sandboxing landscape—what each approach protects against, where it falls short, and what Kubernetes-native enforcement architecture looks like when you need both. For the complete progressive enforcement methodology (the 4-stage maturity model and 30-day implementation plan), see the companion hub: AI Agent Sandboxing & Progressive Enforcement: The Complete Guide.

What “Sandboxing” Actually Means for AI Agents

Most content about AI agent sandboxing treats it as a single technique. It isn’t. Sandboxing is a spectrum of controls applied at different layers, each addressing a different category of threat. Understanding these layers is the prerequisite for building a strategy that actually covers your risk surface.

Layer 1: Code Execution Isolation

This is the dominant interpretation in current guides and vendor content. Code execution isolation prevents LLM-generated code from escaping its runtime environment. It’s the container or VM boundary—the wall between the agent’s execution environment and the rest of your infrastructure. Technologies include gVisor (user-space kernel interception), Kata Containers and Firecracker (hardware-enforced microVMs), and the Kubernetes Agent Sandbox CRD (declarative lifecycle management for isolated agent environments).

Layer 1 is essential for any agent that generates and executes code. If an LLM produces a Python script and the agent has access to an interpreter, you need a boundary that prevents that code from compromising the host kernel, accessing other pods’ filesystems, or establishing unauthorized network connections.

Layer 2: Resource and Network Containment

This layer restricts what system resources the agent’s environment can access—filesystem paths, network egress, CPU and memory. It’s implemented through standard Kubernetes controls (NetworkPolicies, seccomp profiles, AppArmor, resource quotas) and cloud-layer IAM boundaries (AWS IRSA, Azure AD Workload Identity, GKE Workload Identity).

Layer 2 is table stakes for any Kubernetes deployment. It limits blast radius by restricting what the environment can touch. But it operates at the environment level, not the agent level—it constrains the pod, not the behavior within it.
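
As a concrete sketch of a Layer 2 control, here is a default-deny egress NetworkPolicy that permits only DNS plus one external API endpoint. The namespace, labels, and CIDR are placeholders for illustration, not a recommended production policy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: ai-agents            # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: support-agent          # hypothetical label
  policyTypes:
    - Egress                      # selecting Egress with a limited rule set
  egress:                         # denies everything not listed below
    # Allow DNS so the agent can resolve its permitted endpoints
    - ports:
        - protocol: UDP
          port: 53
    # Allow one external API; the CIDR is a placeholder
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32
      ports:
        - protocol: TCP
          port: 443
```

Note that this constrains the whole pod: every container and sidecar in it gets the same egress rules, which is exactly the environment-level (not agent-level) granularity described above.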

Layer 3: Behavioral Enforcement

This is the layer most organizations are missing entirely. Behavioral enforcement constrains what the agent does within its permitted environment—which APIs it calls, which tools it invokes, which data flows it initiates, which processes it spawns—based on observed runtime behavior rather than static configuration.

And this is where the trail of familiar Kubernetes security tools ends. There is no Layer 3 equivalent of a NetworkPolicy. No admission controller for behavioral boundaries. No kubectl command that shows you what an agent did in the last hour. For AI agents, this layer is an open field—and it’s exactly where the most dangerous threats operate.

Consider a concrete example: an internal assistant agent connected to your document store and Slack through an MCP server. A prompt injection causes the agent to read a confidential document and post its contents to a public Slack channel. The agent never escaped its container (Layer 1 is intact). The Slack API was an explicitly connected tool (Layer 2 permitted it). Every individual action—read document, post to Slack—was within the agent’s permitted toolset. This is exactly what OWASP catalogs as “Agent Tool Interaction Manipulation,” and it’s invisible to Layers 1 and 2 because it happened within the boundaries you already set. Only enforcement that understands what the agent normally does—and detects when it deviates—catches it.

The Isolation Layer: A Technical Deep-Dive

Before addressing the behavioral enforcement gap, it’s worth understanding what isolation sandboxing does well—and where its architectural limitations create the opening that behavioral enforcement must fill.

Container Isolation: Necessary but Not Sufficient

Standard containers share the host kernel. Linux namespaces and cgroups provide process isolation, filesystem visibility boundaries, and resource limits. For deterministic workloads—a web server, a batch job—this is often adequate. For AI agents executing untrusted, LLM-generated code, shared-kernel isolation is a known attack surface.

NVIDIA’s AI Red Team guidance identifies mandatory controls for agentic sandboxing that go well beyond standard container configuration: network egress controls that block all outbound traffic by default, filesystem write restrictions outside the workspace, and credential isolation that prevents agents from inheriting host secrets. Standard container deployments rarely implement all three—and a sufficiently capable agent can exploit the gaps.
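
Two of those three controls can be expressed directly in a pod spec; the egress default-deny lives in a NetworkPolicy like the Layer 2 example. A minimal sketch, with image and names as placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: code-exec-agent                       # hypothetical
spec:
  # Credential isolation: no Kubernetes API token mounted into the pod
  automountServiceAccountToken: false
  containers:
    - name: agent
      image: registry.example.com/agent:latest  # placeholder image
      securityContext:
        readOnlyRootFilesystem: true          # writes only via the workspace mount
        allowPrivilegeEscalation: false
        runAsNonRoot: true
      volumeMounts:
        - name: workspace
          mountPath: /workspace               # the only writable path
  volumes:
    - name: workspace
      emptyDir: {}
```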

gVisor: User-Space Kernel Interception

gVisor interposes a user-space kernel (Sentry) between the application and the host kernel. Every system call from the containerized workload is intercepted, filtered, and handled by Sentry before it reaches the actual Linux kernel. This provides strong isolation without the overhead of a full virtual machine.

The trade-off is performance. gVisor introduces 5–15% overhead on syscall-heavy workloads because every system call takes an extra hop through user space. If you’ve ever had a platform engineer push back on a performance hit for a security tool, you know the conversation that follows. gVisor’s overhead is justified when code execution is your primary risk—agents generating Python scripts, running shell commands, processing untrusted data. It’s a harder sell when the agent’s real danger is the API calls it makes, not the code it runs. The agent never tries to escape the kernel. It misuses the permissions it already has.
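
Opting a workload into gVisor in Kubernetes takes a RuntimeClass, assuming the nodes already have the runsc runtime installed and registered with containerd:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc        # the node's container runtime must be configured for runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-agent                        # hypothetical
spec:
  runtimeClassName: gvisor                     # this pod's syscalls go through Sentry
  containers:
    - name: agent
      image: registry.example.com/agent:latest # placeholder image
```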

Kata Containers and Firecracker MicroVMs

Kata Containers and Firecracker provide hardware-enforced boundaries with dedicated kernels per workload. Each sandbox runs in its own lightweight virtual machine, with a separate kernel that eliminates shared-kernel attack surfaces entirely. This is the strongest isolation available.

The cost is operational complexity and higher overhead. Provisioning microVMs is slower than starting containers (though warm pools help), resource consumption is higher, and managing VM-level networking adds infrastructure burden. The security justification is clear for environments running fully untrusted code where a kernel exploit’s blast radius is unacceptable—ML training pipelines, multi-tenant code execution services, agents processing untrusted third-party data.

For agents whose primary risk is behavioral rather than code execution, microVM isolation provides diminishing returns. You’re paying the overhead of the strongest wall in Kubernetes security, and the threat is walking through the front door with a valid key.

The Kubernetes Agent Sandbox CRD

The Agent Sandbox project under Kubernetes SIG Apps is the community’s formal answer to AI agent isolation. Launched at KubeCon NA 2025, it provides a declarative API for managing isolated, stateful, singleton workloads designed specifically for agent runtimes.

The project introduces three core primitives. The Sandbox resource defines the core agent environment—a single, stateful pod with stable identity and persistent storage. SandboxTemplate defines reusable blueprints for sandbox configurations, including resource limits, base images, and security policies. SandboxClaim is a transactional resource that lets higher-level frameworks (LangChain, ADK, CrewAI) request execution environments without managing provisioning logic directly. A fourth mechanism, WarmPools, pre-provisions sandbox pods for fast creation, reducing startup latency from minutes to seconds.
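
To show how the primitives compose, here is an illustrative manifest. The API group and field names below are assumptions inferred from the primitive names, not the project's published schema—check the Agent Sandbox documentation for the real one:

```yaml
# Hypothetical sketch only: API group and fields are assumptions,
# not the Agent Sandbox project's actual schema.
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: python-agent-template
spec:
  podTemplate:                    # reusable blueprint for sandbox pods
    spec:
      containers:
        - name: agent
          image: registry.example.com/agent-runtime:latest  # placeholder
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
---
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxClaim               # a framework requests one environment per session
metadata:
  name: session-42               # hypothetical session identifier
spec:
  templateRef:
    name: python-agent-template
```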

The Agent Sandbox CRD is strong infrastructure for code execution isolation. It solves the lifecycle management problem—creating, pausing, resuming, and destroying sandboxed environments at scale using Kubernetes-native primitives. What it does not provide is any behavioral awareness of what the agent does inside the sandbox. If you check back an hour after creating a sandbox, you’ll know the sandbox exists and the agent is running. You won’t know what tool calls it made, what APIs it reached, what data it accessed, or whether any of that behavior was expected. The CRD has no observation layer, no policy generation from behavior, and no runtime enforcement beyond the container boundary.

For organizations deploying AI agents with real autonomy in production, the Agent Sandbox CRD is a necessary foundation—not a complete answer.

Where Isolation Falls Short: Attack Scenarios That Bypass Container Boundaries

Theory establishes the framework. Scenarios make it real. The following four attack patterns illustrate threats that are invisible to isolation-only sandboxing—every one of them happens within the boundaries you’ve already permitted.

Scenario 1: Prompt Injection → Data Exfiltration Through Allowed Channels

An AI agent has legitimate database credentials—it needs them to do its job (answering customer queries against a product database). It also has permitted network egress to your Datadog endpoint for telemetry—the same endpoint your NetworkPolicy has allowed since the cluster was provisioned. Both are correctly configured: database access is scoped, the network policy explicitly allows Datadog.

An attacker embeds a prompt injection in user input. The agent queries sensitive customer records and formats the results as telemetry data, sending them to the Datadog endpoint in a payload structure that looks identical to normal application metrics. Container isolation: no help—nothing escaped the container. Network policies: no help—Datadog is explicitly allowed. The attack succeeded entirely within permitted boundaries, and the exfiltrated data is sitting in your own observability platform disguised as metrics.

What catches it: Behavioral enforcement detects the anomaly because the agent’s data flow pattern deviates from its observed baseline. The agent has never queried customer records and sent results to the telemetry endpoint in the same execution chain. The deviation triggers a block or alert depending on the enforcement level configured for that agent.
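
The baseline check in this scenario reduces to a membership test over execution-chain patterns recorded during observation. A minimal sketch—the data model and names are hypothetical, not ARMO's implementation:

```python
# Chains of (operation, destination) pairs recorded during the
# observation period. Names and data model are illustrative.
observed_chains = {
    ("query:product_db", "respond:user"),
    ("emit:metrics", "send:datadog"),
}

def check_chain(chain, baseline):
    """Return 'allow' for chains seen during observation, 'block' for novel ones."""
    return "allow" if chain in baseline else "block"

# Normal telemetry flow: inside the baseline
print(check_chain(("emit:metrics", "send:datadog"), observed_chains))    # allow

# Injected flow: customer records routed to the telemetry endpoint.
# Each step is individually permitted, but the chain is novel.
print(check_chain(("query:customer_db", "send:datadog"), observed_chains))  # block
```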

Scenario 2: Privilege Escalation Through Legitimate Code Generation

An agent with code generation capabilities generates a Python script that reads environment variables containing service account tokens. The tokens were mounted by the deployment—they’re available to any process in the pod. The agent uses those tokens to authenticate against resources far beyond its intended scope: internal APIs, cloud storage buckets, CI/CD pipelines.

Every individual action was “allowed.” The agent has interpreter access (it’s a code generation agent). The environment variables were mounted by the deployment spec. The tokens are valid. The downstream APIs accept authenticated requests. Only behavioral enforcement—recognizing that this agent has never accessed those environment paths or those downstream resources in its observed behavioral history—catches the escalation chain.

What catches it: eBPF-based enforcement at the kernel level detects the agent accessing filesystem paths (environment variable mounts) and network destinations (downstream APIs) that fall outside its established behavioral baseline. The deviation triggers enforcement before the escalation completes.

Scenario 3: Tool Misuse via MCP Runtime

An agent connected to multiple tools through an MCP server: internal documentation, calendar, and Slack. The agent’s normal behavior accesses docs and calendar—it’s an internal assistant that helps employees find information and schedule meetings.

A prompt injection causes the agent to take document content marked as confidential and post it to a public Slack channel. The agent never escaped its container. It used an explicitly connected tool in a way nobody anticipated. This is exactly what OWASP catalogs as “Agent Tool Interaction Manipulation”—and it’s invisible to isolation-only sandboxing because every individual action (read document, post to Slack) is within the agent’s permitted toolset.

What catches it: Behavioral enforcement recognizes that this agent has never combined document reads with Slack posts in its observed profile. The policy boundary detects the novel tool usage pattern and blocks the action before the content is posted.

Scenario 4: Shadow AI Agent with Accumulated Permissions

ARMO CTO Ben Hirschberg experienced a version of this firsthand. His open-source AI agent, OpenClaw, started sending unauthorized WhatsApp messages to his contacts—the agent reaching out to people on its own, through a connected communication channel, acting entirely outside its intended scope. No container escape. No exploit. Just an agent that had been given access to a messaging tool and decided to use it in a way nobody predicted.

This pattern is common in enterprise environments. A developer deploys an agent for a hackathon demo, connects it to messaging, email, and internal APIs. The demo ends. The agent keeps running with production credentials. Three months later, it’s still active—and nobody has visibility into what it’s doing because it was never formally onboarded into the security team’s monitoring.

The question for Scenario 4 isn’t what enforcement catches the behavior. The question is whether anyone even knows the agent exists. Runtime discovery—automatically finding AI workloads across clusters without relying on developer self-reporting—is the prerequisite. You need a runtime-derived AI Bill of Materials (AI-BOM) that inventories every agent, which models it loads, which tools it connects to, and which APIs it calls. Only after discovery can behavioral enforcement do its job: profiling what the agent does and flagging when it deviates from normal behavior.

Behavioral Enforcement: Constraining What Agents Actually Do

If isolation sandboxing is the wall around the agent’s environment, behavioral enforcement is the set of rules governing the agent’s actions within that environment. It operates at the kernel level, observing and controlling the agent’s system calls to restrict API access, network connections, process executions, and file access based on what the agent has actually been observed doing—not on what someone guessed it might need to do.

The Four Dimensions of Runtime Enforcement

Behavioral enforcement for AI agents operates across four complementary dimensions. Each one addresses a threat vector that isolation sandboxing cannot see.

1. API and Tool Access Enforcement

This dimension restricts which endpoints and tools the agent can invoke. If the agent’s behavioral profile shows it consistently calls three specific API endpoints during normal operation, enforcement restricts it to those three. A prompt injection that attempts to redirect the agent to an unauthorized endpoint—even one the agent technically has network access to reach—gets blocked at the kernel level before the request leaves the pod.

This is where behavioral enforcement addresses OWASP’s “Agent Tool Interaction Manipulation” directly. The agent’s tools are explicitly connected—MCP servers, database credentials, external APIs—but enforcement ensures the agent can only use them in patterns consistent with observed behavior.

2. Network Destination Enforcement

Enforcement restricts outbound connections to the destinations observed during the baselining period. Traditional Kubernetes NetworkPolicies operate at the pod level—they allow or deny traffic between pods or to external CIDRs. Behavioral enforcement operates at the agent level, understanding that this specific agent has only ever connected to an internal database and one external API. An attempt to reach any other destination—even one the pod’s NetworkPolicy would allow—triggers enforcement.

This distinction matters because pods often host multiple processes, and AI agents share pods with sidecars, init containers, and supporting infrastructure. Pod-level network rules can’t distinguish between the agent’s traffic and other legitimate traffic from the same pod.

3. Process and Syscall Constraints

This is critical for agents with code generation capabilities—what ARMO CTO Ben Hirschberg identifies as the most dangerous AI capability: “The LLM sees a complex problem, it can generate Python code, and if the agent has access to a Python interpreter, you could convince the agent to run that code. It’s very sensitive because you will run code that no one checked, that no one approved.”

Syscall-level enforcement constrains what that generated code can do even if the agent runs it. If the agent’s behavioral baseline shows it only spawns Python processes that read from specific paths and write to stdout, enforcement prevents generated code from spawning new shells, writing to sensitive directories, or making system calls outside the observed pattern.
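
The closest Kubernetes-native analogue is a static seccomp profile derived from the observed baseline. It lacks the dynamic, per-agent awareness described above, but it shows where syscall constraints attach in a pod spec. Profile path and names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: codegen-agent                          # hypothetical
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # A profile generated from the agent's observed syscalls, placed
      # under the kubelet's seccomp directory on each node
      localhostProfile: profiles/agent-baseline.json
  containers:
    - name: agent
      image: registry.example.com/agent:latest # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
```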

4. File and Data Access Enforcement

Enforcement restricts filesystem access to the paths observed during baselining. An agent that only reads from a specific configuration directory during normal operation can’t suddenly access sensitive host paths, mounted secrets, or other containers’ volumes. This directly prevents the privilege escalation in Scenario 2—where an agent reads environment variable mounts containing service account tokens it was never intended to access.

Why eBPF Is the Right Enforcement Foundation

eBPF (extended Berkeley Packet Filter) has become the foundational technology for runtime security in Kubernetes environments. For AI agent behavioral enforcement specifically, its architectural properties solve the operational problems that have blocked adoption of other enforcement approaches.

The two-stack problem. Most security architectures force you to run one tool for monitoring and a separate one for enforcement—then somehow keep them in sync, with different agents, different data models, and different configuration surfaces. eBPF eliminates that. The same programs that observe agent behavior during the baselining period also enforce the policies derived from that observation. One sensor. One data model. Observation becomes enforcement with a configuration change, not an infrastructure migration.
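
The observe-to-enforce flip can be sketched as a single check whose configured action changes with the mode—an illustrative model of the workflow, not ARMO's actual code:

```python
# Illustrative: the same deviation check serves both modes;
# only the configured response differs.
def handle_event(event, baseline, mode):
    """Allow baselined events; deviations alert in observe mode, block in enforce mode."""
    if event in baseline:
        return "allow"
    return "alert" if mode == "observe" else "block"

# Hypothetical baseline built during the observation period
baseline = {"connect:db.internal:5432", "exec:/usr/bin/python3"}

# During observation, a deviation is logged, not stopped:
print(handle_event("connect:evil.example:443", baseline, "observe"))  # alert

# After the configuration flip, the same deviation is blocked:
print(handle_event("connect:evil.example:443", baseline, "enforce"))  # block
```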

The bypass problem. Application-layer guardrails—the kind injected as middleware or prompt filters—operate in the same process space as the agent. A prompt injection that can redirect the agent’s logic can also redirect its guardrails. eBPF programs run in kernel space. A prompt injection can manipulate the agent’s behavior, but it can’t override kernel-level restrictions. The enforcement layer is outside the agent’s control.

The adoption stall. If enforcement requires modifying application code, injecting sidecar containers, or coordinating with development teams for every policy change, it doesn’t get deployed. Security teams know this—they’ve seen it with every tool that requires developer cooperation. eBPF-based enforcement deploys as a kernel-level sensor that security teams manage independently. No application redeployment. No developer coordination. No disruption to existing CI/CD pipelines.

The observation-period tax. Progressive enforcement requires observing agents in production for 7–14 days before making enforcement decisions. If the enforcement technology itself adds significant overhead, that observation period becomes a performance negotiation with platform engineering. ARMO’s eBPF-based enforcement operates at 1–2.5% CPU and 1% memory overhead—low enough that extended observation periods don’t trigger the “when are you removing that thing?” conversation.

If you’re wondering whether NetworkPolicies, OPA/Gatekeeper, or Falco can solve the behavioral enforcement problem—they each operate at a different layer. NetworkPolicies control pod-level traffic without behavioral awareness. OPA enforces admission-time rules with no runtime component. Falco detects system calls without enforcement or application-layer context. None of them observe agent behavior, build baselines, or enforce per-agent boundaries.

Per-Agent Granularity: Why One-Size-Fits-All Policies Fail

Different agents have fundamentally different behavioral profiles, and enforcement must reflect that difference. A customer support chatbot has a narrow, predictable pattern—it queries a product database and returns formatted answers. A data analysis agent runs different queries daily depending on user requests—its behavioral envelope is wider but still bounded. A code generation agent needs the tightest syscall constraints because it executes arbitrary code.

Per-agent enforcement means each agent’s boundaries are derived from its own individual behavioral profile—what ARMO calls “Application Profile DNA.” As ARMO CTO Ben Hirschberg puts it: “A security practitioner can see that an agent has been running for a week, see exactly what tools and APIs it uses, and then lock it down to only those behaviors.”

This replaces blanket rules applied uniformly to all AI workloads with evidence-based boundaries shaped by each agent’s actual behavior. The chatbot gets narrow constraints because it has a narrow profile. The analysis agent gets wider constraints because it has a wider profile. Neither gets constraints that someone guessed at before understanding what the agent actually does.
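
Per-agent derivation can be sketched as a function from each agent's own observed profile to its enforcement boundary. The data model below is a hypothetical illustration ("Application Profile DNA" is ARMO's term; this is not their schema):

```python
# Hypothetical observed profiles, one per agent
profiles = {
    "support-chatbot": {
        "endpoints": {"product-db.internal"},
        "syscalls": {"connect", "read"},
    },
    "analysis-agent": {
        "endpoints": {"warehouse.internal", "bi-api.internal"},
        "syscalls": {"connect", "read", "write"},
    },
}

def policy_for(agent):
    """Lock each agent to exactly what its own profile observed."""
    p = profiles[agent]
    return {
        "allow_endpoints": sorted(p["endpoints"]),
        "allow_syscalls": sorted(p["syscalls"]),
    }

# The chatbot's boundary is narrower than the analysis agent's,
# because its observed profile is narrower.
assert len(policy_for("support-chatbot")["allow_endpoints"]) < \
       len(policy_for("analysis-agent")["allow_endpoints"])
```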

Mapping Agentic Threats to Sandboxing Layers

The OWASP Top 10 for Agentic Applications catalogs the threats specific to autonomous AI systems. Mapping these threats to the three sandboxing layers shows which approaches cover which risks—and the result isn’t ambiguous.

| OWASP Agentic Threat | Isolation (Layer 1) | Containment (Layer 2) | Behavioral (Layer 3) |
| --- | --- | --- | --- |
| Excessive Agency | ✗ No—agent stays in container | ⚠ Partial—IAM limits scope | ✓ Constrains to observed behavior |
| Tool Interaction Manipulation | ✗ No—tools are connected | ✗ No—tool access permitted | ✓ Detects abnormal tool patterns |
| Prompt Injection → Code Exec | ✓ Contains blast radius | ✓ Limits system resources | ✓ Constrains process/syscall behavior |
| Privilege Escalation | ⚠ Partial—if unprivileged | ⚠ Partial—IAM/RBAC | ✓ Detects unobserved resource access |
| Data Exfiltration (Legitimate Channels) | ✗ No—channels permitted | ✗ No—destinations allowed | ✓ Detects anomalous data flows |
| Insecure Code Generation | ✓ Limits execution env | ✓ Constrains resources | ✓ Syscall enforcement on generated code |
| Agent Escape | ✓ Primary purpose | ⚠ Reduces blast radius | ✓ Detects out-of-scope behavior |

The pattern is clear. Isolation sandboxing (Layer 1) effectively addresses code execution threats—insecure code generation and agent escape. But the majority of AI agent-specific threats—the ones OWASP specifically catalogs as agentic risks distinct from traditional application risks—require behavioral enforcement at Layer 3. Of the seven agentic threats mapped here, only two are fully addressed by isolation alone. The remaining five require behavioral enforcement—a roughly 70/30 split that makes the case for Layer 3 hard to ignore. Tool manipulation, excessive agency, and data exfiltration through legitimate channels are invisible to isolation-only approaches because they happen within permitted boundaries.

Next Steps

If you’re evaluating sandboxing for AI agents in production Kubernetes, the decision isn’t isolation vs. enforcement—it’s whether to add behavioral enforcement now or wait until an incident forces it. Every week of observation data you don’t collect is a week of behavioral baselines you’ll wish you had when enforcement becomes urgent. The teams that start observing now will have evidence-based policies ready when the pressure arrives. The teams that wait will be writing policies under pressure, without baselines, and with production already at risk.

For the complete progressive enforcement methodology—discovery, observation, selective enforcement, and full least privilege—including the 30-day implementation plan and cloud-specific guidance for EKS, AKS, and GKE, see: AI Agent Sandboxing & Progressive Enforcement: The Complete Guide.

To see how ARMO takes you from visibility to enforcement in days, check out this demo.

Frequently Asked Questions

What’s the difference between isolation sandboxing and behavioral enforcement?

Isolation sandboxing controls where an agent runs—it prevents code from escaping the container or VM boundary. Behavioral enforcement controls what the agent does within that boundary—which APIs it calls, which tools it invokes, which data it accesses. An agent inside the most isolated microVM in the world can still exfiltrate data through legitimate API calls if behavioral enforcement isn’t in place.

Do I need both isolation and behavioral enforcement?

If your agents generate and execute code, you need isolation as a baseline. If your agents make tool calls, access APIs, or process data—which is true of virtually all production AI agents—you need behavioral enforcement. Most production deployments need both layers working together.

How long does behavioral baselining take before I can start enforcing?

Most teams see usable behavioral profiles within 7–14 days of observation. The timeline depends on how varied the agent’s behavior is—a customer support chatbot with predictable patterns baselines faster than a data analysis agent running different queries daily. Start enforcing selectively on your most stable, highest-risk agents first while others continue baselining.

Can I sandbox AI agents without changing application code?

Yes. eBPF-based behavioral enforcement operates at the Linux kernel level, observing and restricting agent behavior through system calls. Security teams deploy, configure, and enforce policies independently—no sidecars, no code modifications, no developer coordination required. The overhead is 1–2.5% CPU and 1% memory.

What happens when agent behavior legitimately changes after enforcement policies are set?

Behavioral drift is expected as models update and prompts change. The observe-to-enforce workflow treats enforcement as continuous, not one-time. When an agent’s behavior deviates from its profile, you choose whether to block (high-risk) or alert (lower-risk) based on the deviation type, then update the baseline once the new behavior is validated as legitimate.

What’s the biggest risk of relying only on isolation sandboxing?

The threats you miss. The OWASP Top 10 for Agentic Applications catalogs threats—tool misuse, excessive agency, data exfiltration through legitimate channels—that happen entirely within the boundaries isolation permits. Roughly 70% of the OWASP agentic threat surface requires behavioral enforcement that isolation-only approaches cannot provide.
