AI Agent Sandboxing & Progressive Enforcement: The Complete Guide

Mar 3, 2026

Ben Hirschberg
CTO & Co-founder

Key takeaways

  • What is AI agent sandboxing? AI agent sandboxing constrains what autonomous AI agents can do in production—not just isolating where they run, but controlling their API access, network connections, process executions, and data access based on observed behavior. Unlike traditional container isolation, behavioral sandboxing addresses the gap between where an agent runs and what it actually does.
  • What is policy paralysis? Policy paralysis is the state where security teams know they need to enforce boundaries on AI agents but can’t write effective policies because they don’t understand agent behavior yet. It results in a cycle of overly restrictive policies that break production, permissive policies that leave gaps, or no policies at all.
  • What is progressive enforcement? Progressive enforcement is a 4-stage methodology: discovery (inventory AI workloads), observation (build behavioral baselines), selective enforcement (constrain high-risk agents first), and full least privilege (enforce boundaries on all agents based on evidence). Each stage builds confidence that the next won’t disrupt production.
  • Why isn’t container isolation enough for AI agents? Container isolation controls where an agent runs, but not what it does within those boundaries. An agent with legitimate database access can still exfiltrate data through allowed API calls if compromised. Behavioral enforcement at the kernel level—controlling API access, network destinations, and process execution—addresses the threats isolation alone misses.
  • How does eBPF enable AI agent enforcement without code changes? eBPF operates at the Linux kernel level, enforcing security policies on AI agents without modifying application code, injecting sidecars, or requiring developer cooperation. Security teams deploy, observe, and enforce independently at 1–2.5% CPU and 1% memory overhead.
  • What does the observe-to-enforce workflow look like in practice? Deploy in visibility-only mode, accumulate behavioral data over a defined period, generate enforcement policies from observed behavior, then progressively promote from alert-only to active blocking. This eliminates guesswork from policy creation and reduces the risk of breaking production during enforcement rollout.

Your CISO just got word that engineering is deploying AI agents into production Kubernetes clusters next quarter. Not chatbots—autonomous agents that generate and execute code, call external APIs through MCP tool runtimes, access internal databases, and make decisions without human review.

The question lands on your security team: “How are we securing these?”

And you get stuck. Because you can’t write enforcement policies for workloads you don’t understand. Traditional workloads behave predictably—a web server handles HTTP requests, a batch job runs its script and exits. AI agents don’t work that way. Their behavior is emergent, shaped by whatever prompts they receive and whatever reasoning path the model takes. Two identical deployments of the same LangChain agent can produce completely different runtime behavior depending on what users ask them to do. You can’t write a network policy for an agent whose network destinations depend on the questions it gets asked.

This is policy paralysis: you know you need to enforce boundaries on these agents, but you can’t write the boundaries because you don’t understand the behavior.

This isn’t theoretical. ARMO CTO Ben Hirschberg experienced it firsthand when his open-source AI agent, OpenClaw, started sending unauthorized WhatsApp messages to his contacts—the agent acting entirely outside its intended scope through a connected communication channel. The OWASP Top 10 for LLM Applications lists excessive agency as a critical risk for exactly this reason: LLM-based systems granted unchecked autonomy take actions nobody predicted.

Most existing guides on AI agent sandboxing focus on the developer’s problem: which container isolation technology to use, how to configure gVisor or Firecracker microVMs, how to run LLM-generated code safely. That’s important work, but it doesn’t help the security team responsible for constraining agents they didn’t build, running in clusters they need to keep secure.

This guide is for that security team. It covers a different approach—one that starts with observation rather than enforcement, builds policies from evidence rather than guesswork, and progressively tightens boundaries as your confidence grows.

What Is AI Agent Sandboxing? (And Why Traditional Isolation Isn’t Enough)

AI Agents Are Not Traditional Workloads

An AI agent running in your cluster might receive a prompt, decide it needs to query a database, generate a Python script on the fly to process the results, call an external API to verify the data, and send a formatted response—all in a single execution chain that nobody predicted in advance. Tomorrow, given a different prompt, the same agent takes an entirely different path.

This non-determinism is what makes traditional security policies inadequate. You can’t write a network policy for an agent whose destinations depend on user input. You can’t write process constraints for an agent that generates new code on every execution. Static, pre-defined rules assume predictable behavior, and AI agents are anything but.

The Spectrum of Sandboxing Approaches

When most people hear “AI agent sandboxing,” they think of code execution isolation—running untrusted code in a container, gVisor sandbox, or Firecracker microVM. The Kubernetes community has formalized this through the Agent Sandbox CRD project under SIG Apps, which provides a declarative API for managing isolated, stateful execution environments for LLM-generated code. GKE integrates it with managed gVisor, and projects like Kata Containers offer additional VM-level isolation backends.

That’s solid infrastructure for code execution isolation. But code execution isolation is only half the problem.

Consider this: you put an AI agent in the most isolated microVM in the world. The agent has legitimate database credentials because it needs them to do its job. A prompt injection attack manipulates the agent into exfiltrating data through a legitimate API call that your network policies explicitly allow. The container isolation didn’t help because the attack happened within the boundaries you permitted.

This is the gap between isolation sandboxing (controlling where an agent runs) and behavioral sandboxing (controlling what an agent does). Most existing content addresses only the first. Closing the gap requires Kubernetes-native enforcement architectures that operate at the behavioral layer—not just the infrastructure layer.

The Policy Paralysis Problem

Why Security Teams Can’t Write Policies for AI Agents

Enforcing least privilege requires knowing what “normal” looks like. With a web server, you know it listens on port 443 and connects to a backend database. You write network policies accordingly. With a cron job, you know it runs a specific script at a specific interval.

With an AI agent, you don’t have any of that. The agent’s behavior emerges from its interactions—the prompts it receives, the tools it decides to invoke, the code it generates. So security teams face a choice nobody wants to make.

Write restrictive policies based on assumptions. You guess what the agent should be allowed to do and lock it down. Within 48 hours, someone from platform engineering is in your Slack channel with a screenshot of a failing agent and a thread full of developers asking “who changed the network policy?” You loosen the restrictions. Now you’ve got gaps you can’t quantify, but at least the product team isn’t escalating to your VP.

Write permissive policies to avoid breaking things. You give the agent broad permissions because you don’t know what to restrict. Nothing breaks—until an agent gets prompt-injected and you discover it had access to production databases, external APIs, and internal services it never should have reached. Your incident report becomes a case study in why “we’ll tighten it later” never works.

Write no policies at all. You skip enforcement entirely because you can’t get it right, and you hope detection will catch problems before they become incidents. This is where most teams are today with AI agents.

The Answer: Observe Before You Enforce

The way out of policy paralysis is to stop starting with enforcement.

Instead of writing policies based on what you think an agent should do, you observe what it actually does in production. You watch its API calls, network connections, process executions, and file access over a defined period. You build a behavioral baseline grounded in evidence, not assumptions. Then you promote those observed behaviors into enforcement policies—an observe-to-enforce workflow that replaces guesswork with evidence.
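In code, the observe-to-enforce workflow reduces to accumulating observed events and then promoting them into a default-deny allowlist. The sketch below is illustrative Python only; `BehaviorLog` and `promote_to_policy` are hypothetical names for the pattern, not part of any product API:

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorLog:
    """Accumulates what an agent is observed doing in visibility-only mode."""
    network_destinations: set = field(default_factory=set)
    processes: set = field(default_factory=set)
    api_endpoints: set = field(default_factory=set)

    def record(self, dimension: str, value: str) -> None:
        # e.g. record("network_destinations", "db.internal:5432")
        getattr(self, dimension).add(value)

def promote_to_policy(log: BehaviorLog) -> dict:
    """Promote an observed baseline into a default-deny allowlist policy."""
    return {
        "allow_network": sorted(log.network_destinations),
        "allow_processes": sorted(log.processes),
        "allow_apis": sorted(log.api_endpoints),
        "default_action": "deny",
    }
```

The key design choice is the last line: anything not observed during the baselining window is denied by default, which is what turns evidence into least privilege.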

As ARMO CTO Ben Hirschberg puts it: “A security practitioner can see that an agent has been running for a week, see exactly what tools and APIs it uses, and then lock it down to only those behaviors.”

That’s the core principle behind progressive enforcement—and the methodology the rest of this guide is built around.

Progressive Enforcement: A 4-Stage Maturity Model

Progressive enforcement isn’t a product feature—it’s a methodology. It’s the recognition that enforcing security on non-deterministic workloads requires building confidence incrementally rather than deploying constraints all at once.

Stage 1: No Visibility (“Flying Blind”)

This is where most organizations are today. Development teams have deployed LangChain agents, inference servers, or custom agent frameworks into production Kubernetes clusters. Security may or may not know these workloads exist.

The risk isn’t hypothetical. Developers are deploying agent frameworks and connecting MCP tool runtimes—often without filing a Jira ticket, let alone notifying security. One team connects a CrewAI agent to an internal database for a hackathon demo, forgets to tear it down, and three months later it’s still running with production credentials. These shadow AI deployments are your biggest blind spot because you can’t secure what you can’t see.

Getting out of Stage 1 starts with discovery: automatically detecting every AI workload across your clusters without relying on manual tagging or developer self-reporting. A runtime-derived AI Bill of Materials (AI-BOM) gives you the complete inventory—which models are loaded, which RAG sources are connected, which external tools and APIs each agent actually calls.

Stage 2: Observe-Only (“See Everything, Enforce Nothing”)

Once you know what AI workloads exist, you need to understand what they actually do. Deploy in visibility-only mode and start building behavioral profiles for every agent. This means recording everything: tools invoked, APIs called, network destinations reached, processes spawned, files accessed, data flows through the execution chain. This is where runtime context transforms from a security concept into an operational methodology—the raw behavioral data that makes evidence-based enforcement possible.

Over days or weeks, these observations accumulate into a behavioral baseline—a picture of what “normal” looks like for each specific agent. This is the critical stage that most sandboxing approaches skip entirely. They jump straight from “you have an agent” to “write policies for it.” But you can’t write good policies without this evidence base.

ARMO calls these behavioral baselines “Application Profile DNA.” It’s a representation of every container’s actual runtime behavior that becomes the foundation for anomaly detection and, eventually, enforcement. Instead of declaring what an agent should do in a config file, you let the agent tell you what it does through observed behavior.
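Once a baseline exists, anomaly detection becomes a membership test: anything outside the observed sets is a deviation. A minimal sketch — the profile shape here is an assumption for illustration, not ARMO's actual "Application Profile DNA" format:

```python
def is_anomalous(profile: dict, dimension: str, value: str) -> bool:
    """Return True if a runtime event falls outside the observed baseline."""
    return value not in profile.get(dimension, set())

# A profile learned over the observation window (illustrative values).
profile = {
    "network": {"db.internal:5432", "api.partner.com:443"},
    "process": {"/usr/bin/python3"},
}
```

Note that an event in a dimension the agent has never touched at all (say, its first-ever file access) is also flagged, because the lookup falls back to an empty set.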

Stage 3: Selective Enforcement (“Trust but Verify”)

With behavioral baselines established, you start enforcing—selectively. Begin with your highest-risk agents: external-facing, sensitive data access, or elevated privileges. Take their observed behavioral profiles and promote them into enforcement policies. If an agent has spent two weeks accessing only three specific API endpoints and two network destinations, those become its per-agent enforcement boundaries.

For agents where you have high confidence, block deviations outright. For others, alert on deviations without blocking—giving visibility into unusual behavior without risking production disruptions. This is where progressive enforcement proves its value: instead of a binary choice between “enforce everything” and “enforce nothing,” you’re making targeted, evidence-based decisions about which agents get which level of constraint.
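The block-versus-alert decision can be expressed as a small per-agent policy function keyed on risk and baseline confidence. This is a sketch with assumed field names, not a product configuration schema:

```python
def enforcement_mode(agent: dict) -> str:
    """Pick an enforcement mode per agent: block deviations outright only
    where both the agent's risk and the confidence in its baseline are high."""
    if agent["risk"] == "high" and agent["baseline_confidence"] == "high":
        return "block"
    return "alert"
```

Requiring both conditions is the conservative choice: a high-risk agent with an immature baseline stays in alert-only mode until its profile stabilizes, rather than risking production breakage.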

Stage 4: Full Least Privilege (“Enforced by Evidence”)

At full maturity, every AI agent operates within a behavioral boundary defined by its observed behavior. Your customer support chatbot has different constraints than your data analysis agent—because they have different behavioral profiles, not because someone guessed at their requirements.

Deviations from baseline are blocked in real time. New agents start at Stage 2 and progress through the maturity model. The enforcement itself is Kubernetes-native, operating at the kernel level, with no code changes, no sidecars, and no developer cooperation required. Agent behavior evolves as models are updated and prompts change—which means observation and enforcement are ongoing processes, not one-time configurations.

How It Works: Kubernetes-Native Enforcement with eBPF

The progressive enforcement methodology requires enforcement technology that operates at the system level, understands Kubernetes primitives, and doesn’t require modifying the applications being secured.

Why eBPF Is the Right Foundation

eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. For AI agent enforcement, this means you can observe and control an agent’s behavior at the system call level—seeing and restricting its network connections, process executions, file access, and API calls—without touching application code. The technology has become the foundation for eBPF-based runtime security across the Kubernetes ecosystem, and it’s especially well-suited to the progressive enforcement model because the same sensor handles both observation and enforcement.

This matters for a practical reason: if enforcement requires modifying application code, injecting sidecars, or coordinating with development teams for every policy change, adoption will stall. Security teams need to deploy, observe, and enforce independently. eBPF makes that possible.

What Enforcement Actually Controls

Behavioral enforcement for AI agents operates across four dimensions, each applied at the kernel level with zero application changes:

  • API and tool access. Which tools can the agent invoke? If an agent’s behavioral profile shows it only ever calls three specific endpoints, enforcement restricts it to those three. A prompt injection that redirects the agent to an unauthorized API gets blocked at the kernel level.
  • Network destinations. Enforcement restricts outbound connections to the destinations observed during baselining. An agent that normally connects to an internal database and one external API can’t suddenly exfiltrate data to an unknown endpoint.
  • Process and syscall constraints. This is critical for agents with code generation capabilities—what Ben Hirschberg identifies as the most dangerous AI capability. If an LLM generates Python code and the agent has access to an interpreter, you get arbitrary code execution that no human reviewed. Syscall-level enforcement constrains what that code can do even if the agent runs it.
  • File and data access. Enforcement restricts filesystem access to the paths observed during baselining. An agent that only reads from a specific config directory can’t suddenly access sensitive host paths.
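Taken together, the four dimensions amount to one allowlist policy per agent, consulted on every event. A hedged sketch of that structure — the names and shapes are illustrative, not kernel or product APIs:

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    """Per-agent allowlists derived from the observed behavioral baseline."""
    allowed_apis: set
    allowed_destinations: set
    allowed_binaries: set
    allowed_paths: set  # filesystem path prefixes

    def check(self, dimension: str, value: str) -> str:
        if dimension == "file":
            # File access is matched by path prefix, not exact path.
            ok = any(value.startswith(p) for p in self.allowed_paths)
        else:
            ok = value in {
                "api": self.allowed_apis,
                "network": self.allowed_destinations,
                "process": self.allowed_binaries,
            }[dimension]
        return "allow" if ok else "deny"
```

A prompt-injected agent redirected to an unknown endpoint fails the `network` check; LLM-generated code spawning an unexpected binary fails the `process` check — both without any change to the agent itself.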

Zero Code Changes, Production-Safe Overhead

ARMO’s sensor operates at the kernel level within Kubernetes, enforcing policies without modifying application code, without injecting sidecars, and without requiring developer cooperation. Security teams deploy it independently. The overhead is 1–2.5% CPU and 1% memory—within the performance budget most platform teams accept for security tooling.

This low overhead is what makes the observe-to-enforce workflow operationally viable. If the enforcement technology itself were disruptive, you couldn’t observe agents in production for extended periods before making enforcement decisions. When an incident does occur, ARMO connects application-layer signals with container, Kubernetes, and cloud events into a full-stack attack story—replacing hours of manual log correlation with a single narrative showing exactly how the attack progressed.

Cloud-Specific Implementation: EKS, AKS, and GKE

The progressive enforcement methodology works the same way regardless of cloud provider. Observe, baseline, enforce. But the underlying cloud primitives differ, and the integration points matter.

AWS EKS

EKS adds AWS-specific considerations around IRSA (IAM Roles for Service Accounts) identity boundaries that define what cloud resources agents can access, VPC configuration that controls network isolation, and EKS-specific security groups that interact with Kubernetes network policies. For agents calling AWS services like Bedrock or SageMaker, IAM boundary enforcement is the critical cloud-layer control that complements kernel-level behavioral enforcement.

Azure AKS

AKS environments involve Azure AD workload identity integration for service-level authentication, Azure network policies for traffic segmentation, and Azure Policy for cluster-wide governance. For agents that interact with Azure OpenAI Service, the identity layer between AKS pods and Azure AD is where cloud-layer enforcement meets Kubernetes-native behavioral enforcement.

Google GKE

GKE has the most mature AI agent infrastructure story. GKE’s Agent Sandbox CRD with managed gVisor provides a Kubernetes-native way to launch secure, isolated single-container environments, with Workload Identity for cloud IAM mapping and VPC Service Controls for perimeter boundaries. But code execution isolation and behavioral enforcement serve different purposes—the first controls where an agent runs, the second controls what it does once running.

An enforcement layer like ARMO’s works as the cross-cloud constant: same behavioral profiling, same observe-to-enforce workflow, same eBPF-based enforcement regardless of whether you’re on EKS, AKS, or GKE.

Sandboxing AI Agents in Regulated Environments

Everything above applies to any organization running AI agents in production. But if you’re in healthcare or financial services, you have additional requirements that go beyond operational security into compliance territory.

Healthcare (HIPAA)

AI agents handling protected health information need data boundary enforcement for PHI that guarantees data doesn’t flow through unauthorized endpoints. This means constraining which data sources agents can access, maintaining continuous audit trails, and proving compliance across frameworks. ARMO’s platform includes 260+ purpose-built Kubernetes compliance controls across HIPAA, SOC2, PCI-DSS, and GDPR with continuous automated monitoring and audit-ready evidence exports.

Financial Services

Financial services teams face blast radius containment requirements that are both regulatory (SOX, PCI-DSS) and operational. AI agents that execute transactions or access customer financial data need enforcement boundaries that are provably tight, auditable, and tamper-resistant. Per-agent enforcement is especially critical: a fraud detection agent that reads transaction data needs entirely different constraints than a customer service agent calling external APIs.

Getting Started: Your First 30 Days

Progressive enforcement isn’t a six-month infrastructure project. Here’s a practical 30-day framework.

Days 1–7: Inventory. Find every AI workload across your clusters. Which agents are running? What frameworks? What tools and APIs do they access? If you’re relying on developers to self-report, your shadow AI problem is guaranteed to be bigger than you think. Automated AI workload discovery through a runtime-derived AI-BOM gives you the complete picture—including that CrewAI agent from the hackathon demo that’s still running with production credentials.

Days 8–14: Observe. Deploy in visibility-only mode. Build behavioral profiles for every AI agent. Don’t enforce anything yet. Focus on your highest-risk agents first: external-facing, data-access, or privileged workloads. By the end of week two, you should have a clear picture of what “normal” looks like for each agent.

Days 15–21: Selective enforcement. Promote your highest-risk agents’ observed behaviors into enforcement policies. Start with alert-only for most agents, active blocking for the few where you have highest confidence. Monitor for false positives and refine.

Days 22–30: Expand and operationalize. Extend enforcement to remaining agents. Establish per-agent policies. Set up continuous monitoring for behavioral drift. Document your enforcement posture for compliance and audit.
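The continuous drift monitoring set up in days 22–30 can be reduced to comparing a recent observation window against the enforced baseline. In this sketch the 10% threshold is an arbitrary illustrative choice, not a recommended value:

```python
def assess_drift(baseline: set, recent: set, threshold: float = 0.10) -> str:
    """Classify drift between the enforced baseline and a recent window.
    No new behavior: stable. Small drift: candidate for re-baselining
    after validation. Large drift: needs human review."""
    new_behaviors = recent - baseline
    if not new_behaviors:
        return "stable"
    ratio = len(new_behaviors) / max(len(baseline), 1)
    return "rebaseline_candidate" if ratio <= threshold else "review_required"
```

The point of the three-way outcome is operational: most drift (a model update adding one new endpoint) should flow into a validated baseline update, while a burst of unfamiliar behavior escalates to a human before the policy loosens.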

ARMO’s platform supports this exact workflow: AI-BOM handles discovery, Application Profile DNA builds behavioral baselines, and eBPF-powered observation and enforcement manages the transition—all at 1–2.5% CPU overhead with zero code changes.

The Future of AI Agent Security Is Progressive, Not Reactive

The approach most organizations take today—if they take one at all—is reactive. Wait for something to go wrong, then figure out how to prevent it. Or front-load restrictive policies that break production, then spend weeks unwinding them.

Progressive enforcement accepts that you don’t know everything about AI agent behavior upfront—and that’s okay. It gives you a structured path from no visibility to full least privilege, with each stage building confidence that the next won’t disrupt operations.

As AI agents become more autonomous and more deeply integrated into enterprise infrastructure, the organizations that invest in progressive enforcement now will have a structural advantage. They’ll deploy agents confidently because the security framework supports iteration rather than demanding perfection upfront.

The teams that wait until an incident forces the conversation will be writing policies under pressure, without behavioral baselines, and with production already at risk.

See how ARMO takes you from visibility to enforcement in days, not months. 

Frequently Asked Questions

Can I sandbox AI agents without changing application code?

Yes. eBPF-based enforcement operates at the Linux kernel level, observing and restricting agent behavior through system calls. Security teams deploy, configure, and enforce policies independently—no sidecars, no code modifications, no developer coordination required. 

How long does behavioral baselining take before I can start enforcing?

Most teams see usable behavioral profiles within 7–14 days of observation. The timeline depends on how varied your agents’ behavior is—a customer support chatbot with predictable patterns baselines faster than a data analysis agent that runs different queries daily. Start enforcing selectively on your most stable, highest-risk agents first while others continue baselining.

What happens if an agent’s behavior changes after enforcement policies are set?

Behavioral drift is expected as models update and prompts change. The observe-to-enforce workflow treats enforcement as continuous, not one-time. When an agent’s behavior deviates from its profile, you choose whether to block (high-risk) or alert (lower-risk) based on the deviation type, then update the baseline once the new behavior is validated as legitimate.

How is this different from Kubernetes NetworkPolicies or OPA/Gatekeeper?

NetworkPolicies control traffic between pods; OPA/Gatekeeper enforces admission-time rules on resource configurations. Neither observes runtime behavior or adapts to non-deterministic workloads. Progressive enforcement combines runtime context with kernel-level controls to restrict API access, process execution, file access, and network destinations based on what an agent actually does—not what you predict it will do.

Does progressive enforcement work across multi-cloud environments?

The methodology is cloud-agnostic: observe, baseline, enforce. The eBPF-based enforcement layer works identically on EKS, AKS, and GKE. Cloud-specific primitives like IRSA (AWS), Azure AD workload identity, and GKE Workload Identity handle the IAM integration layer, while ARMO provides the cross-cloud behavioral enforcement constant.

Can I set different enforcement levels for different agents?

Yes—that’s a core principle. A fraud detection agent reading transaction data needs tighter constraints than an internal summarization bot. Per-agent enforcement boundaries are derived from each agent’s individual behavioral profile, so policies reflect actual risk rather than one-size-fits-all rules. High-risk agents get active blocking; lower-risk agents can run in alert-only mode.

How does the Kubernetes Agent Sandbox CRD relate to progressive enforcement?

The Agent Sandbox CRD provides code execution isolation—running untrusted LLM-generated code in gVisor or Kata Container sandboxes. Progressive enforcement provides behavioral control over what agents do within those sandboxes. They’re complementary: the CRD handles the infrastructure isolation layer, and kernel-level behavioral enforcement handles the runtime policy layer.

What’s the biggest risk of delaying AI agent enforcement?

Shadow AI. Developers are deploying agent frameworks and connecting MCP tool runtimes without security review. Every week without visibility means more unmonitored agents accumulating in production with credentials, API access, and network permissions nobody audited. 
