You’ve deployed five AI agents into your production Kubernetes cluster: a customer support chatbot, a fraud detection agent, a data pipeline processor, a code generation assistant, and an internal summarization bot. Your security team writes one set of guardrails and applies them uniformly.
Within a week, you discover the code generation agent needs interpreter access the chatbot should never have. The fraud detection agent requires database permissions that would be a critical vulnerability if the summarization bot got them. The data pipeline processor connects to external APIs that the chatbot has no business reaching.
So you’re stuck. Loosen the policies and every agent inherits the highest-risk agent’s permissions—a textbook case of “excessive agency,” where the blast radius of a prompt injection on your chatbot now extends to every capability your most privileged agent has. Tighten the policies and half your agents break. Write bespoke policies for each one and you’re spending the next quarter hand-crafting YAML for workloads you don’t fully understand.
This is the per-agent guardrails problem, and it’s why most organizations either over-permit their AI agents or don’t enforce anything at all. This guide covers how to get per-agent enforcement without hand-crafting a single policy—and why the approach most teams try first makes the problem worse.
Nobody sets out to build a one-size-fits-all policy for their agent fleet. It happens gradually. Your team deploys their first AI agent—a customer support chatbot. It’s straightforward: three internal APIs, two services, no code generation, read-only data access. Your security team writes a reasonable policy. Network rules, RBAC constraints, maybe some OPA checks. It works.
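That first policy genuinely is small enough to hand-write. As a hedged sketch, the chatbot's network rules might look something like the NetworkPolicy below; the namespace, labels, and service names are all hypothetical:

```yaml
# Hypothetical egress policy for the first agent, the support chatbot.
# Allows DNS plus its three internal APIs; all other egress is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: support-chatbot-egress
  namespace: agents                       # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: support-chatbot                # hypothetical label
  policyTypes:
    - Egress
  egress:
    - to:                                 # DNS resolution
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
    - to:                                 # the three internal APIs it actually uses
        - podSelector:
            matchLabels:
              app: knowledge-base-api     # hypothetical
        - podSelector:
            matchLabels:
              app: ticketing-api          # hypothetical
        - podSelector:
            matchLabels:
              app: accounts-api           # hypothetical
```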
Two months later, engineering deploys three more agents. Nobody tells security, or if they do, it’s a Slack message the week after the agents are already in production. The path of least resistance is obvious: apply the chatbot’s policy to the new agents and adjust later. Except “later” never comes, because the team is already triaging the next deployment.
Then agent number five arrives—a code generation assistant that needs interpreter access, external API calls, and write permissions to a staging directory. The chatbot’s policy breaks it immediately. Someone from platform engineering shows up in your Slack channel with a screenshot of a failing workflow and a thread of developers asking who changed the network policy. You loosen the restrictions. Now every agent has the code generation agent’s permissions, including the chatbot that should never spawn a process.
This is how blanket policies form: not by design, but by accumulation. Once in place, they’re almost impossible to unwind because nobody documented which permissions each agent actually needs. Per-agent guardrails break this cycle—but they require an observe-first methodology that progressively reduces blast radius rather than trying to define boundaries upfront.
If your team is still in the stage where enforcement is needed but nobody knows where to start, the complete guide to AI agent sandboxing and progressive enforcement covers the full four-stage methodology from zero visibility to full least privilege.
Per-agent guardrails aren’t a YAML file someone wrote by guessing what an agent should be allowed to do. They’re behavioral boundaries derived from what each agent actually does in production—and that distinction separates them from every other approach to AI agent security.
Behavioral enforcement operates across four dimensions, each applied at the kernel level through eBPF-based controls that require zero application changes. A sketch of what a combined per-agent profile might encode follows the four dimensions below.
API and tool access controls which tools and endpoints each agent can invoke. If a prompt injection redirects an agent to an unauthorized API, kernel-level enforcement blocks the call—because it was never part of that agent’s observed behavior.
Network destinations restrict each agent to the connections observed during baselining. An agent that normally connects to an internal database and one external API can’t suddenly exfiltrate data to an unknown endpoint.
Process and syscall constraints are critical for agents with code generation capabilities. This is the most dangerous AI capability. When an LLM generates Python code and the agent has access to an interpreter, you get arbitrary code execution that no human reviewed. Syscall-level enforcement constrains what that code can do even if the agent runs it—and detecting when agents break their execution boundaries becomes the critical backstop when enforcement alone isn’t enough.
File and data access restricts each agent to the filesystem paths observed during baselining. In regulated environments, cross-agent data access isn’t just a security gap—it’s a compliance violation. Per-agent file restrictions prevent this by design.
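To make those four dimensions concrete, here is an illustrative sketch of what a single agent's behavioral profile might encode. The schema and every field name are invented for illustration; this is not ARMO's actual profile format:

```yaml
# Illustrative only: an invented schema showing the four enforcement
# dimensions a behavioral profile derived from observation might capture.
agent: fraud-detection-agent          # hypothetical agent name
observationWindow: 14d
profile:
  apiAndToolAccess:                   # 1. tools and endpoints the agent may invoke
    - tool: transaction-lookup
    - endpoint: https://risk-scoring.internal/v1/score
  networkDestinations:                # 2. only connections seen during baselining
    - host: transactions-db.internal
      port: 5432
    - host: risk-scoring.internal
      port: 443
  processAndSyscalls:                 # 3. no interpreter, no execution of generated code
    allowedBinaries:
      - /usr/local/bin/agent
    execAllowed: false
  fileAccess:                         # 4. filesystem paths observed at runtime
    read:
      - /etc/agent/config.yaml
    write:
      - /var/log/agent/
```

Everything outside those boundaries is, by definition, behavior the agent never exhibited during baselining, which is what makes the profile enforceable.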
Most teams reaching for AI agent guardrails today encounter three categories of tooling, none of which solve the per-agent problem at the infrastructure level.
Prompt-level guardrails (LangChain middleware, OpenAI’s guardrails SDK, NeMo Guardrails) validate model inputs and outputs. They filter what the model says, but don’t control what the agent does after it decides to act. A prompt guardrail doesn’t stop an agent from connecting to an unauthorized network destination. And they require developer implementation per agent—security teams can’t deploy or modify them independently.
Governance frameworks (OPA/Gatekeeper, admission controllers) enforce rules at deployment time—validating configurations before workloads launch. They can’t adapt to non-deterministic behavior because they enforce static rules that don’t observe runtime activity.
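For example, a typical Gatekeeper constraint (this sketch assumes the `K8sAllowedRepos` template from the public gatekeeper-library is installed) validates where container images come from before a pod is admitted, but has no effect on what the agent does once it is running:

```yaml
# Admission-time rule: checked once, at deploy time. If the pod is admitted,
# this policy never sees another byte of the agent's runtime behavior.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: agents-allowed-repos
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["agents"]                 # hypothetical namespace
  parameters:
    repos:
      - "registry.internal.example.com/"   # hypothetical registry
```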
Infrastructure isolation (GKE Agent Sandbox, gVisor, Kata Containers) controls where agents run but not what they do within permitted boundaries. An agent with legitimate database credentials can still exfiltrate data through an API call your network policies explicitly allow.
Per-agent guardrails operate at a layer below all three: behavioral enforcement at the kernel level, observing what each agent actually does and enforcing boundaries specific to that agent’s profile. eBPF-based runtime enforcement fills this gap because the same sensor handles both observation and enforcement, operating independently of developer tooling with zero code changes required.
Even with per-agent behavioral profiles, you need a framework for deciding which agents get active blocking versus alert-only monitoring. Applying maximum enforcement to every agent creates the same operational friction you’re trying to avoid.
Security teams implementing per-agent enforcement typically assess four risk dimensions before deciding enforcement levels. This aligns with the direction federal bodies are taking—NIST’s Center for AI Standards recently issued an RFI on security considerations for AI agent systems that specifically calls out least privilege and identity management as priorities. Teams building continuous risk profiling across their AI workload fleet will recognize these dimensions as the agent-specific layer of broader posture management.
Data sensitivity. Does the agent access PII, financial records, health data, or production databases? A fraud detection agent reading transaction records operates in a fundamentally different risk tier than a bot summarizing internal meeting notes.
External connectivity. Does the agent call external APIs, reach third-party services, or connect outside the cluster? Any outbound connection is a potential exfiltration vector, and agents with external connectivity need tighter behavioral boundaries.
Code execution capability. Can the agent generate and run code? This is the risk dimension most teams underestimate. A code generation agent with interpreter access can execute arbitrary logic that nobody reviewed—making it the highest-risk category for behavioral enforcement.
Privilege level. Does the agent run with elevated permissions, access secrets, or have write access to critical systems? Agents with production credentials that were granted “because it needed them for the demo” three months ago are your biggest exposure.
| Risk Classification | Profile | Enforcement Level | Example Agent |
| --- | --- | --- | --- |
| High risk | High data sensitivity + external connectivity or code execution | Active blocking after baselining | Fraud detection agent, code generation assistant |
| Medium risk | Moderate data access, internal-only connectivity | Alert-only with escalation on anomalies | Internal analytics agent, data pipeline processor |
| Lower risk | Read-only, internal, no code execution | Behavioral monitoring with periodic baseline validation | Internal summarization bot, document search agent |
The key principle: the enforcement level itself is per-agent, not just the policy rules. Some agents get active blocking from day one. Others run in alert-only mode while you build confidence. This granularity—different enforcement intensity for different agents in the same cluster—is what most existing tools can’t deliver.
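As a sketch of what that granularity could look like operationally, here is a hypothetical enforcement plan for the fleet. The format and field names are invented for illustration and do not reflect any real product API:

```yaml
# Hypothetical per-agent enforcement plan (invented format).
# Agents in the same cluster get different enforcement intensity.
agents:
  - name: fraud-detection-agent
    risk: high
    mode: enforce          # block any deviation from the behavioral baseline
  - name: data-pipeline-processor
    risk: medium
    mode: alert            # surface anomalies and escalate, but don't block
  - name: summarization-bot
    risk: low
    mode: monitor          # behavioral monitoring, periodic baseline validation
```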
The biggest objection security teams raise about per-agent enforcement is the work involved. Thirty agents means thirty policy sets—who’s writing those? Nobody. Per-agent guardrails don’t start with policy writing. They start with observation. And the observation itself creates the differentiation automatically.
Agent behavior is emergent—shaped by prompts, reasoning paths, and tool invocations. Two identical deployments of the same framework can produce completely different runtime behavior. This is the policy paralysis problem at the per-agent level: you can’t pre-define 30 policy sets for 30 agents when you don’t know what they do. Manual definitions lead to the same cycle—overly restrictive rules that break production, permissive rules that leave gaps, or no rules at all.
The workflow is straightforward. Deploy in visibility-only mode across all agents. eBPF sensors observe each agent’s behavior independently—API calls, network connections, process executions, file access. Over 7–14 days, each agent accumulates its own behavioral profile, which becomes the foundation for its enforcement boundary. Then you promote profiles to enforcement selectively, starting with highest-risk agents first.
Here’s the insight most teams miss: you end up with per-agent guardrails automatically, because each agent’s observed behavior is unique. You don’t have to manually differentiate policies—the observation does it for you. Your fraud detection agent’s profile shows transaction database queries, two internal API endpoints, and zero external connectivity. Your data pipeline processor’s profile shows connections to three external data sources, batch file writes to a staging bucket, and scheduled process executions. Those are two completely different guardrails—generated from evidence, not guesswork.
ARMO calls these behavioral baselines “Application Profile DNA.” Each profile represents a container’s actual runtime behavior across all four enforcement dimensions, and each one becomes the enforcement policy for that specific agent.
Agent behavior isn’t static. Models get updated, prompts change, new tools get connected. This is where risk classification pays off. When a high-risk agent deviates from its baseline—say, the fraud detection agent connects to a network destination it’s never reached—enforcement blocks the deviation. When a lower-risk agent drifts, enforcement alerts without blocking, giving the team time to validate before updating the baseline. For a deeper look at how behavioral anomaly detection catches intent drift across agent fleets, see the dedicated detection guide. Baselines are continuous, not one-time—because manually recertifying every agent’s policy whenever a model updates doesn’t scale.
Consider a common discovery scenario: a CrewAI agent from a hackathon demo has been running for three months with production credentials. Nobody remembered to tear it down. Its behavioral profile shows it accessing a customer database, two external APIs, and a cloud storage bucket—far beyond what anyone intended for a demo.
Without per-agent enforcement, this agent operates under whatever blanket policy exists—or none at all. With per-agent enforcement, it gets its own restrictive boundary immediately upon discovery, without affecting legitimate agents. The behavioral profile tells you exactly what it’s been doing, so you can constrain it to those behaviors or shut it down entirely.
This is where a runtime-derived AI Bill of Materials feeds directly into per-agent enforcement. You can’t set guardrails for agents you don’t know exist—and static manifests won’t catch a hackathon demo that was never formally deployed.
Consider two legitimate agents with very different footprints. Your internal analytics agent runs nightly queries against a reporting database, generates summary dashboards, and stores results in an internal S3 bucket; it never connects outside the cluster. Your customer-facing interaction agent handles real-time requests, calls three external model APIs for inference, and writes interaction logs to a shared data store.
Apply the interaction agent’s permissions to the analytics agent and you’ve given an internal batch job external network reach and write access to a shared data store it should never touch. Apply the analytics agent’s restrictions to the interaction agent and it can’t reach the external model APIs it needs to function. Per-agent enforcement means each operates within boundaries that match its actual function—not a compromise that fits neither.
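Expressed as network boundaries alone, the difference is stark. A minimal sketch with hypothetical names, assuming standard Kubernetes NetworkPolicies (DNS rules omitted for brevity):

```yaml
# Internal-only egress for the nightly analytics agent (hypothetical names).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: analytics-agent-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: analytics-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: reporting-db           # hypothetical internal database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: internal-object-store  # hypothetical S3-compatible store
      ports:
        - protocol: TCP
          port: 9000
---
# The interaction agent additionally needs HTTPS egress to external model APIs.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: interaction-agent-egress
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: interaction-agent
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0               # placeholder; in practice pin to the
      ports:                              # model providers' published CIDRs or
        - protocol: TCP                   # use a DNS-aware policy engine
          port: 443
```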
In regulated financial environments, per-agent enforcement is a regulatory requirement, not just a best practice. The fraud detection agent reads transaction data. The customer service agent handles refund inquiries. Any policy granting the customer service agent the same database access creates a SOX audit finding.
When the auditor asks—and they will—“show me that each agent can only access the data required for its function,” behavioral profiles are the answer. The fraud detection agent’s profile shows transaction database access, internal-only connectivity, and no code execution. The customer service agent’s profile shows CRM API calls, limited network destinations, and no database access. Each profile is auditable evidence that enforcement is proportional to function—not a screenshot of a policy someone wrote six months ago. For the full picture on containing blast radius in regulated financial environments, see the dedicated guide.
The most common objection to per-agent enforcement is operational overhead. The observation-first approach eliminates that work entirely. Behavioral observation generates the policies. eBPF enforcement applies them at the kernel level. Security teams deploy independently, without developer cooperation or sidecars, at 1–2.5% CPU and 1% memory overhead.
The workflow is Kubernetes-native and cloud-agnostic—the same observe-to-enforce process works on EKS, AKS, and GKE. Cloud-specific primitives handle the IAM layer, while behavioral enforcement provides the cross-cloud constant. New agents start in observation mode automatically and progress to enforcement as baselines stabilize.
ARMO’s platform supports this workflow end-to-end: AI-BOM handles discovery, Application Profile DNA builds per-agent behavioral baselines, and eBPF-powered enforcement manages the transition from observation to active guardrails.
See how ARMO builds per-agent guardrails from behavioral observation—no manual policy writing, no code changes, no guesswork. Watch a demo.
Can I set different enforcement levels for different agents in the same cluster?
Yes—that’s a core principle. A fraud detection agent reading transaction data needs tighter constraints than an internal summarization bot. Per-agent enforcement boundaries are derived from each agent’s individual behavioral profile. High-risk agents get active blocking; lower-risk agents run in alert-only mode.
How long does it take to build per-agent behavioral profiles?
Most teams see usable profiles within 7–14 days of observation. Agents with predictable behavior—like customer support chatbots—baseline faster than agents with varied behavior, like data analysis agents that run different queries daily. Start enforcing selectively on your most stable, highest-risk agents first while others continue baselining.
What happens when a new agent is deployed?
New agents start in observation mode automatically. The system builds their behavioral profile before recommending enforcement policies—so you don’t need to pre-define what the agent should be allowed to do before it starts running.
Do per-agent guardrails require developer cooperation?
No. eBPF-based enforcement operates at the Linux kernel level. Security teams deploy, observe, and enforce independently—without modifying application code, injecting sidecars, or coordinating with development teams for policy changes.
Can per-agent guardrails work alongside existing OPA or Gatekeeper policies?
Yes. OPA and Gatekeeper handle admission-time validation—ensuring resource configurations meet standards before workloads deploy. Per-agent behavioral enforcement operates at a different layer, constraining what agents do at runtime after deployment. The two are complementary: admission policies prevent misconfigurations, behavioral guardrails prevent runtime abuse.
How do per-agent guardrails align with AI governance frameworks like NIST?
Per-agent enforcement maps directly to the NIST AI Risk Management Framework’s principles of proportionate risk management and continuous monitoring. The framework calls for risk responses calibrated to the specific AI system’s impact level—which is exactly what per-agent enforcement delivers: different agents, different risk levels, different enforcement intensity.
How do per-agent guardrails handle behavioral drift when models update?
Enforcement treats baselines as continuous, not one-time. When agent behavior deviates, the system either blocks (high-risk agents) or alerts (lower-risk agents) based on the agent’s risk classification. Baselines update once new behavior is validated as legitimate.