Mar 31, 2026
Your team deployed Tetragon six months ago. TracingPolicies are humming along—you’re catching unauthorized binary executions, blocking suspicious network connections, and generating seccomp profiles from observed behavior. Runtime security for your traditional workloads is solid.
Then engineering ships their first autonomous AI agent into production. A LangChain agent connected to internal databases, external APIs through MCP tool runtimes, and a vector database for RAG. Within the first week, your TracingPolicies are either firing constantly on legitimate behavior or missing agent activity entirely. The allowlists you wrote for your web servers don’t work for a workload whose network destinations depend on what a user asks it.
This is the problem nobody in the eBPF ecosystem has clearly addressed: eBPF-based runtime security works brilliantly for deterministic workloads, but AI agents break its assumptions. The enforcement substrate is right—kernel-level visibility with microsecond-speed policy decisions—but the policy model needs to change.
This article goes deep on that problem. It’s not an eBPF tutorial or a general AI security overview. It’s a practitioner-level analysis of what eBPF can and can’t enforce for AI agent workloads, where kernel-level visibility hits its ceiling, and what you need above it.
If you’re looking for the complete progressive enforcement methodology—discovery through full least privilege—read the AI Agent Sandboxing & Progressive Enforcement guide. This article assumes you’re already bought in on progressive enforcement and want to understand the mechanism.
eBPF-based runtime tools—Tetragon, Falco, KubeArmor—share a common enforcement model. They define policies around expected behavior: which binaries can execute, which network destinations are allowed, which syscalls are permitted. When something deviates from that policy, they alert or block.
This model works because traditional workloads are predictable. A web server listens on port 443, connects to a backend database, and serves HTTP responses. Its syscall profile is stable across deployments. You can write a TracingPolicy once and it holds.
AI agents don’t have stable syscall profiles. Their behavior is shaped by the prompts they receive and the reasoning paths the model takes. The OWASP Top 10 for Agentic Applications catalogs this as a foundational risk: agents granted tool access take actions that are unpredictable by design, not by accident.
Here’s what that looks like in practice across the three enforcement dimensions eBPF tools rely on:
Network egress. Traditional approach: define allowed egress destinations by IP or DNS. A web server connects to db.internal:5432 and cache.internal:6379. Write a Cilium NetworkPolicy, done.
AI agent reality: a RAG agent resolves vector database endpoints dynamically, calls external APIs based on user queries, and may reach model inference endpoints that rotate across cloud regions. A coding agent generates HTTP requests to endpoints it discovers from documentation it’s reading. The set of “legitimate” destinations isn’t fixed—it’s a function of input. Static allowlists either break the agent or leave gaps wide enough to drive an exfiltration through.
Process execution. Traditional approach: define expected process trees and syscall sets. A Java application runs as PID 1 in its container, spawns a few known threads, and uses a predictable set of ~120 syscalls. Generate a seccomp profile from observed behavior, enforce it.
AI agent reality: code generation is the single riskiest agent capability, and agents that have it spawn interpreters, execute generated scripts, and create process trees that change with every invocation. A seccomp profile generated during Tuesday’s observation window may block legitimate behavior that appears on Wednesday with a different prompt.
Filesystem access. Traditional approach: restrict access to known paths. An application reads config from /app/config, writes logs to /var/log, and touches nothing else.
AI agent reality: data analysis agents read from directories determined by their input. Coding agents write generated files to working directories. RAG agents access document stores whose contents shift as embeddings are updated. The filesystem access pattern is an output of the model’s reasoning, not a static property of the application.
The result: teams using standard eBPF runtime tools for AI workloads hit the same wall. They either write policies so loose they’re decorative, or so tight they break the agent within days. This is the policy paralysis problem applied to eBPF specifically—and the reason kernel-level enforcement alone isn’t enough.
Before diagnosing what’s missing, it’s worth understanding exactly what eBPF gives you. The technology is genuinely the right foundation for AI agent enforcement—the problem is treating it as the entire stack.
eBPF programs attach to kernel hooks and execute when specific events occur. For runtime security, three categories of hooks matter:
Kprobes: dynamic probes that attach to almost any kernel function. They give you maximum flexibility—you can hook into tcp_connect to see every outbound connection, do_sys_open to see every file open, or do_execve to see every process execution. The trade-off: kernel function signatures aren’t part of the stable ABI, so kprobes can break across kernel upgrades. For AI agent enforcement, kprobes on network and exec functions are where most detection logic lives.
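As a concrete illustration, here is a minimal CO-RE sketch of that pattern (illustrative code, not taken from any of the tools above): a kprobe on tcp_connect that logs each outbound connection attempt. It assumes a libbpf toolchain and a vmlinux.h generated with bpftool.

```c
// Hypothetical observation-only kprobe: log every outbound TCP connect.
// Assumes libbpf and a vmlinux.h generated via `bpftool btf dump`.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/tcp_connect")
int BPF_KPROBE(observe_tcp_connect, struct sock *sk)
{
    // CO-RE reads tolerate struct layout changes across kernels,
    // but the tcp_connect symbol itself is not a stable ABI.
    __u32 pid   = bpf_get_current_pid_tgid() >> 32;
    __u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);  // IPv4 dest
    __u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport);  // net byte order

    bpf_printk("tcp_connect pid=%d daddr=0x%x dport=%d",
               pid, daddr, bpf_ntohs(dport));
    return 0;
}
```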
Tracepoints: stable kernel instrumentation points that survive upgrades. Tracepoints like sched_process_exec and sys_enter_* provide reliable event streams for process lifecycle and syscall monitoring. They’re the backbone of tools like Tetragon and Falco—predictable, well-documented, and safe to depend on in production. For AI workloads, tracepoints give you the durable event stream you need for behavioral baselining over weeks or months.
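The stable counterpart, under the same assumptions, is a handler on the sched_process_exec tracepoint. Because the event format survives kernel upgrades, a stream like this can feed a behavioral baseline for weeks without breaking:

```c
// Hypothetical baseline feeder: one event per process exec, from a
// stable tracepoint. The filename is a __data_loc field, so it is read
// through its encoded offset rather than as a plain struct member.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tracepoint/sched/sched_process_exec")
int observe_exec(struct trace_event_raw_sched_process_exec *ctx)
{
    char fname[64];
    unsigned int off = ctx->__data_loc_filename & 0xFFFF; // low bits = offset

    bpf_probe_read_kernel_str(fname, sizeof(fname), (void *)ctx + off);
    bpf_printk("exec pid=%d file=%s", ctx->pid, fname);
    return 0;
}
```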
LSM hooks: the enforcement layer. Hooks like bpf_lsm_file_open and bpf_lsm_socket_connect are specifically designed for access control decisions—they fire before the operation completes and can return a denial. This is what makes eBPF an enforcement technology, not just an observability technology. When an AI agent attempts to open a file outside its baseline or connect to an unauthorized endpoint, an LSM hook can block the operation before it succeeds.
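And the enforcement half. A hedged sketch, assuming a BPF LSM capable kernel (CONFIG_BPF_LSM enabled, 5.7 or newer) and an allowlist map populated from userspace (the map and program names here are illustrative), that denies any IPv4 connect to a destination not on the list:

```c
// Hypothetical enforcement program: block non-allowlisted IPv4 connects.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <errno.h>

#define AF_INET 2  /* vmlinux.h does not carry socket.h constants */

char LICENSE[] SEC("license") = "GPL";

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);   /* allowed IPv4 destination, network byte order */
    __type(value, __u8);
} allowed_dests SEC(".maps");

SEC("lsm/socket_connect")
int BPF_PROG(enforce_connect, struct socket *sock, struct sockaddr *address,
             int addrlen)
{
    if (address->sa_family != AF_INET)
        return 0;  /* this sketch only enforces IPv4 */

    struct sockaddr_in *sin = (struct sockaddr_in *)address;
    __u32 dest = sin->sin_addr.s_addr;

    /* The hook fires before the operation completes: a non-zero return
       means the agent's connect() fails with EPERM. */
    if (bpf_map_lookup_elem(&allowed_dests, &dest))
        return 0;
    return -EPERM;
}
```

The mechanism is straightforward. The hard part, as the rest of this article argues, is deciding what belongs in allowed_dests when the agent’s legitimate destinations are a function of user input.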
Together, these hooks give you a complete enforcement substrate at the system boundary: every network connection, every file access, every process spawn, every syscall. The BPF verifier statically checks each program before it loads: no crashes, no unauthorized memory access, no unbounded loops. And the overhead is deterministic: because enforcement runs per-node as a DaemonSet, overhead scales with nodes, not with the number of containers or AI agents.
This is what makes eBPF the right foundation. But “foundation” is the operative word.
Here’s the scenario that exposes the limitation. A customer support agent running in your cluster processes support tickets, queries an internal database, summarizes results, and posts responses to an internal dashboard. It’s been running for weeks. Its behavioral baseline—built from eBPF-observed syscalls and network connections—is well-established.
A customer submits a ticket containing a crafted indirect prompt injection. The injected instruction overrides the agent’s task context. Instead of its normal query, the agent pulls from a customer PII table it’s never accessed before, then POSTs the data to an external endpoint the agent has never contacted.
What does each layer see?
| Detection Layer | What It Sees | What It Misses |
| --- | --- | --- |
| eBPF / kernel | New outbound TCP connection to unknown IP. DNS resolution for unfamiliar domain. Unusual read volume on database socket. | Why the connection was made. Whether the database query was legitimate. That a prompt injection triggered it. |
| Container runtime | No image drift. No unexpected processes. Possible network egress spike. | Whether egress is legitimate data transfer or exfiltration. No causal chain. |
| K8s control plane | Nothing. RBAC unchanged, service account unchanged, API server untouched. | Everything. The attack happened within authorized boundaries. |
| Application layer (L7 + tool invocations) | Injected prompt in input stream. Unauthorized database tool invocation targeting new table. Outbound POST containing PII to unknown domain. Full causal chain from ticket to exfiltration. | Nothing material—this is the layer with the complete picture. |
The eBPF layer detected anomalies—real signals. But it couldn’t distinguish this attack from the agent processing a ticket that legitimately requires a new data source. The semantic context—understanding that a prompt injection caused an unauthorized tool invocation that led to data exfiltration—is invisible at the syscall level.
This isn’t a flaw in eBPF. It’s a boundary condition. eBPF operates at system boundaries (kernel hooks), while AI agent threats often manifest at application boundaries (tool calls, prompt processing, chain execution). The AgentSight research from UC Berkeley frames this as the “semantic gap”—the disconnect between an agent’s high-level intent (observable via LLM communications) and its low-level actions (observable via syscalls). Effective AI agent enforcement needs both.
If eBPF is the enforcement substrate, what’s the intelligence layer?
For traditional workloads, the policy layer above eBPF is relatively simple—TracingPolicies, seccomp profiles, NetworkPolicies. These are static definitions that map cleanly to kernel events. For AI agents, the policy layer needs to be dynamic, behavioral, and application-aware.
The core problem with static policies for AI agents is that “normal” isn’t a fixed state—it’s a distribution. An agent might connect to 15 different API endpoints in a given week, depending on what users ask it. A static allowlist built from one week’s observation misses legitimate behavior from the next week.
What’s needed is behavioral profiling that learns the range of normal behavior over time and detects deviations from that range—not violations of a fixed list. ARMO’s approach to this is what they call Application Profile DNA: a behavioral representation of each container built from runtime observation that adapts as the agent’s behavior evolves. This is the foundation layer the progressive enforcement methodology is built on—you observe before you enforce, and the resulting policies reflect a distribution, not a snapshot.
eBPF sees network packets. Application-layer monitoring sees what’s inside them: which API endpoint was called, what data was requested, which tool the agent invoked, and how the response was used. This is the layer that turns a “new TCP connection to IP X” into “the agent invoked its database tool with a query targeting a PII table outside its behavioral baseline.”
ARMO’s eBPF sensor goes beyond syscall-level telemetry—it monitors HTTP traffic content, function calls within containers, tool invocations at the application layer, and the full agent execution chain. This is what their CTO calls “much higher resolution around the application layer”: the ability to see not just that a process ran, but what it did, why, and with what data. Combined with kernel-level enforcement, it closes the semantic gap that generic eBPF tools leave open.
Different agents have fundamentally different risk profiles. A customer support chatbot with read-only database access is a different enforcement problem than a coding agent that spawns interpreters and generates HTTP requests. Generic eBPF tools apply the same TracingPolicy model to both.
AI-aware enforcement needs per-agent boundaries derived from each agent’s individual behavioral profile. The coding agent gets tighter process and syscall constraints (because code generation is the highest-risk capability). The support chatbot gets tighter data access constraints (because PII exposure is its primary risk). These aren’t guesses—they’re policy decisions derived from observed behavior over time.
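Mechanically, per-agent granularity can ride on the same kernel hook. A sketch under the same assumptions as the LSM example above (illustrative names, not ARMO’s implementation) keys the allowlist by cgroup ID, so each agent container carries its own boundary and userspace populates the map from that agent’s observed profile:

```c
// Hypothetical per-agent variant of the socket_connect hook above.
// The allowlist key carries the cgroup ID, so two agents on the same
// node can be held to different destination sets.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <errno.h>

#define AF_INET 2

char LICENSE[] SEC("license") = "GPL";

struct dest_key {
    __u64 cgroup_id;  /* which agent container is acting */
    __u64 daddr;      /* IPv4 destination, widened to avoid struct padding */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct dest_key);
    __type(value, __u8);
} per_agent_allow SEC(".maps");

SEC("lsm/socket_connect")
int BPF_PROG(per_agent_connect, struct socket *sock, struct sockaddr *address,
             int addrlen)
{
    if (address->sa_family != AF_INET)
        return 0;

    struct sockaddr_in *sin = (struct sockaddr_in *)address;
    struct dest_key key = {
        .cgroup_id = bpf_get_current_cgroup_id(),
        .daddr = sin->sin_addr.s_addr,
    };

    /* Same hook, different policy per agent: a coding agent's cgroup can
       carry a far tighter destination set than a support chatbot's. */
    if (bpf_map_lookup_elem(&per_agent_allow, &key))
        return 0;
    return -EPERM;
}
```

Userspace owns the hard part: deriving each agent’s map entries from its behavioral profile over time rather than from a hand-written list.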
Deploying eBPF-based enforcement for AI workloads introduces cloud-specific constraints that don’t surface with traditional workloads. Here’s what actually matters per platform.
eBPF enforcement capabilities depend heavily on kernel version. CO-RE (Compile Once – Run Everywhere) requires BTF (BPF Type Format) data, which is available on kernels 5.2+ but not universally enabled across all cloud provider node images. If you’re running custom or older node images, verify BTF availability before deploying any eBPF-based enforcement.
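On most distributions, a kernel built with CONFIG_DEBUG_INFO_BTF=y exposes its BTF at /sys/kernel/btf/vmlinux, so the verification can be a trivial userspace check run on each node (a minimal sketch):

```c
// Tiny userspace check for CO-RE readiness: does this node's kernel
// expose BTF? Absence usually means CONFIG_DEBUG_INFO_BTF is off or
// the node image predates BTF support.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (access("/sys/kernel/btf/vmlinux", R_OK) == 0) {
        puts("BTF present: CO-RE eBPF tooling should load");
        return 0;
    }
    puts("no /sys/kernel/btf/vmlinux: verify kernel config or node image");
    return 1;
}
```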
| Cloud Provider | Recommended Node Image | BTF / eBPF Notes |
| --- | --- | --- |
| AWS EKS | Amazon Linux 2023 or Bottlerocket | AL2023 has strong eBPF support. Bottlerocket’s minimal OS reduces attack surface. Older AL2 images may need kernel upgrades for full LSM hook support. |
| Azure AKS | Ubuntu 22.04+ based node pools | Ubuntu images have the most reliable BTF and eBPF support on AKS. Azure Linux (Mariner) works but verify BTF is enabled for your kernel version. |
| Google GKE Standard | Container-Optimized OS (COS) or Ubuntu | COS has mature eBPF support. GKE Standard gives full control over node images and DaemonSet deployment. |
| Google GKE Autopilot | Managed (limited control) | Autopilot restricts privileged DaemonSets. eBPF-based enforcement that requires privileged access may not deploy. Use Standard clusters for AI workloads requiring full kernel-level enforcement. |
GKE Autopilot deserves specific attention because it’s where many teams first encounter the conflict between managed Kubernetes convenience and eBPF enforcement requirements. Autopilot’s security model restricts privileged workloads—which is the right default for multi-tenant environments, but it means eBPF DaemonSets that need CAP_SYS_ADMIN or CAP_BPF capabilities may be blocked.
For AI workloads specifically, GKE offers the Agent Sandbox CRD with managed gVisor for code execution isolation. This is complementary to eBPF behavioral enforcement: the CRD handles where the agent runs (isolated sandbox), while kernel-level behavioral enforcement handles what the agent does once running. Most teams deploying AI agents in GKE will want Standard clusters for the enforcement layer and can use Autopilot for less sensitive workloads.
AI agents calling cloud-native AI services—Bedrock on AWS, Azure OpenAI Service, Vertex AI on GCP—need cloud IAM identities. eBPF enforcement at the kernel level complements but doesn’t replace IAM boundary enforcement at the cloud layer. Use IRSA on EKS or Workload Identity on AKS and GKE for pod-level cloud identity, and eBPF-based behavioral enforcement for what the agent does once it has that identity. These layers work together: IAM controls which cloud resources the agent can access, and behavioral enforcement controls how it uses that access.
If you’re already running Tetragon, Falco, or another eBPF-based runtime tool, here’s a practical summary of what changes when your workloads are AI agents.
| Enforcement Dimension | Generic eBPF (Tetragon/Falco) | AI-Aware eBPF (ARMO’s Approach) |
| --- | --- | --- |
| Policy model | Static allowlists and TracingPolicies defined before deployment | Dynamic policies derived from observed behavioral baselines over time |
| Baseline approach | Snapshot of syscalls/connections during observation window | Behavioral distribution (Application Profile DNA) that adapts as agent behavior evolves |
| Detection context | Syscall-level anomalies (new connection, unexpected exec) | Full-chain context: syscall anomaly + application-layer tool invocation + L7 traffic content |
| Enforcement granularity | Per-pod or per-namespace policies | Per-agent policies reflecting individual behavioral profiles and risk levels |
| AI-specific threats | Generic anomaly detection (works but high false positives on non-deterministic workloads) | Agent escape detection, tool misuse, prompt-injection-driven behavioral deviation |
| Overhead | ~1–3% CPU per node (varies by tool and policy count) | 1–2.5% CPU, ~1% memory per node (DaemonSet architecture, scales with nodes not pods) |
The kernel-level substrate is the same—eBPF hooks, BPF verifier, DaemonSet deployment. What changes is the intelligence layer above it: behavioral baselines that tolerate non-determinism, application-layer visibility that closes the semantic gap, and per-agent policy granularity that reflects the reality that different agents present fundamentally different risk profiles.
Security frameworks are converging on what AI agent enforcement should look like. The NIST AI Risk Management Framework calls for continuous monitoring of AI system behavior. MITRE ATLAS catalogs adversarial techniques targeting AI systems. KPMG’s Q4 AI Pulse Survey found that 75% of enterprise leaders cite security, compliance, and auditability as the most critical requirements for AI agent deployment.
These frameworks tell you what to worry about. None of them tell you how to enforce it at the kernel level without breaking production.
eBPF is the bridge between governance requirements and operational enforcement. OWASP says constrain agent autonomy—eBPF enforces that constraint at the syscall level. NIST says continuously monitor AI behavior—eBPF observes every system interaction with sub-millisecond overhead. Compliance frameworks require audit trails—eBPF generates them from actual runtime behavior, not policy documents.
But the bridge only works if the enforcement layer understands what it’s enforcing.
That’s the argument for AI-aware enforcement built on eBPF: the kernel substrate satisfies the performance and coverage requirements, and the application-layer intelligence satisfies the semantic requirements. ARMO’s platform combines both—eBPF-based kernel enforcement with application-layer monitoring—and supports compliance frameworks including CIS, NIST, SOC2, PCI-DSS, HIPAA, and GDPR with 260+ purpose-built Kubernetes controls and continuous automated monitoring.
Watch a demo of the ARMO platform to see how eBPF-based enforcement works in practice for AI agent workloads.
I already use Tetragon/Falco. Do I need something different for AI agents?
Not necessarily “different”—more like “additional.” Tetragon and Falco are strong eBPF-based runtime tools for detecting kernel-level anomalies. But AI agents generate enough behavioral variance that static TracingPolicies produce high false-positive rates. You need a layer above that builds dynamic behavioral baselines and adds application-layer context to kernel-level signals.
What kernel version do I need for eBPF-based AI agent enforcement?
Kernel 5.8+ for full LSM hook support, 5.2+ for BTF/CO-RE compatibility. Most current managed Kubernetes node images (EKS with AL2023, AKS with Ubuntu 22.04, GKE Standard with COS) meet these requirements. Check BTF availability explicitly if running custom images.
Can eBPF enforcement detect prompt injection attacks?
Not directly. eBPF sees the consequences of a prompt injection—unusual network connections, unexpected tool invocations, anomalous file access—but it can’t see the injected prompt itself. Detecting the attack at its source requires application-layer monitoring of the agent’s input stream. eBPF catches the behavioral deviation; application-layer intelligence identifies the root cause.
How does the overhead compare between generic eBPF tools and AI-aware enforcement?
Comparable. ARMO’s overhead is 1–2.5% CPU and ~1% memory per node—in line with Tetragon and Falco deployments. The key difference is architectural: DaemonSet-based enforcement means overhead scales with nodes, not with the number of AI agents or containers. A cluster with 500 pods on 10 nodes pays the same per-node cost as 50 pods on those 10 nodes.