Mar 17, 2026
It's 2:47 AM and your SOC dashboard lights up. Six events surface across three hours from a single Kubernetes cluster: an outbound HTTP fetch to an unfamiliar domain, a tool invocation inside a customer support agent, an API call to an internal service the agent has never contacted, a service account token read, a file write to a model artifact directory, and an outbound data transfer that looks like normal API usage.
Your container security tool dutifully logged all six events. It flagged two as medium severity. The other four passed through clean—each one looked like legitimate container behavior. No alert was connected to any other. No incident was created. And by the time your morning shift reviews the overnight logs, the customer data is already on an attacker-controlled server.
This isn’t a failure of alert volume. Your tools fired. The failure is architectural: generic container security tools see process execution, network connections, and file access. What they can’t see is why those events happened—whether a tool invocation was user-driven or attacker-steered, whether an outbound connection was a legitimate RAG retrieval or the start of a data exfiltration chain, whether a file write was a routine update or an attacker planting a backdoor.
AI agents operate in exactly the gap where this blind spot lives. They interpret prompts at runtime, invoke tools dynamically, and escalate privileges in ways no developer anticipated—all as part of normal operation. The signals that indicate compromise in a traditional container are indistinguishable from an AI agent doing its job. This is the core detection problem that most security stacks were never built to solve.
This article walks through a concrete six-stage attack chain—following a single customer support agent from initial RAG poisoning through data exfiltration—and shows, at each stage, what your container tool sees, what it misses, and why. The goal: make the visibility gap concrete enough that you can evaluate whether your own detection stack would catch it.
Container tools answer "what happened": process X executed, file Y written, network connection Z opened. They cannot answer what matters for AI workloads: Why did this tool run? Was it triggered by a user, by the agent's reasoning, or by an attacker-injected instruction? Does the sequence of actions across the last three minutes constitute normal agent behavior or an attack chain?
This context gap explains a pattern that repeats at every stage of the attack chain examined below: kernel-level and container runtime detection catch symptoms but never root cause. The Kubernetes control plane is blind at every stage. Only the application layer—where prompts, tool invocations, and execution chains are visible—sees the full picture. The attack vectors specific to AI agents are categorically different from what container runtime tools were built to detect.
Let’s make this concrete with a single attack chain from start to finish.
The scenario: a customer support agent running in your Kubernetes cluster. Built on LangChain, it processes incoming tickets, categorizes them by severity, and writes summaries to an internal dashboard. It has read access to a customer database through a tool integration, RAG access to a knowledge base of support documentation, and network access to internal API endpoints. Routine permissions for a support workflow.
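To keep the scenario concrete, here is a minimal sketch of what that tool surface might look like in code. The endpoint URLs, tool names, and parameters are illustrative assumptions rather than the actual deployment, and the agent wiring itself is omitted; the point is simply that database reads, RAG retrieval, and dashboard writes are all ordinary, permitted tools the model chooses between at runtime.

```python
# Minimal sketch of the support agent's tool surface (illustrative only).
# Assumes langchain-core is installed; all endpoints and names are hypothetical.
from langchain_core.tools import tool
import requests

KB_ENDPOINT = "http://knowledge-base.internal/search"        # hypothetical
CRM_ENDPOINT = "http://customer-api.internal/customers"      # hypothetical
DASHBOARD_ENDPOINT = "http://dashboard.internal/summaries"   # hypothetical

@tool
def search_knowledge_base(query: str) -> str:
    """Retrieve support articles relevant to a ticket (the RAG source)."""
    return requests.get(KB_ENDPOINT, params={"q": query}, timeout=10).text

@tool
def query_customer_db(customer_id: str) -> str:
    """Read-only lookup of a customer record for ticket context."""
    return requests.get(f"{CRM_ENDPOINT}/{customer_id}", timeout=10).text

@tool
def post_ticket_summary(ticket_id: str, summary: str) -> str:
    """Write a ticket summary to the internal dashboard."""
    resp = requests.post(DASHBOARD_ENDPOINT,
                         json={"ticket": ticket_id, "summary": summary},
                         timeout=10)
    return f"status={resp.status_code}"

# Every capability here is legitimate. The chain below abuses them without
# ever stepping outside the agent's declared permissions.
tools = [search_knowledge_base, query_customer_db, post_ticket_summary]
```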
The attacker compromises a document in the knowledge base that the agent’s RAG pipeline retrieves—a poisoned support article, a manipulated vector database entry, or a compromised external data source. The document contains embedded instructions designed to manipulate the agent’s behavior when processed.
What your container tool sees: An outbound HTTP fetch or vector database query. Most likely: nothing notable. RAG retrieval is what the agent does all day.
What it misses: Container tools have no concept of RAG pipelines or data provenance. They see a network request but cannot evaluate whether the content will alter the agent’s downstream actions. There’s no mechanism to flag “this agent normally retrieves from sources A, B, and C, but just fetched from source D for the first time”—that analysis requires understanding data dependencies, not just network connections.
What AI-aware detection sees: A runtime-derived AI-BOM (AI Bill of Materials)—an inventory of the agent’s models, tools, RAG sources, and dependencies built from observed runtime behavior—establishes normal data access patterns. When a new, previously unseen source feeds content to a privileged agent, behavioral analytics flag the deviation. This creates the first signal in what may become a correlated incident.
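As a rough illustration of that baseline idea, the sketch below keeps a per-agent set of observed RAG sources and emits a signal the first time a privileged agent pulls from somewhere new. The source identifiers and the signal shape are assumptions for illustration; in practice the baseline comes from runtime observation, not a hard-coded set.

```python
# Sketch: flag retrieval from a RAG source not in the agent's observed baseline.
# The baseline would be built from runtime telemetry (AI-BOM), not a literal set.
from dataclasses import dataclass, field

@dataclass
class RagBaseline:
    agent: str
    known_sources: set = field(default_factory=set)

    def observe(self, source: str) -> dict | None:
        """Record a retrieval; return a signal if the source has never been seen."""
        if source in self.known_sources:
            return None
        self.known_sources.add(source)   # learn it, but still surface the first hit
        return {
            "agent": self.agent,
            "type": "anomalous_rag_source",
            "detail": f"first retrieval from {source}",
        }

baseline = RagBaseline("support-agent", {"kb://support-articles", "kb://product-docs"})
print(baseline.observe("kb://support-articles"))          # None: normal retrieval
print(baseline.observe("https://evil.example/poisoned"))  # signal: first-seen source
```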
The poisoned document contains hidden instructions—an indirect prompt injection. The agent follows attacker-crafted directives because, from the model’s perspective, they arrived through a trusted data channel. It invokes tools it wasn’t supposed to call, or calls permitted tools with parameters that serve the attacker’s goals.
What your container tool sees: A tool execution—the same kind of function the agent invokes hundreds of times daily. No alert, or a noisy “command executed” notification with no context about what triggered it.
What it misses: Prompt text is invisible to container tools. They operate at the syscall and process level. The critical question—“was this tool call triggered by a user request or an attacker-injected instruction embedded in a RAG document?”—is unanswerable from the container layer. The tool call is permitted by the agent’s declared permissions. There is no exploit signature.
What AI-aware detection sees: A mismatch between the agent’s typical tool invocation sequence and the current chain. The agent normally calls the knowledge base lookup tool after receiving a ticket but now invokes the customer database query tool immediately after processing a RAG document. ARMO’s AI-aware behavioral detection links this tool call to the suspicious RAG content from Stage 1. Two signals, one developing story.
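A deliberately naive way to express that sequence mismatch: baseline which tool-to-tool transitions the agent has been observed making and flag any transition outside that set. The transition pairs below are assumptions standing in for a learned baseline; real behavioral detection would also weigh frequency, parameters, and chain history.

```python
# Sketch: flag tool-call transitions that never occurred during baselining.
ALLOWED_TRANSITIONS = {                      # assumed, learned from observation
    ("receive_ticket", "search_knowledge_base"),
    ("search_knowledge_base", "post_ticket_summary"),
    ("receive_ticket", "query_customer_db"),
}

def check_transition(prev_tool: str, next_tool: str) -> dict | None:
    if (prev_tool, next_tool) in ALLOWED_TRANSITIONS:
        return None
    return {
        "type": "tool_sequence_anomaly",
        "detail": f"{prev_tool} -> {next_tool} never seen in baseline",
    }

# The poisoned document steers the agent from RAG retrieval straight into a
# customer-database query, a transition the baseline has never recorded.
print(check_transition("search_knowledge_base", "query_customer_db"))
```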
The compromised agent uses its legitimate API access to reach internal services it wouldn’t normally contact—fetching data from adjacent services, testing access boundaries, mapping the internal topology. Its existing service account and network policies permit these connections. From the perspective of every infrastructure control, the agent is using credentials it was granted at deployment.
What your container tool sees: Internal service calls. Possibly a spike in east-west traffic. If you’re running network policies at the Kubernetes level, these calls are permitted.
What it misses: Container tools don’t maintain an application-level graph of which internal APIs each agent normally calls. They see connections by IP and port. Without a behavioral model that says “this agent calls /api/tickets and /api/knowledge but has never called /api/users or /api/billing,” the new destinations look like normal service-to-service traffic. The attacker now has a map of what internal data is accessible—and there’s no chain context connecting this to the preceding two stages.
What AI-aware detection sees: The runtime-derived AI-BOM maps each agent’s normal internal API graph. New destinations are flagged against this baseline. ARMO’s CADR (Cloud Application Detection and Response) correlates across stages, recognizing this lateral movement as the third step in a chain that started with anomalous RAG retrieval and continued with attacker-steered tool misuse.
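The gap between the network view and the application view can be shown with a toy check: the same east-west call passes a host-and-port allow-list but fails the agent's per-path API baseline. The policy entries, paths, and agent name are illustrative assumptions.

```python
# Sketch: the same east-west call viewed at two layers.
# Network layer: the destination host and port are permitted, so nothing fires.
# Application layer: this agent has only ever called /api/tickets and /api/knowledge.
NETWORK_ALLOWED = {("api-gateway.internal", 443)}                       # assumed policy
AGENT_API_BASELINE = {"support-agent": {"/api/tickets", "/api/knowledge"}}

def network_view(host: str, port: int) -> bool:
    return (host, port) in NETWORK_ALLOWED            # permitted -> no alert

def application_view(agent: str, path: str) -> dict | None:
    if path in AGENT_API_BASELINE.get(agent, set()):
        return None
    return {"type": "new_api_destination", "agent": agent, "path": path}

print(network_view("api-gateway.internal", 443))             # True: looks like normal traffic
print(application_view("support-agent", "/api/billing"))     # flagged: never called before
```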
The agent accesses Kubernetes service account tokens mounted in the container—automatically mounted in many default configurations. The attacker uses these tokens to authenticate against the Kubernetes API with elevated permissions. MITRE ATT&CK tracks this kind of credential theft as Steal Application Access Token (T1528), and in environments where service accounts are over-provisioned, it can provide cluster-wide access.
What your container tool sees: A token read from the filesystem, followed by a Kubernetes API call. Some tools flag the access if the reading process isn’t the primary container process. This is one of two stages where container tools produce a potentially useful alert.
What it misses: The alert is isolated. Your tool sees “unexpected token access” but not that it follows lateral movement, which followed tool misuse, which followed RAG poisoning. Token access in isolation gets triaged as a misconfiguration issue and assigned to a platform team for next sprint. In context, it’s an active privilege escalation three stages deep into a data breach.
What AI-aware detection sees: eBPF-based runtime telemetry detects the abnormal token access pattern. CADR elevates severity because the correlation engine sees privilege escalation following three precursor signals. The incident confidence score jumps. Four connected stages.
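A rough sketch of the two moves in this stage: treat a token read by anything other than the expected main process as a signal, and let accumulated chain context decide how loudly to raise it. The token mount path is the Kubernetes default; the expected process name and the severity thresholds are assumptions for illustration.

```python
# Sketch: token-read anomaly plus chain-aware severity.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"  # K8s default mount
EXPECTED_PROCESS = "python"   # the agent's main container process (assumed)

def token_read_signal(path: str, process: str) -> dict | None:
    if path != TOKEN_PATH or process == EXPECTED_PROCESS:
        return None
    return {"type": "unexpected_token_access", "process": process}

def severity(prior_chain_signals: list[dict]) -> str:
    # In isolation this reads like a misconfiguration finding; with three
    # precursor signals it is treated as active privilege escalation.
    return "critical" if len(prior_chain_signals) >= 3 else "medium"

chain = [{"type": "anomalous_rag_source"},
         {"type": "tool_sequence_anomaly"},
         {"type": "new_api_destination"}]
sig = token_read_signal(TOKEN_PATH, "curl")
print(sig, severity(chain))   # flagged, and elevated to critical by chain context
```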
With elevated privileges, the attacker modifies model artifacts, inference hooks, or dependencies to maintain persistent access—altered model weights, a malicious function injected into an inference pipeline, or a backdoored dependency that activates under specific conditions. The goal is persistence: even if the SOC disrupts the immediate attack, the next time a user sends the agent a routine question, the backdoor reactivates and the chain resumes from Stage 3.
What your container tool sees: File writes to model artifact directories. In environments with active CI/CD pipelines, these events are drowned in deployment noise.
What it misses: Integrity tools may alert on file changes, but they answer “what changed,” not “why now.” A model artifact change at 3 AM following privilege escalation is profoundly different from a scheduled deployment update. Without the chain, there’s no way to make that distinction.
What AI-aware detection sees: The runtime-derived AI-BOM tracks model artifacts and dependencies as a baseline. When components drift—especially following privilege escalation—ARMO flags the delta as a persistence attempt within the same correlated incident. Five connected stages, now labeled as a high-confidence multi-stage compromise.
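The "what changed" versus "why now" distinction lends itself to a small sketch: hash the model artifacts at deploy time, then classify any later drift differently depending on whether privilege escalation precedes it in the chain. The paths, digests, and classification labels are illustrative assumptions.

```python
# Sketch: detect drift in model artifacts against a hashed baseline, and treat
# drift that follows privilege escalation differently from a scheduled deploy.
import hashlib
from pathlib import Path

def artifact_hashes(directory: str) -> dict[str, str]:
    """SHA-256 every file under a model artifact directory."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(directory).rglob("*") if p.is_file()
    }

def drift_signal(baseline: dict[str, str], current: dict[str, str],
                 escalation_in_chain: bool) -> dict | None:
    changed = [p for p, digest in current.items() if baseline.get(p) != digest]
    if not changed:
        return None
    return {
        "type": "model_artifact_drift",
        "files": changed,
        "classification": "persistence_attempt" if escalation_in_chain else "review_deploy",
    }

# Toy digests stand in for artifact_hashes() taken at deploy time and later.
baseline = {"/models/support-agent/weights.bin": "a" * 64}
current = {"/models/support-agent/weights.bin": "b" * 64}
print(drift_signal(baseline, current, escalation_in_chain=True))
```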
The compromised agent exfiltrates sensitive data to external destinations. What makes AI-mediated exfiltration particularly dangerous: the agent can summarize, transform, or encode data before sending it. A customer database with 10,000 records becomes a compressed summary. PII is restructured into a format that evades traditional DLP controls because the content has been semantically transformed.
What your container tool sees: An outbound traffic spike or allowed egress connection. DLP tools looking for raw PII patterns miss the exfiltration because the agent has reformatted the data.
What it misses: This is the second stage where container tools produce a potentially useful alert—an unusual egress destination or volume anomaly. But without the preceding five stages of context, an outbound POST from an agent that regularly makes outbound calls looks routine.
What AI-aware detection sees: Behavioral correlation across process, file, network, and application layers. The egress is evaluated in the full chain context and classified as data exfiltration with high confidence. ARMO’s CADR produces a single prioritized incident narrative—the full story from Stage 1 through Stage 6 with an evidence timeline—compressing containment from hours of manual log correlation to minutes.
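To make the DLP point from this stage concrete, here is a toy comparison: a format-based rule catches the raw record but not an agent-written summary of the same information, because the summary no longer matches any PII pattern. The regex and records are simplified illustrations.

```python
# Sketch: why a format-based DLP rule misses agent-transformed data.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # simplistic PII regex

raw_record = "name=Jane Roe; ssn=123-45-6789; plan=enterprise"
agent_summary = ("Enterprise customer Jane Roe, social ending 6789, "
                 "requested an export of her account history.")

print(bool(SSN_PATTERN.search(raw_record)))      # True: raw exfiltration would be caught
print(bool(SSN_PATTERN.search(agent_summary)))   # False: transformed exfiltration sails through
```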
One prioritized incident story replaces scattered events across different dashboards. Investigation compresses because call stacks, entity graphs, and the full attack chain narrative show exactly how the compromise progressed. This is what the 2025 Latio Cloud Security Market Report describes as the shift from static visibility to runtime-driven risk reduction—the category it formally defines as Cloud Application Detection and Response (CADR).
| Stage | Generic Container Tool | AI-Aware Runtime (ARMO CADR) |
| --- | --- | --- |
| 1: RAG Poisoning | No alert or benign HTTP log | Anomalous data source flagged via AI-BOM baseline |
| 2: Prompt Injection | No alert (prompt invisible) | Tool invocation mismatch detected; linked to Stage 1 |
| 3: Lateral Movement | Possible request spike—unclear | New API destinations flagged against agent’s behavioral graph |
| 4: Token Theft | Token access alert (isolated) | Abnormal token pattern; severity elevated by chain context |
| 5: Model Tampering | File write (drowned in CI/CD noise) | AI-BOM drift detected; labeled as persistence |
| 6: Data Exfiltration | Egress anomaly (no chain context) | Classified as exfiltration; full 6-stage story with timeline |
The six-stage chain is assembled from documented attack patterns: RAG poisoning and prompt injection demonstrated in NVIDIA’s AI Kill Chain research, and service account token theft documented in MITRE ATT&CK.
The individual stages are well-understood. What’s new is the recognition that they chain together through AI agent behavior—and that this chaining is invisible to tools that evaluate events in isolation.
For SOC teams, the question is not “do we get alerts?” It’s “do our alerts reconstruct the chain?” Walk your stack through the six stages above and ask two things at each one: would it produce a signal at all, and could it connect that signal to the stages before it? Both answers should be demonstrable in a vendor demo.
If your stack falls short at multiple stages, it’s not because you chose the wrong vendor for container security. It’s because AI agents represent a fundamentally different workload category that demands detection at the application layer.
The progressive enforcement approach offers a practical path forward: deploy in visibility-only mode, build behavioral baselines through runtime observation, then layer in detection and enforcement based on evidence. ARMO’s CADR engine is purpose-built for this workflow—correlating signals across cloud, Kubernetes, container, and application layers to turn the six disconnected alerts from our scenario into one prioritized incident story.
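One way to picture that progression is as a mode switch on the same detection pipeline: an identical behavioral signal is only logged while baselines are being built, raised as an alert once detection is trusted, and blocked outright once enforcement is earned. The mode names and decisions below are illustrative, not ARMO's actual configuration.

```python
# Sketch: the same behavioral signal handled under progressively stricter modes.
from enum import Enum

class Mode(Enum):
    VISIBILITY = 1   # observe and build baselines only
    DETECTION = 2    # alert on deviations from the learned baseline
    ENFORCEMENT = 3  # block deviations once the baseline has earned that trust

def handle(signal: dict, mode: Mode) -> str:
    if mode is Mode.VISIBILITY:
        return "log"       # feed the baseline, never page anyone
    if mode is Mode.DETECTION:
        return "alert"     # page with chain context, do not interrupt the agent
    return "block"         # stop the tool call or egress outright

sig = {"type": "tool_sequence_anomaly", "agent": "support-agent"}
for mode in Mode:
    print(mode.name, "->", handle(sig, mode))
```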
Watch a demo to see how ARMO reconstructs the full attack story across your AI workloads.
Can container security tools detect prompt injection?
No. Prompt injection operates within the application layer—it manipulates how the model interprets input, not how the container executes processes. Container tools see downstream effects (a tool call, a network connection) but cannot see the prompt that triggered them or determine whether the instruction was attacker-injected.
What matters most when evaluating detection for AI agent workloads?
Chain reconstruction capability. Individual alert quality matters less than the ability to correlate multiple low-confidence signals into a high-confidence incident. If your tool evaluates each signal in isolation, the attack completes before you finish correlating manually.
How is an AI-BOM different from a traditional SBOM?
A traditional SBOM lists packages declared in deployment manifests. An AI-BOM is built from observed runtime behavior and includes models, frameworks, RAG sources, tools, and APIs—many invoked dynamically and never appearing in static manifests. The AI-BOM is the baseline that makes anomaly detection possible for non-deterministic workloads.
Would a CNAPP have caught this attack chain?
A CNAPP would catch posture issues that enable the chain—over-provisioned service accounts, missing network policies—but would miss the attack itself. CNAPPs are designed for configuration and posture assessment, not runtime behavioral detection. They identify that excessive permissions exist but won’t detect when those permissions are exploited through an AI agent compromise.