AI Agent Attack Detection: The Complete Framework for Security Teams

May 15, 2026

Shauli Rozen
CEO & Co-founder

Key takeaways

  • Why isn’t AI agent attack detection just “container detection with AI signatures”? Agent attacks operate on the agent’s decision surface — its prompts, tools, and identity — not on the container’s syscall surface where signatures live. Signature-based extension doesn’t reach this layer because the attack signal lives in sequence and context, not in any single event a signature could match.
  • What’s the single biggest mistake security teams make when building agent attack detection? Starting with the tools they already have and bolting agent context onto them. The framework has to be re-architected from the telemetry layer upward — if the source signals don’t exist, no amount of dashboard work will produce a detection capability that catches AI-specific attacks.
  • Why is correlation the hardest layer to get right? Signals from the four surfaces live in four different tool classes — WAF and SDK callbacks for input, framework SDKs and eBPF for tool invocation, IAM and Kubernetes audit for identity, orchestrator telemetry for cross-agent — and each was built for a different threat model. Correlation isn’t a SIEM feature. It’s an architectural decision about which plane assembles the chain.

It usually starts the same way. The CISO comes back from a board meeting having signed off on agentic AI for production. The SOC lead is told, in roughly that many words, to build detection for the agents. And the security stack she has — CNAPP for posture, EDR on the nodes, container runtime sensors, a SIEM ingesting everything — was architected before AI agents existed as a workload class. None of it was designed to detect attacks that arrive as data, ride on the agent’s own credentials, and unfold across stages that look identical to the agent doing its job.

What’s missing isn’t another tool. It’s a framework — one that organizes detection around the agent’s attack surfaces rather than the workload’s tool layers. The unit of analysis has to shift. Instead of asking what does my CNAPP see, the question becomes where does an agent attack actually cross into the runtime? That reframe is what makes detection portable across attack types, and what makes the framework hold when new attack categories emerge.

This is the operating model for that capability. The framework has four parts. Four detection surfaces describe where an attack must cross to become detectable. A five-layer operating stack describes how a security team turns those surfaces into action. A 2×2 maturity grid maps surface coverage against stack depth so a program can see where it stands. And a four-question readiness diagnostic translates the grid into the conversation that needs to happen before an agent ships.

Every Agent Attack Crosses One of Four Surfaces

Pull apart any AI agent attack from the past year — indirect prompt injection through customer email, RAG poisoning across a vector store, tool misuse via excessive scope, multi-agent delegation hijack — and they share one structural property. The attack must cross at least one of four points on the agent itself before it produces an effect. Detection organized around those four points is portable across attack types in a way that detection organized around tool layers is not. Container layer, Kubernetes control plane, application layer — these are where you can look. The four surfaces are what the attacker must cross.

Surface 1 — Input & Reasoning

The Input & Reasoning surface covers everything that enters the agent’s decision context: the user prompt, the system prompt, any retrieved RAG content, prior turns from memory, and any tool output the agent treats as authoritative context. This is where most attacks originate. Indirect prompt injection lives here. So does RAG poisoning. So does memory poisoning. The signal type is content and provenance: what entered the context window, where did it come from, and was it tagged as data or as instruction.

Instrumentation: LLM call wrappers that capture full prompt context, RAG ingestion audit logs that track what got embedded and when, and context-window inspection at the framework SDK layer. WAFs see none of this — the payload is data, not an HTTP request pattern.
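An LLM call wrapper of the kind described above might look like the following minimal sketch. The names `ContextSegment`, `record_llm_call`, and the `call_model` callable are hypothetical and not part of any framework SDK. The sketch assumes the application already knows where each piece of context came from and whether it was meant as data or as instruction.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass
class ContextSegment:
    source: str  # hypothetical provenance label, e.g. "system_prompt" or "rag:kb-42"
    role: str    # "instruction" or "data"
    text: str


def record_llm_call(agent_id, segments, call_model, audit_log):
    """Log content hash and provenance for each segment, then call the model."""
    audit_log.append({
        "agent_id": agent_id,
        "timestamp": time.time(),
        "segments": [
            {
                "source": seg.source,
                "role": seg.role,
                "sha256": hashlib.sha256(seg.text.encode("utf-8")).hexdigest(),
                "length": len(seg.text),
            }
            for seg in segments
        ],
    })
    # The model receives the same segments that were just logged.
    prompt = "\n\n".join(seg.text for seg in segments)
    return call_model(prompt)
```

Hashing rather than storing raw text is one possible design choice for keeping sensitive prompt content out of the audit log. A team that needs full content inspection would store the text itself.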

Surface 2 — Tool Invocation

Once the agent has decided to act, every action goes through a tool call. Surface 2 covers which tools the agent invokes, the parameter shape of each call, the sequence of calls within a session, and the frequency relative to baseline. The signal type is behavioral envelope — the boundary around which combinations of tool calls represent the agent doing its job, and which combinations represent something else.

This is where most attacks become visible. Tool misuse, excessive agency, scope expansion — all surface here. The detection mechanism that works is per-agent baselines tied to deployment-level behavior, because tool calls in isolation are almost always authorized; the malice is in the combination. ARMO’s work on detecting rogue agent tool misuse covers the scope, sequence, and rate categories in depth.

Instrumentation: framework SDK callbacks (LangChain, CrewAI, AutoGen produce structured tool-call events), MCP server logs for protocol-level visibility, and eBPF process trees that catch the syscall-level execution that follows each tool call.
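The scope, sequence, and rate checks can be illustrated with a small sketch over tool-call names. This is an assumption-laden simplification rather than ARMO's method: it ignores parameter shape, treats a session as a plain list of tool names, and uses the longest learned session as a crude rate ceiling. The function names are hypothetical.

```python
def bigrams(calls):
    """Adjacent pairs of tool calls within one session."""
    return list(zip(calls, calls[1:]))


def build_baseline(sessions):
    """Collect the tools, call pairs, and max session length seen during learning."""
    tools, pairs, max_calls = set(), set(), 0
    for calls in sessions:
        tools.update(calls)
        pairs.update(bigrams(calls))
        max_calls = max(max_calls, len(calls))
    return {"tools": tools, "pairs": pairs, "max_calls": max_calls}


def check_session(calls, baseline):
    """Return (category, detail) findings for one session."""
    findings = []
    # Scope: a tool this agent has never used before.
    for tool in sorted(set(calls) - baseline["tools"]):
        findings.append(("scope", tool))
    # Sequence: known tools combined in an order never seen before.
    for pair in bigrams(calls):
        if pair not in baseline["pairs"] and set(pair) <= baseline["tools"]:
            findings.append(("sequence", pair))
    # Rate: more calls in one session than any learned session.
    if len(calls) > baseline["max_calls"]:
        findings.append(("rate", len(calls)))
    return findings
```

Each call in a flagged session can be individually authorized. The finding comes from the combination, which is the point the section makes about tool calls in isolation.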

Surface 3 — Identity & Action

Surface 3 is where the agent’s intent becomes infrastructure impact. Which permissions did the agent exercise? Against which targets? At what rate? The signal type is declared-versus-observed scope: every agent has a permission set on paper and an actual permission set in practice, and the gap between them is the field where escapes, exfiltration, and lateral movement happen.

This is the surface where impact manifests. Catching AI agent escape happens here, because escape is a permission-action pattern that breaks the agent’s normal scope. Data exfiltration through allowed channels happens here, because the destination is always allowlisted — the anomaly is the volume and pattern, not the endpoint.

Instrumentation: Kubernetes audit logs for API-level actions, cloud IAM event streams for role and permission usage, and per-agent identity tags that route everything to a specific agent identity rather than a shared service account. ARMO’s Application Profile DNA captures this surface by building a per-Deployment behavioral baseline that compares declared scope to observed scope continuously.
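The declared-versus-observed comparison reduces to a set difference once both sides are expressed in the same vocabulary. The sketch below assumes permissions are normalized to hypothetical `"verb:resource"` strings, with the declared set taken from role bindings and the observed set taken from audit events attributed to one agent identity.

```python
def scope_gap(declared, observed):
    """Compare the permissions an agent holds on paper with those it exercised."""
    declared, observed = set(declared), set(observed)
    return {
        # Granted but never exercised: candidates for removal.
        "unused": sorted(declared - observed),
        # Exercised but not in the declared set: e.g. reached through a shared
        # service account or an inherited role.
        "undeclared": sorted(observed - declared),
    }


gap = scope_gap(
    declared={"get:configmaps", "list:pods", "get:secrets"},
    observed={"get:configmaps", "list:pods", "create:pods"},
)
```

In this example `gap["unused"]` is `["get:secrets"]` and `gap["undeclared"]` is `["create:pods"]`.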

Surface 4 — Cross-Agent Coordination

The fourth surface only exists in multi-agent systems, but it matters because every per-agent baseline structurally cannot see it. Delegation edges between agents. Shared-context coordination through scratchpads and vector stores. Orchestrator decisions that route work from one agent to another. These are the points where a compromised Agent A passes a malicious instruction to Agent B through a path that looks like normal coordination on both sides.

The signal type is graph behavior: what’s the normal shape of the delegation graph, and how does today’s traffic deviate. Per-agent detection structurally cannot reach this surface, regardless of how many sensors are deployed.

Instrumentation: orchestrator framework telemetry — LangGraph state transitions, CrewAI delegation events, AutoGen speaker selections — combined with shared-context store audit logs that capture write-then-read patterns across agents.
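As a rough illustration of graph-level detection, the sketch below compares delegation edges between a baseline window and a current window of equal length. The event shape (`"from"` and `"to"` keys) and the `max_ratio` threshold are assumptions for the example, not fields emitted by LangGraph, CrewAI, or AutoGen.

```python
from collections import Counter


def edge_counts(events):
    """Count delegations per (delegating agent, receiving agent) edge."""
    return Counter((e["from"], e["to"]) for e in events)


def graph_deviation(baseline_events, current_events, max_ratio=3.0):
    """Flag edges that are new, or far busier than in the baseline window."""
    baseline = edge_counts(baseline_events)
    current = edge_counts(current_events)
    new_edges = sorted(edge for edge in current if edge not in baseline)
    hot_edges = sorted(
        edge for edge, count in current.items()
        if edge in baseline and count > max_ratio * baseline[edge]
    )
    return {"new_edges": new_edges, "hot_edges": hot_edges}
```

Neither check is available to a per-agent baseline, because each edge involves two agents that both appear to be behaving normally on their own.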

The surface-by-tool-class diagnostic

The reason this decomposition matters operationally is that no single existing security tool class covers more than one surface, and most cover their one surface only partially. WAFs touch Surface 1 partially. CNAPPs touch Surface 3 partially. EDR touches Surface 2 partially at the process layer. Framework observability tools like LangSmith and Arize touch Surfaces 1 and 4 but live in the developer dashboard, not the SOC. The detection program a security team builds has to span the surfaces, and any surface left to the wrong tool class is a surface running without coverage.

Detection Runs as a Five-Layer Stack — Missing Any Layer Breaks the Rest

Once the surfaces are instrumented, the detection capability still has to operate. That operation runs as a five-layer stack. Each layer produces a specific output the next layer consumes. Skip a layer or run it weakly and the layers above it have nothing to work with.

Layer 1 — Runtime Telemetry

The foundation. Layer 1 produces the ground-truth signal across all four surfaces — what actually happened at the kernel, the container, the Kubernetes API, and the application. The substrate that works best is eBPF at the kernel layer combined with application-layer hooks at the framework SDK layer. eBPF captures the syscall, process, network, and file events; the framework hooks capture the prompts, tool calls, and orchestrator decisions that explain why those syscalls happened.

Without Layer 1, every higher layer operates on second-order proxies — log aggregates, metric summaries, alert streams from upstream tools — and the chain reconstruction at Layer 3 becomes impossible. Most teams operating detection on AI workloads today are running Layer 1 partially — kernel telemetry only, no application-layer hooks — and the gap shows up as alerts without context. The deeper case for unified runtime telemetry runs through runtime observability for AI agents.

Layer 2 — Per-Agent Baselines

Layer 2 produces the answer to “is this normal?” for any agent. The mechanism that works is a per-agent behavioral profile maintained at the Deployment level, not per-pod. Per-pod baselines don’t converge for ephemeral AI workloads because individual pods don’t live long enough to capture an agent’s full operational range. ARMO calls the Deployment-level profile Application Profile DNA; it captures tool-call patterns, network destinations, process activity, file access patterns, and identity usage as a single behavioral envelope per agent.

The convergence window matters. Two to four weeks of observation is typical for production agents handling varied user requests across load conditions. Run alerting before convergence and every legitimate edge case fires; run with too short a window and the baseline misses operational variety it should have absorbed. ARMO’s case for defining normal agent behavior with runtime data walks through the four signal categories and the drift refinement process in depth.
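One simple way to reason about convergence is to stop learning only after a minimum observation period has passed and no new behavior has appeared for several consecutive days. The sketch below is an assumed heuristic, not ARMO's convergence logic. The 14-day minimum reflects the lower end of the two-to-four-week range, and `quiet_days_required` is an arbitrary illustrative value.

```python
def days_since_last_novelty(daily_observations):
    """daily_observations: one set of observed behaviors per day, oldest first."""
    seen = set()
    quiet_days = 0
    for today in daily_observations:
        if today - seen:
            quiet_days = 0   # something new appeared, restart the count
        else:
            quiet_days += 1
        seen |= today
    return quiet_days


def is_converged(daily_observations, min_days=14, quiet_days_required=5):
    """True once the window is long enough and recent days added nothing new."""
    return (
        len(daily_observations) >= min_days
        and days_since_last_novelty(daily_observations) >= quiet_days_required
    )
```

A baseline that keeps absorbing new behaviors late in the window is a sign that alerting should stay off a little longer.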

Layer 3 — Cross-Layer Correlation

Layer 3 produces the attack story. It takes signals from all four surfaces — an input from Surface 1, a tool call from Surface 2, a permission exercise from Surface 3, optionally a delegation from Surface 4 — and joins them into a single causal chain with timeline, entities, and impact assessment. Without Layer 3, the analyst opens five tools and reconstructs the chain by hand at 2 AM.

ARMO’s CADR was built specifically to occupy this layer. It assembles signals from the application surface, the container runtime, the Kubernetes API events, and the cloud audit stream into a single attack narrative. The point isn’t faster correlation; it’s that correlation becomes architectural rather than analyst-improvised.
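To make the idea of an assembled chain concrete, here is a deliberately simplified sketch. It assumes every event from every source already carries a shared `session_id`, a timestamp `ts`, a `surface` number, and a `summary`, which is a strong assumption and in practice the hard part. It does not describe how CADR works internally.

```python
from collections import defaultdict


def assemble_chains(events):
    """Group events by session and keep sessions that span two or more surfaces."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)

    chains = []
    for session_id, items in by_session.items():
        items.sort(key=lambda e: e["ts"])
        surfaces = sorted({e["surface"] for e in items})
        if len(surfaces) >= 2:
            chains.append({
                "session_id": session_id,
                "surfaces": surfaces,
                "timeline": [(e["ts"], e["surface"], e["summary"]) for e in items],
            })
    return chains
```

The output is a single ordered timeline per session, which is the shape an analyst would otherwise reconstruct by hand across several tools.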

Layer 4 — Triage Decision

Layer 4 classifies any assembled chain into one of three tiers: info-only (a benign event that resembles an attack), attack-attempt (an attack that did not succeed), or active-attack (in progress with observable impact). The mechanism is an explainability framework operating on Layer 3 output. Without Layer 4, every chain looks the same and the analyst has no decision support: nothing indicates whether to page, investigate, or document.
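The three tiers can be pictured as a small decision function. In the sketch below, the `impact_observed` and `baseline_deviation` flags are hypothetical fields on a chain, and the mapping from tier to analyst action (page, investigate, document) is an assumed reading of the text rather than a documented rule.

```python
ACTIONS = {
    "active-attack": "page",
    "attack-attempt": "investigate",
    "info-only": "document",
}


def triage(chain):
    """Classify an assembled chain and suggest the analyst action."""
    if chain.get("impact_observed"):
        tier = "active-attack"
    elif chain.get("baseline_deviation"):
        tier = "attack-attempt"
    else:
        tier = "info-only"
    return tier, ACTIONS[tier]
```

A real triage layer would also carry the evidence behind the decision so the classification is explainable, which this sketch leaves out.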

Layer 5 — Response Action

Layer 5 produces containment that’s specific to the surface where the attack lives. A Surface 2 tool-misuse attack contains via tool-scope revocation. A Surface 3 escape contains via permission revocation or per-agent quarantine. A Surface 1 RAG poisoning contains via index isolation and re-vetting. Without Layer 5, the attack continues during investigation. The enforcement mechanism that supports this is per-agent kernel-level enforcement informed by the same baselines Layer 2 produces — the observe-to-enforce methodology that turns runtime observation into surface-specific containment.
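The surface-specific containment described above can be written down as a lookup. The action names below are hypothetical labels for the containment steps named in the text. The text does not name a containment for Surface 4, so the sketch escalates to an analyst rather than inventing one.

```python
CONTAINMENT_BY_SURFACE = {
    1: ["isolate_rag_index", "revet_rag_index"],      # RAG poisoning
    2: ["revoke_tool_scope"],                         # tool misuse
    3: ["revoke_permission", "quarantine_agent"],     # escape (candidate actions)
}


def plan_containment(agent_id, surface):
    """Return candidate containment steps for the surface where the attack lives."""
    actions = CONTAINMENT_BY_SURFACE.get(surface)
    if actions is None:
        # No containment named for this surface: hand off to a human.
        return [{"agent_id": agent_id, "action": "escalate_to_analyst"}]
    return [{"agent_id": agent_id, "action": action} for action in actions]
```

Keying the plan on `agent_id` reflects the per-agent enforcement the section describes: the response targets one agent instead of the whole workload.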

Most Production Programs Cover One Surface — Here’s the Path to Four

A linear maturity model (Level 0 through Level 4) would suggest detection capability is one variable that goes up over time. It isn’t. Detection capability has two dimensions: how many of the four surfaces a program instruments (surface coverage), and how far up the five-layer stack the program operates (stack depth). A program with deep stack depth on a single surface is different from a program with shallow stack depth across all four. A linear model can assign both the same “level,” and neither caught the last incident.

The honest read on production AI deployments today: most sit at surface coverage of 1, stack depth of 2. Surface 3 is the most commonly covered because IAM is already a security discipline; stack depth 2 because telemetry and baselines exist but correlation, triage, and response do not. Programs that hit surface coverage 2 typically add Surface 2 next — Tool Invocation — through framework SDK callbacks. Programs that hit stack depth 3 add a correlation plane that joins what they already have.

The 2×2 grid as a working artifact:

| | Stack Depth 1–2 (telemetry + baselines) | Stack Depth 1–3 (+ correlation) | Stack Depth 1–5 (+ triage + response) |
|---|---|---|---|
| Surface coverage 1 | Common. Surface 3 only, alerts without chain context. Advance: instrument Surface 2 via framework SDK callbacks. | Single-surface attack story; structural blind spots elsewhere. Advance: add Surface 2 SDK callbacks before deepening stack. | Rare; over-investment in depth before coverage. Advance: expand surface coverage before adding more layers. |
| Surface coverage 2 | Surfaces 2 + 3 instrumented; signals siloed. Advance: deploy correlation plane joining the two. | Two-surface chains assemble; gap on input-driven attacks. Advance: add Surface 1 via LLM call wrappers and RAG ingestion audit. | Two-surface stories with triage and response; missing input attribution. Advance: add Surface 1 instrumentation. |
| Surface coverage 3 | Three surfaces in telemetry; no chain output. Advance: deploy correlation plane. | Three-surface chains; no triage classification. Advance: add Layer 4 explainability tier. | Three-surface chains with triage and response; multi-agent attacks invisible. Advance: add Surface 4 orchestrator telemetry. |
| Surface coverage 4 | All surfaces in telemetry but stack still shallow. Advance: deploy correlation plane. | Four-surface chains assembled; no decision automation. Advance: add triage tier and per-surface response. | Target state. Continuous tuning and per-agent enforcement. |

What the grid forces is the honest conversation. A team that says “we’re mature on detection” because they have eBPF telemetry running is probably at coverage 1, depth 2. A team that runs a correlation plane but only on Surface 3 events is at coverage 1, depth 3. The grid exposes the second dimension that linear maturity models hide.
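Placing a program on the grid can be sketched as two small computations. The function below is illustrative. It assumes stack depth means the highest layer reached without skipping one, following the earlier point that a missing layer leaves the layers above it with nothing to work with.

```python
def grid_position(instrumented_surfaces, operating_layers):
    """Return (surface coverage, stack depth) for a detection program."""
    coverage = len(set(instrumented_surfaces) & {1, 2, 3, 4})
    depth = 0
    for layer in range(1, 6):
        if layer not in operating_layers:
            break  # a gap here means higher layers do not count
        depth = layer
    return coverage, depth


# Telemetry and baselines on Surface 3 only.
print(grid_position({3}, {1, 2}))      # (1, 2)
# Correlation plane added, still Surface 3 only.
print(grid_position({3}, {1, 2, 3}))   # (1, 3)
```

Both examples match the two teams described above: the same single surface, at different depths.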

Four Questions to Answer Before You Commission an Agent for Production

The 2×2 grid describes where the program is. The four questions below translate the grid into the conversation that has to happen before an agent ships. This is a Detection Capability Test. The Runtime Context Test in the AI workload security buyer’s guide does the same diagnostic work for vendor evaluation; this version applies it to the program itself. Each question targets a specific gap; the gap reveals where the program sits on the grid.

Q1 — Can you identify every AI agent running in production?

Good answer: a runtime-derived inventory of every agent — names, deployments, identities, dependencies, tools — built from observed execution. Not a JIRA spreadsheet, not a manifest, not a list anyone has to remember to update. The artifact ARMO calls a runtime AI-BOM lives at this layer, and the deeper case for why static AI-BOMs fall short covers the failure modes. What the gap reveals: Surface 3 is structurally inaccessible without an agent-by-name inventory, because permission analysis requires knowing which agent owns which identity. Ben Hirschberg has said the first thing any security team needs is visibility — this question is whether that visibility actually exists for agents.

Q2 — For each agent, do you have a per-agent behavioral baseline at the deployment level?

Good answer: a per-Deployment behavioral profile, converged over two to four weeks of observation covering normal load and edge cases, with declared-vs-observed scope reconciliation. The baseline is what makes deviation meaningful — without it, every tool call is either an alert or noise depending on how the rule is tuned, and neither is useful. What the gap reveals: Surface 2 detection cannot work. Generic rules across agents produce unacceptable false-positive rates, because what’s normal for one agent is anomalous for another. No per-agent baselines means the next Surface 2 alert is going to be wrong in one direction or the other.

Q3 — For each of the four detection surfaces, name the signal source you rely on.

Good answer: a one-line mapping per surface to an instrumented signal source. Surface 1: LLM call wrapper plus RAG audit. Surface 2: framework SDK callbacks plus eBPF process trees. Surface 3: Kubernetes audit logs plus cloud IAM events. Surface 4: orchestrator framework telemetry. If any line is blank, that’s the surface where the next incident will start. What the gap reveals: surface coverage on the 2×2 grid, directly. Every named source is a covered surface; every blank is a coverage gap.

Q4 — For an alert from any one surface, can you trace it backward to the input surface event that caused it?

Good answer: yes — through a correlation plane that produces an attack story with timeline, entities, and causal chain. The output is one assembled narrative the analyst opens, not five tools the analyst joins manually. Competing correlation approaches either operate at the SIEM layer, where signals have already lost their context, or at the alert level, where the join is by metadata rather than causation. What the gap reveals: stack depth on the 2×2 grid. No traceback means Layer 3 is missing, regardless of how good the source signals are.
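The traceback this question asks for can be pictured as walking causal links backward from an alert until an input-surface event is reached. The sketch assumes each event has an `id`, a `surface` number, and an optional `caused_by` pointer to its parent event. Those fields are hypothetical and would have to be produced by the correlation plane.

```python
def trace_to_input(alert_id, events):
    """Follow caused_by links from an alert back to a Surface 1 event.

    Returns the chain ordered from input to alert, or None if no
    Surface 1 event is reachable.
    """
    by_id = {e["id"]: e for e in events}
    path, seen = [], set()
    current = by_id.get(alert_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])  # guards against cyclic links
        path.append(current)
        if current["surface"] == 1:
            return list(reversed(path))
        current = by_id.get(current.get("caused_by"))
    return None
```

A `None` result is the gap the question is probing: the alert exists, but nothing connects it to the input that caused it.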

Build the Framework, Then Find the Next Surface

The framework lands as four pieces that operate together. The four surfaces define what to watch. The five-layer stack defines how to run it. The 2×2 grid defines where the program is and what comes next. The four-question diagnostic translates the grid into the conversation that needs to happen before an agent ships.

The detection question isn’t whether to build this capability. Production AI is shipping on a steady cadence across mid-market and enterprise, and the existing security stack was never designed for the workload class arriving in it. The question is where the program sits on the grid today, and what gets built next quarter.

ARMO’s platform for cloud-native security for AI workloads was built to occupy the framework end-to-end — runtime telemetry through eBPF, Deployment-level baselines through Application Profile DNA, cross-surface correlation through CADR, three-tier triage, and per-agent enforcement. Walking the framework against a real environment is the fastest way to see where the grid puts you. The next surface to instrument follows from there.

FAQ

How long does it take to build per-agent baselines for production AI agents?

Two to four weeks of observation is typical for production agents, depending on operational variety. The baseline needs to absorb normal load patterns, edge cases triggered by unusual but legitimate user requests, and the rhythm of routine deployments. During the convergence window the system runs in visibility-only mode — collecting telemetry without firing alerts — which mirrors the observe-to-enforce workflow the rest of the framework depends on.

Can we add agent attack detection to our existing SIEM rather than building a new stack?

A SIEM sits at Layers 3 and 4 in this framework but does not substitute for Layers 1 and 2. If the source signals from the four surfaces don’t exist, SIEM correlation has nothing to correlate — it ends up joining tool-class alerts (CNAPP, EDR, audit) that were never built to capture agent-specific events. The SIEM stays in place as the enterprise visibility layer; the source instrumentation has to come from somewhere else.

What’s the minimum telemetry to start with at low maturity?

The minimum is eBPF at the kernel level combined with application-layer hooks at the framework SDK layer. eBPF captures the syscall, process, network, and file events that explain what happened. Framework SDK hooks capture the prompts and tool calls that explain why it happened. Either one alone produces alerts without context — Surface 2 and Surface 3 cannot be reliably distinguished without both.

How does this framework handle managed AI services like Bedrock Agents or Vertex AI Agent Builder?

Surface coverage shifts under managed runtimes. Surface 1 becomes partially accessible through the service’s API logs. Surface 3 remains accessible via cloud audit streams. Surfaces 2 and 4 are largely opaque inside managed runtimes, because the platform doesn’t expose framework-level tool-call telemetry or orchestrator state. This is why managed agents are typically limited to lower-blast-radius tasks until that opacity closes. The cloud-specific instrumentation details for AWS, Azure, and GCP appear in the cloud spokes of the AI workload security buyer’s guide.

How does runtime detection avoid blocking legitimate non-deterministic agent behavior?

The signal is sequence, scope, and rate deviation — not individual event match. Per-agent baselines at the Deployment level tolerate non-determinism by design, because the baseline isn’t a list of allowed actions but an envelope around the agent’s normal operational range. During the convergence window the system runs in visibility-only mode, so false positives during learning don’t translate into production blocks. After convergence, alerts fire on patterns that break the envelope, not on actions that look unusual in isolation.
