The AI Agent Attack Kill Chain: Which Stages You Can Actually Detect

May 30, 2026

Shauli Rozen
CEO & Co-founder

Key takeaways

  • Can you detect every stage of an AI agent attack? No. Detectability across the lifecycle is uneven, and the earliest stages produce no runtime signal at all. The instinct inherited from the classic kill chain — break the earliest link, because the earlier you intervene the less damage is done — fails for agent attacks, because the early links are silent.
  • Where does the agent kill chain actually become visible? At the first stage that executes — a tool call, a process spawn, an identity used, an outbound connection. That is rarely the first stage that happens. Recon, poisoning, and intent hijack all precede execution, and all of them are invisible at runtime.
  • What replaces “break the earliest link” for agent attacks? Correlation across the silent stages. You catch the chain at the first stage that emits a signal, then reconstruct backward through the stages that emitted nothing — turning a single visible event into the full attack story.

The early stages of an AI agent attack are silent. The poisoning, the hijacked intent, the reconnaissance: none of it executes, so none of it produces a runtime signal. The instinct every security team inherits from the kill chain is to break the earliest link, and here that is exactly the wrong advice: there is no early link to break. You cannot detect a stage that emits nothing.

That is not a knock on the doctrine in general. For traditional intrusions, reconnaissance and delivery leave logs, scans, and payloads a defender can catch, and breaking the chain early is sound advice. Agent attacks break the assumption underneath it. That makes “where in the lifecycle can I actually see this?” the question that matters, and it is the one question the conceptual kill-chain models, which lay out attacker-objective stages from recon through impact, never answer.

This article re-cuts the agent attack kill chain along a single axis: what a defender can observe at runtime. The canonical lifecycle runs through seven stages: reconnaissance, ingestion and poisoning, intent hijack, in-scope reconnaissance, privilege and tool escalation, lateral movement and action, and exfiltration or impact. Hold that sequence as the reference. What follows is where each stage breaks, what containment fits it, and why most of the early chain is dark.

The Early Stages of an Agent Attack Emit Nothing

Start where the inherited doctrine sends you and you will find an empty room. The first three stages of an agent attack — reconnaissance against the system, ingestion of a poisoned input, hijack of the agent’s intent — share a single property: nothing has executed. The malicious instruction is data at rest — sitting in a context window, a retrieved document, or a vector index, waiting to be processed. There is no syscall to capture, no network connection to flag, no API call to inspect. The agent has not yet done anything that a runtime sensor exists to observe.

This is where the doctrine fails: you cannot break a link you cannot see. “Deny early” assumes the early stages produce evidence that detection can act on, and for agent attacks they do not. The defensible move inverts the instinct: instead of trying to catch the chain at its silent front, you catch it at the first stage that emits, then correlate backward across the silence.

Memory and retrieval poisoning is the clearest case. An attacker plants instructions in a source the agent will later retrieve: a wiki page, a document repository, or a vector store entry. From that moment, the chain is live. But there are zero alerts during the days or weeks that follow, because retrieval has not happened and execution has not started. Input filters do not help here either: at injection time the content is not flowing through the agent’s prompt. It is sitting in a store the agent will read from later, after the filter has already passed it. The detection stack stays silent until a single downstream event fires, with no context about the conditioning that preceded it. The full picture of how this and three other chains stay dark through their early stages is in ARMO’s breakdown of four AI attack chains most security stacks miss.

None of this depends on how the attack got in. Direct prompt injection, indirect injection through a retrieved document, a compromised tool in the supply chain, a malicious MCP server — these are different front doors, and the security industry tends to write a separate playbook for each. But once the attacker is inside, every vector converges on the same runtime progression. The agent reasons, calls tools, uses its identity, and moves toward an objective. The front door changes; the chain doesn’t. That convergence is what makes a single canonical lifecycle the right abstraction, and it is why ARMO’s work on AI agent escape detection shows the same stage progression holding across exploit types that look nothing alike at the point of entry.

Map the Chain by What You Can See, Not by Stage Order

The inherited kill chain draws every stage the same way: a box in a row, each one a place to detect and deny. That picture is wrong for agent attacks, because the boxes are not equivalent. Some emit a signal a properly instrumented stack will catch. Some emit nothing. Some are invisible as isolated events and become legible only when joined to others. Stage order tells you the sequence; it tells you nothing about where you can intervene.

Re-cut the chain by detectability and every stage falls into one of three states:

  • Emits: the stage produces a runtime signal, such as a process spawn, a tool invocation, an identity assertion, or an egress connection.
  • Silent: the stage produces nothing observable, because it happens in data or in the model’s reasoning rather than in execution.
  • Resolves only in correlation: the stage produces a signal so ordinary in isolation that it reads as benign, and reveals itself only when tied to the stages around it.

The Kill Chain Observability Map lays the canonical stages against these states, with the break point and the containment that fits each one.

| Stage | Detection state | Where it breaks | Fitting containment |
| --- | --- | --- | --- |
| Reconnaissance | Silent | No runtime evidence; probing looks like normal use | None at runtime; minimize information exposure upstream |
| Ingestion / poisoning | Silent | Malicious content is data at rest in context or index | Index isolation and source re-vetting once correlated |
| Intent hijack | Silent | Happens inside model reasoning; no executed action | None directly; caught at the first downstream emit |
| In-scope reconnaissance | Resolves in correlation | Authorized reads and calls; benign individually | Heightened monitoring on the implicated agent identity |
| Privilege / tool escalation | Emits | New tool, new scope, new identity assertion | Tool-scope revocation; permission revocation |
| Lateral movement / action | Emits | Process spawns, new destinations, off-baseline calls | Per-agent quarantine; network egress restriction |
| Exfiltration / impact | Emits (often resolves in correlation) | Allowed channel, allowed destination, abnormal shape | Egress restriction; kill the workload, preserve state |

Reading the emitting rows is where detection gets engineered. The execution stages — escalation, lateral movement, exfiltration — are the points where the agent does something a sensor can record. But recording the event is not the same as knowing it is an attack. The agent is usually acting within its permissions, so the detection question is not “is this action allowed” but “is this normal for this agent.” That question is answered against a behavioral envelope: the scope of tools the agent calls, the sequence it calls them in, and the rate at which it does. ARMO builds this envelope per agent as Application Profile DNA, a Deployment-level baseline of the agent’s normal operational range, and surfaces deviations from it rather than scoring events one at a time.
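
A behavioral envelope of the kind described above can be sketched in a few lines. This is an illustrative approximation, not ARMO’s Application Profile DNA: it assumes tool calls arrive as `(timestamp_seconds, tool_name)` tuples, models scope as the set of tools seen in baseline sessions, sequence as observed tool-to-tool transitions, and rate as the peak number of calls in a sliding window. `Envelope`, `learn_envelope`, `peak_rate`, and `deviations` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical per-agent baseline: tool scope, tool-to-tool transitions, peak rate."""
    tools: set = field(default_factory=set)
    transitions: set = field(default_factory=set)
    max_calls_per_window: int = 0


def peak_rate(calls, window):
    """Largest number of calls that fall within any `window`-second span."""
    times = sorted(t for t, _ in calls)
    best, start = 0, 0
    for end, t in enumerate(times):
        # Slide the window start forward until all calls in it are < window apart.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def learn_envelope(sessions, window):
    """Build an envelope from baseline sessions of (timestamp_seconds, tool_name) tuples."""
    env = Envelope()
    for calls in sessions:
        names = [tool for _, tool in sorted(calls)]
        env.tools.update(names)
        env.transitions.update(zip(names, names[1:]))
        env.max_calls_per_window = max(env.max_calls_per_window, peak_rate(calls, window))
    return env


def deviations(env, calls, window):
    """List the ways a session departs from the envelope on scope, sequence, and rate."""
    names = [tool for _, tool in sorted(calls)]
    found = [f"scope: {t}" for t in sorted(set(names) - env.tools)]
    found += [
        f"sequence: {a}->{b}"
        for a, b in zip(names, names[1:])
        if (a, b) not in env.transitions
    ]
    rate = peak_rate(calls, window)
    if rate > env.max_calls_per_window:
        found.append(f"rate: {rate} > {env.max_calls_per_window}")
    return found
```

Against a baseline of search, read, summarize sessions, a session that follows the same tools in the same order at a similar pace returns an empty list, while one that loops on `read_file` in a burst and then calls an unseen `http_post` is flagged on scope, sequence, and rate, without the envelope ever asking whether any single call was permitted.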

This is why single-event alerting fails even on the rows that emit. A file read is authorized. An outbound HTTPS connection to an allowlisted endpoint is authorized. A database query is authorized. Each event clears every check on its own, and the attack lives in the combination — the scope that widened, the sequence that broke pattern, the rate that spiked. ARMO’s analysis of tool misuse and API abuse at runtime walks the case where every individual call is legitimate and only the shape of the sequence reveals the compromise.
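
The gap between per-event authorization and combination detection can be shown directly. In this sketch every sample event passes a hypothetical per-agent allowlist (`POLICY`, `is_authorized`), while a separate sequence check flags a data read followed, within a time window, by an egress much larger than the agent usually sends. The agent name, targets, byte threshold, and window are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float      # seconds
    agent: str
    kind: str      # "file_read", "db_query", or "egress"
    target: str
    size: int = 0  # bytes moved; meaningful for egress


# Hypothetical per-agent policy: each action below is individually allowed.
POLICY = {
    "support-agent": {
        "file_read": {"/data/tickets"},
        "db_query": {"customers"},
        "egress": {"api.partner.example"},
    }
}


def is_authorized(e):
    """Per-event check: may this agent perform this kind of action on this target?"""
    return e.target in POLICY.get(e.agent, {}).get(e.kind, set())


def read_then_large_egress(events, window, usual_egress_bytes):
    """Sequence check: a data read, then an unusually large egress within `window` seconds."""
    alerts = []
    last_read = {}
    for e in sorted(events, key=lambda ev: ev.ts):
        if e.kind in ("file_read", "db_query"):
            last_read[e.agent] = e
        elif e.kind == "egress" and e.agent in last_read:
            r = last_read[e.agent]
            if e.ts - r.ts <= window and e.size > usual_egress_bytes:
                alerts.append((r, e))
    return alerts


SAMPLE_EVENTS = [
    Event(0, "support-agent", "db_query", "customers"),
    Event(2, "support-agent", "file_read", "/data/tickets"),
    Event(5, "support-agent", "egress", "api.partner.example", size=50_000_000),
]
```

Every event in `SAMPLE_EVENTS` clears `is_authorized`, yet `read_then_large_egress(SAMPLE_EVENTS, window=60, usual_egress_bytes=200_000)` returns one alert pairing the file read with the 50 MB egress.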

The containment column is the part the conceptual frameworks never reach, and it is stage-specific by necessity. An escalation through a tool the agent should not be using is contained by revoking that tool’s scope. A lateral-movement stage is contained by quarantining the agent or cutting its egress to unknown destinations. A poisoned index, once correlation has identified it, is contained by isolating the index and re-vetting its sources. Matching the containment to the stage is what turns the Map from a diagram into a runbook.
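
Matching containment to stage can be expressed as a lookup from detected stage to ordered actions. The action names below are hypothetical placeholders for whatever a platform actually exposes (they are not real API calls), and the mapping follows the Map’s containment column. Reconnaissance and intent hijack have no runtime containment, so they deliberately return no steps.

```python
from dataclasses import dataclass

# Hypothetical containment actions per stage, following the Map's containment column.
# Reconnaissance and intent hijack are absent: there is nothing to contain at runtime.
RUNBOOK = {
    "ingestion/poisoning": ["isolate_index", "revet_index_sources"],
    "in-scope reconnaissance": ["heighten_monitoring_on_identity"],
    "privilege/tool escalation": ["revoke_tool_scope", "revoke_permissions"],
    "lateral movement/action": ["quarantine_agent", "restrict_egress"],
    "exfiltration/impact": ["restrict_egress", "kill_workload_preserve_state"],
}


@dataclass(frozen=True)
class Detection:
    stage: str
    agent: str
    target: str  # the tool, index, or destination implicated by the detection


def plan_containment(detection):
    """Return ordered (action, agent, target) steps; empty when no runtime action fits."""
    return [
        (action, detection.agent, detection.target)
        for action in RUNBOOK.get(detection.stage, [])
    ]
```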

The Chain Becomes Legible Only When You Correlate Across the Silence

The silent stages do not disappear from the attack. They disappear from the telemetry. The poisoning still happened, the intent still shifted — there is simply no event to show for it at the moment it occurred. What recovers them is correlation: when an emitting stage fires, joining it backward to what preceded it reconstructs the stages that left no signal of their own.

That is the mechanism that makes the inverted doctrine work in practice. The emit point is the entry into the chain, not the start of it. An escalation event fires; tied backward, it resolves into the tool call that triggered it, the retrieved document that carried the instruction, and the indexing event days earlier that planted it. None of the early stages were detectable in isolation. All of them become legible once the visible stage gives correlation a thread to pull.
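
The backward walk can be sketched as following causal links from the alerting event. This assumes each recorded event carries the id of the event that led to it (`caused_by`), for example a tool call pointing at the retrieval that supplied its instruction, and that retrieval pointing at the indexing event that planted the document. Producing those links is the hard part in practice; the `Event` shape, `reconstruct_chain` helper, and sample ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    id: str
    ts: float
    kind: str
    emitted_alert: bool              # did this event fire a detection on its own?
    caused_by: Optional[str] = None  # id of the event that led to this one, if recorded


def reconstruct_chain(alert_id, events):
    """Walk causal links backward from an alerting event; return the chain oldest-first."""
    by_id = {e.id: e for e in events}
    chain, seen = [], set()
    current = by_id.get(alert_id)
    while current is not None and current.id not in seen:  # `seen` guards against cycles
        seen.add(current.id)
        chain.append(current)
        current = by_id.get(current.caused_by)  # .get(None) returns None at the root
    return list(reversed(chain))


DAY = 86_400
SAMPLE = [
    Event("idx-1", 0, "index_write", False),
    Event("ret-1", 3 * DAY, "document_retrieved", False, caused_by="idx-1"),
    Event("tool-1", 3 * DAY + 2, "tool_call", False, caused_by="ret-1"),
    Event("esc-1", 3 * DAY + 3, "privilege_escalation", True, caused_by="tool-1"),
]
```

Only the escalation event in `SAMPLE` alerts on its own; `reconstruct_chain("esc-1", SAMPLE)` returns the index write, retrieval, tool call, and escalation in order.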

The alternative is the default: signals land as separate alerts on separate dashboards, and an analyst reconstructs the chain by hand at incident time. Correlation moves that work from the analyst to the architecture. Instead of fragments scattered across cloud, Kubernetes, and application layers, the stack assembles one attack story: a single timeline with the entities, sequence, and impact already joined. ARMO reports that this assembly cuts investigation and triage time by over 90%, because the correlation work an analyst would otherwise do by hand across tools is done at detection time instead. ARMO’s CADR engine was built to occupy this layer, correlating signals across the full stack into one narrative.
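
At its simplest, assembling one story instead of separate dashboards means joining signals from each layer on a shared entity and ordering them in time. The sketch below assumes every signal already carries a common entity key such as an agent identity; real correlation has to resolve that key across layers, which this deliberately skips. `Signal` and `attack_stories` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    ts: float
    layer: str    # e.g. "cloud", "kubernetes", "application"
    entity: str   # shared key across layers, assumed already resolved (e.g. agent identity)
    summary: str


def attack_stories(signals):
    """Group signals from every layer by entity, each group ordered into one timeline."""
    grouped = defaultdict(list)
    for s in signals:
        grouped[s.entity].append(s)
    return {entity: sorted(group, key=lambda s: s.ts) for entity, group in grouped.items()}
```

Three alerts that would otherwise sit on three dashboards, an application-layer tool call, a Kubernetes process spawn, and a cloud storage read by the same agent, come back as one ordered list under that agent’s key.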

The kill chain is the attacker-side view of the problem that ARMO’s framework for AI agent attack detection addresses from the defender side. The framework organizes the capability around four detection surfaces and a five-layer operating stack: where attacks cross into the runtime, and how a team turns those crossings into action. The kill chain complements it: it is the lifecycle the surfaces are watching for and the layers are built to catch.

Detectability, Not Stage Order, Is What Your Program Should Map

The recurring claim that the kill chain is obsolete for AI agents gets the diagnosis half right. The sequence model is fine. What breaks is the assumption layered on top of it — that every stage is an evenly spaced chokepoint where a defender can stand. For agent attacks the chokepoints are not evenly spaced. Most of the early chain is silent, the middle resolves only in correlation, and the reliable signal concentrates at the execution stages near the end.

Re-cut by detectability, the chain stops being a row of equal links and becomes a map of where to instrument and where to correlate. The program question is no longer “do we have a detection for every stage,” which is unanswerable because some stages cannot be detected directly. It becomes “do we catch the first stage that emits, and can we correlate backward through the stages that don’t.” A program built on that question does three things. It instruments the emitting stages deeply. It accepts that the silent ones are recovered through correlation, not direct detection. And it stops spending budget watching the front of the chain for evidence that will never appear.

The fastest way to see where a program stands is to walk the Kill Chain Observability Map against a real environment and mark which rows are dark. ARMO’s cloud-native security platform for AI workloads was built to light up the emitting rows with runtime telemetry and tie them backward across the silent ones through correlation. The rows that stay dark after that exercise are the program’s real gaps, and they are almost never the ones the inherited doctrine told the team to watch.
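
The “mark which rows are dark” exercise can be sketched as a coverage check. The telemetry source names, the per-stage sources, and the rule that a stage counts as recovered only when correlation is enabled and some later stage is directly observed are all assumptions for illustration; reconnaissance is marked unrecoverable to match the Map’s “none at runtime” entry.

```python
# (stage, runtime sources that observe it directly, recoverable by backward correlation?)
# Source names and recoverability flags are illustrative assumptions.
STAGES = [
    ("reconnaissance", set(), False),  # probing looks like normal use; no runtime path
    ("ingestion/poisoning", set(), True),
    ("intent hijack", set(), True),
    ("in-scope reconnaissance", {"tool_calls"}, True),
    ("privilege/tool escalation", {"tool_calls", "identity"}, True),
    ("lateral movement/action", {"process_exec", "network"}, True),
    ("exfiltration/impact", {"network"}, True),
]


def dark_rows(stages, instrumented, correlation_enabled):
    """Return stages neither observed directly nor recoverable from a later observed stage."""
    dark = []
    for i, (name, sources, recoverable) in enumerate(stages):
        seen_directly = bool(sources & instrumented)
        later_stage_lit = any(later & instrumented for _, later, _ in stages[i + 1:])
        recovered = recoverable and correlation_enabled and later_stage_lit
        if not (seen_directly or recovered):
            dark.append(name)
    return dark
```

With tool-call, identity, process, and network telemetry plus correlation, only reconnaissance stays dark; drop correlation and the poisoning and intent-hijack rows go dark with it.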

FAQ

Which AI agent attack stage should I instrument first?

The first stage that emits a runtime signal in your environment, which means tool-call and process-execution telemetry — not the earliest stage in the chain. The early stages are silent, so instrumenting them yields nothing; the execution stages are where detection has something to record. Start there and build correlation backward from those signals.

How do I detect a stage that produces no runtime signal?

You don’t detect it directly. A silent stage — a poisoned index, a hijacked intent — leaves no event at the moment it happens. You recover it by correlating backward from the first emitting stage that follows it, which gives you the thread to reconstruct what preceded it.

Does the cyber kill chain still apply to AI agents?

As a sequence model, yes — agent attacks do progress through recognizable stages. What does not carry over is the assumption that every stage is an equally good place to intervene. Detectability is uneven across the chain, so detectability, not stage order, is the axis a detection program should map against.

What containment fits a tool-misuse stage versus an escape stage?

Tool misuse is contained by revoking the scope of the specific tool being abused, since the agent is using an authorized capability outside its normal pattern. An escape stage is contained by revoking the agent’s permissions or quarantining the agent entirely, because the agent is breaking out of its intended boundary rather than misusing a capability inside it.
