Why Your Detection Latency Budget Determines Blast Radius

Jun 1, 2026

Shauli Rozen
CEO & Co-founder

Key takeaways

  • Why does a “real-time” detection stack still let an AI agent attack cause damage? Because end-to-end detection latency is a sum of stage latencies, not a single number, and the slow stage is rarely the sensor on the datasheet. A stack can capture the first malicious syscall in milliseconds and still take minutes to contain the attack, because the time is spent downstream — in correlation, in triage, in the gap before response fires. “Real-time” describes one stage; the incident is governed by all five.
  • How does detection latency translate into blast radius? Blast radius is the integral of attacker progress over total pipeline latency — the area under the curve, not a line you either beat or miss. And the curve is not linear: once an agent reaches credential or tool access, the rate of reachable damage jumps, so detecting one stage later does not cost you one stage of damage. It costs you everything that stage unlocked.

Most teams buy detection on a single number. The datasheet says “millisecond detection,” the proof-of-concept fires the instant a test payload lands, and the box gets checked. Then a real AI agent incident runs in production, and the postmortem shows the attack completed its objective well before anyone contained it, even though the alert, technically, fired in milliseconds.

The number was real. It just measured the wrong thing. Detection latency is not one interval; it is a five-stage budget, and the alert-fires-fast number only covers the first stage. The seconds that decided the incident were spent in stages nobody benchmarked: a correlation engine batching signals, a learning window burying the alert, a human deciding whether to page. 

This article decomposes that budget stage by stage and shows why blast radius compounds with total latency rather than growing in proportion to it. The underlying detection surfaces and layers come from the operating model for detecting AI agent attacks across the full cloud stack; this piece reads them on the axis of time.

Blast Radius Is an Integral, Not a Threshold

Start at the destination, because it reframes everything upstream. Your real-time stack did not lose the incident because the sensor was slow. It lost because the latency hid in a stage you never put on a benchmark, and damage kept accumulating the entire time that stage was running.

That accumulation is the part most detection conversations skip. Blast radius is not a threshold you either beat or breach — it is the area under the attacker’s progress curve, integrated over the full time from first malicious action to containment. Every second the pipeline is still working, the agent is still acting. What sets that area is not time-to-detect. It is time-to-containment: the moment the agent is actually stopped, which sits at the far end of the pipeline, not the near end where the first alert fires. Detecting the first malicious action early helps only if containment follows quickly — an early detection feeding a slow pipeline still lets the agent work the whole time the pipeline runs.

And the curve is not a straight line. Damage does not accumulate at a constant rate. Early — a poisoned prompt ingested, a tool invoked once — reachable damage is low. Then the agent reaches credential or tool access, and the rate jumps: now it can query a PII table, call an export tool, reach an internal API with inherited permissions. The area under the curve after that inflection is far larger per second than before it. This is why a stage of delay multiplies rather than adds. Detecting at the credential-access stage instead of the tool-invocation stage before it does not cost you one stage of damage — it costs you everything the credential stage unlocked, integrated over however long the rest of the pipeline takes to contain.

A sensor that fires in milliseconds in front of a pipeline that contains in four minutes is not a fast detection stack. It is a four-minute integral with a fast first stage, and the attacker works against the integral, not the datasheet.
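
To make the integral concrete, here is a minimal sketch that models the reachable-damage rate as piecewise constant: low before the agent reaches credential or tool access, higher after. The function name, the 60-second inflection point, and the 1x and 50x rates are all illustrative assumptions; the article gives no numeric rates.

```python
def blast_radius(containment_s, inflection_s, rate_before, rate_after):
    """Area under a piecewise-constant reachable-damage rate curve.

    All parameters are illustrative assumptions. Rates are in arbitrary
    damage units per second; times are seconds since the first malicious
    action.
    """
    before = min(containment_s, inflection_s) * rate_before
    after = max(0.0, containment_s - inflection_s) * rate_after
    return before + after


# Hypothetical attack: credential access is reached 60 s in, after which
# the reachable-damage rate is assumed to be 50x higher.
INFLECTION_S = 60.0
RATE_BEFORE = 1.0
RATE_AFTER = 50.0

contained_early = blast_radius(45.0, INFLECTION_S, RATE_BEFORE, RATE_AFTER)
contained_late = blast_radius(240.0, INFLECTION_S, RATE_BEFORE, RATE_AFTER)

print(contained_early)  # 45.0
print(contained_late)   # 9060.0
```

Under these assumed numbers, containing at four minutes instead of 45 seconds takes about five times as long but yields roughly 200 times the area, because most of the extra time falls after the inflection.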

Detection Latency Is the Sum of Five Stage Latencies

If blast radius is set by time-to-containment, then “real-time” needs a sharper definition than the datasheet gives it. Define it as the interval between the first malicious action and the containment action — and then decompose that interval, because it is not atomic.

Detection latency is the sum of five stage latencies, in sequence:

  1. Telemetry acquisition — the time from an event occurring to a signal existing about it.
  2. Baseline evaluation — the time to decide whether that signal is normal for this agent.
  3. Cross-layer correlation — the time to assemble related signals into a single attack chain.
  4. Triage classification — the time to decide whether the chain is benign, an attempt, or an active attack.
  5. Response trigger — the time from “this is an active attack” to containment actually firing.

These are the same five layers we have previously described as a five-layer operating stack — read on a different axis. The parent framework describes what each layer produces; this describes what each layer costs in time.

The budget framing matters because the sum is dominated by whichever single stage is slowest. A pipeline with a sub-millisecond first stage and a five-minute third stage is, to a close approximation, a five-minute pipeline: you do not get credit for the fast stage, and the attacker keeps working until every stage has cleared. Optimizing the stage that is already fast moves the total by a rounding error.
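
The arithmetic behind that claim can be sketched in a few lines. Only the sub-millisecond first stage and the five-minute third stage come from the example above; the other three stage values, and the names `StageLatency` and `dominant_stage`, are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    name: str
    seconds: float


def dominant_stage(stages):
    """Return (total seconds, slowest stage, slowest stage's share of total)."""
    total = sum(s.seconds for s in stages)
    worst = max(stages, key=lambda s: s.seconds)
    return total, worst, worst.seconds / total


# Hypothetical budget: stages 2, 4 and 5 are invented values.
budget = [
    StageLatency("telemetry acquisition", 0.001),
    StageLatency("baseline evaluation", 2.0),
    StageLatency("cross-layer correlation", 300.0),
    StageLatency("triage classification", 20.0),
    StageLatency("response trigger", 10.0),
]

total, worst, share = dominant_stage(budget)
print(f"total={total:.3f}s dominant={worst.name} share={share:.1%}")

# Making the sensor infinitely fast saves only the first stage's latency.
saved_by_perfect_sensor = budget[0].seconds / total
print(f"a perfect sensor would cut the total by {saved_by_perfect_sensor:.4%}")
```

With these assumed values the correlation stage accounts for about 90% of the total, while eliminating the sensor's latency entirely would change the total by well under a hundredth of a percent.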

Stage 1 is the stage teams optimize first, for good reason: it is the one the datasheet exposes and the one a proof-of-concept tests. It is also rarely where the budget goes. In-kernel eBPF telemetry produces signals within milliseconds of an event, at 1–2.5% CPU and 1% memory overhead, with no learning window before it is active, so detection works on the first syscall of the first pod in a brand-new deployment. When acquisition is in-kernel, Stage 1 contributes milliseconds at most to the budget. The time is almost always somewhere else.

The Dominant Stage Is the One Not on the Datasheet

Here is the pattern that decides real incidents: every architecture overspends in a different stage, and it is almost never the one on the datasheet. The sensor is fast because the sensor is what gets benchmarked; the latency relocates to whichever stage nobody measured. Walk the four stages downstream of acquisition and the relocation becomes concrete.

Stage 2, baseline evaluation, overspends through learning-mode delay. A detection approach that depends on per-pod baseline convergence has a structural latency problem in any cluster with rolling deployments. When a pod is twelve minutes old and its baseline is still “learning — insufficient observation data,” every signal it emits carries a low-confidence tag. The real exfiltration alert arrives looking identical to the “new behavior” alerts from nine other new pods the deployment just created, and it goes to the back of the triage queue — by the time it reaches the top, the data is gone and the pod has been recycled. The latency here is not detection time; the signal existed. It is the queue position it was assigned, because the architecture cannot tell a Deployment-level first from a pod-level learning artifact. A baseline anchored at the Deployment level, with history that survives pod churn, removes this stage from the budget.
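
The queue-position effect can be illustrated with a toy triage queue. This is a sketch under stated assumptions, not any product's scoring logic: the confidence values, the one-hour learning window, the behavior labels, and both scoring functions are hypothetical. Only the twelve-minute pod age and the nine competing "new behavior" alerts come from the scenario above.

```python
LEARNING_WINDOW_S = 3600  # assumed length of a per-pod learning window


def pod_level_confidence(alert):
    """Per-pod baseline: anything from a pod still learning is low-confidence."""
    return 0.2 if alert["pod_age_s"] < LEARNING_WINDOW_S else 0.9


def deployment_level_confidence(alert, deployment_history):
    """Deployment-level baseline: behavior already seen in the Deployment's
    history scores low; behavior new to the Deployment scores high."""
    return 0.1 if alert["behavior"] in deployment_history else 0.9


def queue_position(alerts, score, target_id):
    """1-based position of target_id when sorted by score (high first),
    then by arrival order."""
    ordered = sorted(alerts, key=lambda a: (-score(a), a["arrival"]))
    return [a["id"] for a in ordered].index(target_id) + 1


# Nine routine alerts from freshly rolled pods, then the real one.
alerts = [
    {"id": f"new-behavior-{i}", "pod_age_s": 720,
     "behavior": "startup-dns-lookup", "arrival": i}
    for i in range(9)
]
alerts.append({"id": "exfil", "pod_age_s": 720,
               "behavior": "bulk-egress-to-unknown-host", "arrival": 9})

history = {"startup-dns-lookup"}  # behavior the Deployment has shown before

pod_pos = queue_position(alerts, pod_level_confidence, "exfil")
dep_pos = queue_position(
    alerts, lambda a: deployment_level_confidence(a, history), "exfil")

print(pod_pos)  # 10
print(dep_pos)  # 1
```

In this toy model the signal exists in both cases; what changes is whether the scoring can separate a behavior that is new to the Deployment from one that is merely new to the pod.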

Stage 3, correlation, overspends through batch windows and context loss. This is where SIEM-centric pipelines bleed the most time. When correlation runs at the SIEM layer, signals arrive having already lost the context that made them correlatable — joined by metadata and timestamp proximity rather than causation, on a batch interval rather than in line with the events. The analyst gets a set of alerts that happened near each other and reconstructs the causal chain by hand. That manual reconstruction is the dominant latency in most enterprise stacks, and it is invisible on the datasheet because the SIEM is procured separately from the sensors. Correlation that runs in line — assembling the chain as signals arrive rather than after they land in a log store — collapses this stage to the time it takes to render the story.
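
The cost of the batch interval alone, before any manual reconstruction, can be sketched as follows. The arrival times and the five-minute window are hypothetical, and the model assumes batch runs fire at fixed multiples of the window and that in-line processing time is negligible.

```python
def batch_chain_ready_s(arrivals_s, window_s):
    """Time at which a batch correlator can first see the whole chain.

    Assumes runs fire at multiples of window_s and that a signal landing
    exactly on a boundary misses that run.
    """
    last_arrival = max(arrivals_s)
    return (last_arrival // window_s + 1) * window_s


def inline_chain_ready_s(arrivals_s):
    """An in-line correlator has the chain once the last signal lands
    (processing time ignored in this sketch)."""
    return max(arrivals_s)


# Hypothetical chain: three related signals, seconds after the first
# malicious action, feeding a correlator with a 5-minute batch window.
signal_arrivals_s = [12, 95, 130]
BATCH_WINDOW_S = 300

batch_ready = batch_chain_ready_s(signal_arrivals_s, BATCH_WINDOW_S)
inline_ready = inline_chain_ready_s(signal_arrivals_s)
added_wait = batch_ready - inline_ready

print(batch_ready, inline_ready, added_wait)  # 300 130 170
```

Here the last signal sits for 170 seconds waiting for the next run, and that wait does not depend on how quickly the signal was acquired.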

Stage 4, triage classification, overspends through the human handoff. Once a chain is assembled, something has to classify it into the three tiers — benign, attempt, or active. If that decision waits on a human reading the graph off-hours, the stage costs however long it takes to page, wake, and read. The decision itself is fast for a prepared analyst — we have previously collapsed the page-investigate-document call into a three-tier decision framework that resolves in seconds — but the latency is in the handoff, not the judgment. An explainability layer that tiers the chain automatically removes the wake-and-read interval and reserves the human for genuinely irreversible decisions.

Stage 5, response trigger, overspends through manual runbooks. The final stage is the gap between “classified as an active attack” and “containment has fired.” A manual runbook — open the console, find the workload, revoke the token, isolate the pod — spends minutes here while the integral keeps growing. This is distinct from what the containment mechanism is: a stack can have excellent enforcement and still spend five minutes in Stage 5 because a human is driving it. Automated, surface-specific response triggered directly off the Stage-4 classification is what closes this stage.

Four stages, four places the time hides, and not one of them is the sensor. Stage 1 can be set aside; Stages 2 through 5 are where time-to-containment is won or lost.

Run the Budget Against Your Own Stack

The budget is only useful if you can locate your own dominant stage. The reframe: stop asking “what is our MTTD” and start asking “where in our pipeline is the MTTD.” The first question returns a number that averages away the bottleneck. The second finds it.

Run each stage against one question:

  • Acquisition — where is the signal acquired, in-kernel or from shipped logs? Log shipping adds its own interval before the signal even exists.
  • Evaluation — is there a learning window, and what is the alert confidence during a rolling deployment? Per-pod convergence means perpetual learning mode in any cluster with active scaling.
  • Correlation — is the chain assembled in line as signals arrive, or in batch after they land in a SIEM? Batch correlation is the most common hidden bottleneck.
  • Classification — is the benign/attempt/active decision automated, or does it wait on a human to read the chain?
  • Response — does containment fire automatically off the classification, or does it wait on a manual runbook?

The stage where the honest answer is “slow” is your dominant stage, and it is where the entire budget should go next quarter — not into the stage that already benchmarks well. Two stacks that both advertise millisecond detection can differ enormously in time-to-containment — one spends its budget in batch correlation and a manual runbook; the other does not. The datasheet number will not tell you which stack you have. The per-stage audit will.

This is the architecture argument for collapsing the middle of the pipeline. In-line runtime correlation assembles the application, container, Kubernetes, and cloud signals into a single attack story as they arrive — folding Stages 3 and 4 into one rendered narrative rather than a batch job followed by a manual read. The enforcement that closes Stage 5 is the surface-specific containment side of the same platform, turning observed behavior into per-agent response through the observe-to-enforce approach so the response stage fires automatically rather than waiting on a console. The point is not that any one capability is fast. It is that the budget is spent across the whole pipeline, and the stacks that contain quickly are the ones that refused to leave a slow stage in the middle.

Budget the Latency Before the Incident Does

Real-time detection is not a property you buy at the sensor. It is a budget earned across the whole pipeline, and an attacker collects on whichever stage you underfunded. A millisecond first stage in front of a batch correlation engine and a manual runbook is not a fast stack — it is a slow integral with a fast opening move.

For the architect, the takeaway is an audit, not a number. Decompose detection latency into its five stages, find where time actually accumulates, and put the next investment there. The dominant stage is where time-to-containment is won or lost, and it is almost never the one that looked good in the proof-of-concept. The fastest way to find yours is to walk the five stages against a live environment and watch where the seconds go — cloud-native security for AI workloads is built to occupy the budget end to end, from in-kernel acquisition through in-line correlation to automated, per-agent response. The next stage to fund follows from where the time is.

FAQ

How do I measure detection latency per stage instead of as one number?

Instrument a timestamp at each handoff in the pipeline: event occurrence, signal creation, baseline verdict, chain assembly, classification decision, and containment action. The deltas between those six consecutive timestamps are your five stage latencies, and the first audit usually reveals that the dominant stage is not the one the team assumed. Most teams have never placed these markers, which is exactly why they only have a single aggregate MTTD.
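
A minimal sketch of that per-stage measurement follows. It assumes six markers, including one for the baseline verdict, so that consecutive deltas map onto the five stages. The marker names, the `stage_latencies` helper, and the sample timestamps are hypothetical; in practice each timestamp would come from whichever component owns that handoff.

```python
from datetime import datetime, timedelta

# Six handoff markers bound five stages (names are assumptions).
MARKERS = [
    "event_occurred", "signal_created", "baseline_evaluated",
    "chain_assembled", "classified", "contained",
]
STAGES = [
    "telemetry acquisition", "baseline evaluation",
    "cross-layer correlation", "triage classification", "response trigger",
]


def stage_latencies(timestamps):
    """Map each stage to the seconds between its two bounding markers."""
    missing = [m for m in MARKERS if m not in timestamps]
    if missing:
        raise ValueError(f"missing markers: {missing}")
    return {
        stage: (timestamps[end] - timestamps[start]).total_seconds()
        for stage, start, end in zip(STAGES, MARKERS, MARKERS[1:])
    }


# Invented incident timeline, contained four minutes after the first action.
t0 = datetime(2026, 1, 1, 12, 0, 0)
incident = {
    "event_occurred": t0,
    "signal_created": t0 + timedelta(milliseconds=5),
    "baseline_evaluated": t0 + timedelta(seconds=2),
    "chain_assembled": t0 + timedelta(seconds=180),
    "classified": t0 + timedelta(seconds=200),
    "contained": t0 + timedelta(seconds=240),
}

latencies = stage_latencies(incident)
dominant = max(latencies, key=latencies.get)

for stage, seconds in latencies.items():
    print(f"{stage:26s} {seconds:8.3f}s")
print("dominant stage:", dominant)
```

Raising an error on a missing marker is a deliberate choice in this sketch: a pipeline that cannot produce all six timestamps cannot yet locate its own dominant stage.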

What is a realistic time-to-containment target for AI agent attacks on Kubernetes?

Frame the target against attacker progress, not an absolute service-level number. The useful question is whether containment fires before the agent reaches the inflection where reachable-damage rate jumps — typically credential or tool access — because that is where the integral starts compounding. “Contained before credential access is exercised” is a more meaningful target than “contained in under N minutes,” because it ties the budget to where damage actually accelerates.

Does a faster sensor reduce blast radius if my correlation is batched?

No. Total latency is dominated by the slowest stage, so if correlation runs on a batch interval, a faster sensor moves the total by a rounding error. The signal sits waiting for the next correlation window regardless of how fast it was acquired, and the attacker keeps working through that wait — spend the budget on the batched stage, not the one that already clears in milliseconds.

How does learning-mode delay add latency if detection is “always on”?

The engine is running, but during a baseline’s learning window every signal from a new pod is tagged low-confidence, which sets its position in the triage queue rather than removing it from monitoring. In a cluster with rolling deployments, the real alert lands among dozens of legitimate “new behavior” alerts and queues behind them, so the latency is the queue wait, not the detection. A baseline anchored at the Deployment level rather than the pod removes the learning window for already-profiled agents and eliminates this stage from the budget.

Where does human triage belong if I’m optimizing for latency?

Automate the classification of the assembled chain into benign, attempt, or active, and reserve human judgment for actions that are genuinely irreversible. The wake-read-decide interval is pure latency when the decision can be made from a pre-assembled, explainable attack story, so the human handoff should sit after automated containment of the clear-cut active attacks, not before it. Keep people in the loop for the high-consequence, ambiguous calls; take them out for the active attacks the classifier can already name.
