AI Agents in the Cloud: A Risk Management Framework for Security Leaders
Your risk committee meets Thursday. The agenda has a new item: AI agent risk posture....
May 6, 2026
It’s 2 a.m. and the SOC has a Tier 3 page. A customer-service agent on the production cluster has just wired refund payments to seven addresses outside the approved disbursement list. The runbook is unambiguous: isolate the pod, image the disk, image the memory, root-cause within 48 hours.
Then someone in the war room asks the actual question — why did the agent do it? — and the answer isn’t on the disk. It lives in the prompt the agent received, the documents it retrieved, the sequence of tools it called, and the outputs the model produced. A standard pod image captures none of that. The disk image will tell you what containers ran. It won’t tell you whether the agent was compromised, manipulated, or acting on poisoned context. Those are three different incidents with three different runbooks, and standard cloud-native IR collapses them into one.
This is the structural break. Three properties of agent workloads (non-deterministic behavior, reasoning-trace-as-evidence, and prompt-as-attack-surface) break the assumptions behind IR runbooks built for deterministic services. What follows is the playbook for the observe-posture-detect-enforce program: three incident types, six forensic artifacts to preserve before containment, and a six-phase walk through where each NIST IR phase breaks for AI agents and how to close the gap.
Standard cloud-native IR treats “agent compromise” as one category. AI agent incidents split into three structurally different types, and getting the type wrong means running the wrong runbook.
Type 1: Runtime escape. The agent reaches outside its sandbox: cross-container file operations, code planted with elevated privileges, capability abuse, host filesystem access through unintended paths. The IOCs look like traditional escape: kernel signals, namespace transitions, capability assertions. Containment family: workload isolation. We have previously walked through agent escape detection at the syscall layer; this is familiar IR territory, well-mapped to the MITRE ATLAS execution-escape framework.
Type 2: Privilege escape. The agent uses credentials it legitimately holds in ways it was never authorized to. A read-only role doing writes. An identity scoped to one bucket egressing to another. An MCP tool meant for query running mutations. The IOCs are auth-plane: scope deltas, identity-event traces, audit-log anomalies. We have previously broken down the tool misuse and API abuse patterns that distinguish this category from runtime escape. Containment family: permission revocation, credential rotation, access-grant audit.
Type 3: Reasoning compromise. This is the category traditional IR has no runbook for. The agent’s prompts, retrieved context, or tool descriptions were manipulated, and the agent then took authorized actions based on hijacked reasoning. The actions look legitimate because they are legitimate: the role has permission to wire payments. The compromise lives in the input that drove the decision. Patient zero isn’t a malicious binary; it’s a poisoned RAG document, a manipulated MCP tool description, or a prompt injection in production. Lateral movement isn’t network traffic; it’s prompt context propagating to downstream agents. Containment family: corpus and tool-catalog quarantine, prompt-provenance audit, downstream-agent contagion check.
The classification decision drives everything downstream. Get it wrong, and the runbook you run won’t fix what happened.
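As a rough illustration of how the three IOC families above could drive a first-pass classification, here is a minimal Python sketch. The signal names, the `classify` function, and the precedence order (kernel signals first, then auth-plane, then input-plane) are all assumptions made for illustration; they are not taken from any product or standard.

```python
from enum import Enum
from typing import Optional, Set


class IncidentType(Enum):
    RUNTIME_ESCAPE = "Type 1: runtime escape"
    PRIVILEGE_ESCAPE = "Type 2: privilege escape"
    REASONING_COMPROMISE = "Type 3: reasoning compromise"


# Hypothetical signal names, grouped by the IOC family each type exhibits.
KERNEL_SIGNALS = {"namespace_transition", "capability_assertion", "host_fs_access"}
AUTH_SIGNALS = {"scope_delta", "identity_event_anomaly", "audit_log_anomaly"}
INPUT_SIGNALS = {"poisoned_rag_document", "tool_description_changed", "prompt_injection"}


def classify(signals: Set[str]) -> Optional[IncidentType]:
    """Map observed signals to an incident type.

    The precedence order is an assumption: kernel and auth-plane evidence
    are checked first because a reasoning compromise, by definition,
    shows only authorized actions.
    """
    if signals & KERNEL_SIGNALS:
        return IncidentType.RUNTIME_ESCAPE
    if signals & AUTH_SIGNALS:
        return IncidentType.PRIVILEGE_ESCAPE
    if signals & INPUT_SIGNALS:
        return IncidentType.REASONING_COMPROMISE
    return None  # not enough evidence to pick a runbook yet


print(classify({"prompt_injection"}))
```

A real classifier would weigh evidence rather than take the first match; the point of the sketch is only that the type is decided by which plane the evidence lives in.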
No single artifact tells you what happened in an agent incident. The disk shows what containers ran. The network capture shows what destinations were contacted. Neither shows the prompt that drove the action, and the prompt is what tells you whether the agent was compromised, manipulated, or acting on poisoned context. Capture all six before containment closes the window.
Standard “kill the pod, image the disk” loses everything after the pod dies. The mechanism that survives a Soft Quarantine is kernel-level capture that keeps running while the agent is severed from external reach but not terminated — the same approach that powers the observe-to-enforce path used for behavioral baselining.
Walking the NIST SP 800-61 lifecycle step by step (detect, analyze, contain, eradicate, recover, post-incident), the failure modes are categorical, not tunable.
Phase 1: Detect. Per-event detection sees pieces of an AI agent attack, never the chain. Container runtimes were built for events (a syscall fired, a connection opened, a file changed), but AI agent incidents express themselves as chains where each link is individually authorized and only the chain is suspicious: a prompt, a tool selection, an API call, an outbound connection.
The practice that closes Phase 1: AI-aware detection that joins events on agent identity and prompt context, not timestamp proximity. We have previously broken down four distinct attack chains most security stacks miss — each one only visible once events from the application, behavior, and infrastructure layers are correlated under the same agent execution graph. The output of Phase 1 is not an alert; it’s an execution graph that hands off to Phase 2.
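To make the join key concrete, the sketch below groups events by agent identity and prompt context rather than by timestamp window, then orders each group into a chain. The `Event` fields, the layer names, and the sample data are hypothetical; a real pipeline would carry far richer records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: float        # seconds since some epoch
    agent_id: str    # which agent produced the event
    prompt_id: str   # which prompt context the event belongs to
    layer: str       # "application", "behavior", or "infrastructure"
    action: str


def build_chains(events):
    """Group events by (agent_id, prompt_id), then sort each group by time."""
    chains = defaultdict(list)
    for event in events:
        chains[(event.agent_id, event.prompt_id)].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e.ts)
    return dict(chains)


# Hypothetical events from three layers, arriving out of order.
events = [
    Event(3.0, "refund-agent", "p-17", "infrastructure", "outbound_connection"),
    Event(1.0, "refund-agent", "p-17", "application", "prompt_received"),
    Event(1.1, "billing-agent", "p-02", "application", "prompt_received"),
    Event(2.0, "refund-agent", "p-17", "behavior", "tool_call:issue_refund"),
]

chains = build_chains(events)
print([e.action for e in chains[("refund-agent", "p-17")]])
```

Note that the `billing-agent` event at 1.1 seconds sits between two `refund-agent` events in time, yet never enters the refund chain; a timestamp-proximity join would have mixed them.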
Phase 2: Analyze. Traditional analysis scopes blast radius by network reachability, host inventory, and identity propagation. AI agent scope is wider on every dimension: which prompts the agent processed during the incident window, which RAG sources fed its context, which tools it could call, which downstream agents it invoked, which identities it assumed.
The practice: a runtime-derived AI Bill of Materials (AI-BOM) as the scope artifact. Static manifests don’t help; the agent’s effective surface in production is whatever it actually loaded, retrieved, and called — not what was declared at deploy time. The classification decision lives here too. By the end of Phase 2, the responder has named which of the three incident types this is. That naming is what lets the right runbook open in Phase 3.
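One way to picture a runtime-derived AI-BOM is as a set of what the agent actually touched inside the incident window, compared against what was declared at deploy time. The following is a minimal sketch under that assumption; the `AIBOM` fields mirror the five scope dimensions listed above, while the observation tuple format and the kind names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AIBOM:
    prompts: set = field(default_factory=set)
    rag_sources: set = field(default_factory=set)
    tools: set = field(default_factory=set)
    downstream_agents: set = field(default_factory=set)
    identities: set = field(default_factory=set)


# Hypothetical mapping from observed event kind to AI-BOM field.
FIELD_BY_KIND = {
    "prompt": "prompts",
    "retrieval": "rag_sources",
    "tool_call": "tools",
    "delegation": "downstream_agents",
    "assume_identity": "identities",
}


def build_runtime_bom(observations, start, end):
    """Build the AI-BOM from (timestamp, kind, value) tuples inside the window."""
    bom = AIBOM()
    for ts, kind, value in observations:
        if start <= ts <= end and kind in FIELD_BY_KIND:
            getattr(bom, FIELD_BY_KIND[kind]).add(value)
    return bom


observations = [
    (10, "prompt", "p-17"),
    (11, "retrieval", "kb://refund-policy"),
    (12, "tool_call", "issue_refund"),
    (13, "tool_call", "export_ledger"),   # never declared at deploy time
    (99, "tool_call", "lookup_order"),    # outside the incident window
]
declared_tools = {"issue_refund", "lookup_order"}

bom = build_runtime_bom(observations, start=0, end=60)
print(sorted(bom.tools - declared_tools))
```

The set difference at the end is the part a static manifest cannot give you: tools that were used in production but never declared.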
Phase 3: Contain. Standard containment is binary: isolate the workload or don’t. AI agent containment is a ladder, and the rung depends on incident type and the cost of stopping the agent.
For the refund agent from the opening (Type 3, reasoning compromise, live and customer-facing), Soft Quarantine is the right rung. Sever external reach, preserve the process, keep kernel-level capture running. The agent stops doing harm; the chain that explains why stays available. For a batch document-processing agent with the same incident type but no live workflow, Kill at the kernel level is appropriate; the workflow is restartable and the chain is already on disk via the AI-BOM.
The containment-as-business-event problem is most acute in regulated verticals. We have previously walked CISOs through what HIPAA and FFIEC require under AI workloads — in those environments the containment action itself can be a reportable event, which inverts the standard “isolate first” instinct. The ladder makes the soft option a real option, not a fallback.
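The rung choice described in this phase can be sketched as a small decision function. This covers only the two rungs the text names, Soft Quarantine and Kill; the `AgentContext` fields and the rule itself (kill only when stopping is cheap and no evidence would be lost) are assumptions for illustration, not a prescribed policy.

```python
from dataclasses import dataclass

SOFT_QUARANTINE = "soft_quarantine"  # sever external reach, keep process and capture alive
KILL = "kill"                        # terminate the workload at the kernel level


@dataclass
class AgentContext:
    name: str
    live_customer_facing: bool  # stopping the agent interrupts a live workflow
    restartable: bool           # the workflow can be rerun later without loss
    chain_on_disk: bool         # the forensic chain is already persisted


def choose_rung(ctx: AgentContext) -> str:
    """Pick a containment rung (assumed rule: default to the softer option)."""
    if not ctx.live_customer_facing and ctx.restartable and ctx.chain_on_disk:
        return KILL
    return SOFT_QUARANTINE


# Hypothetical agents matching the two examples in the text.
refund_agent = AgentContext("refund-agent", True, False, False)
batch_agent = AgentContext("doc-batch-agent", False, True, True)

print(choose_rung(refund_agent), choose_rung(batch_agent))
```

Defaulting to Soft Quarantine whenever any condition is unmet reflects the article's argument that the soft option should be a real option; a regulated environment might add a further check for whether the containment action is itself reportable.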
Phase 4: Eradicate. Patching is for code with a CVE. AI agent eradication has three different targets depending on incident type, and each is owned by a different team.
Runtime escape: image rebuild plus sandbox tightening, owned by security and platform. Privilege escape: permission audit, credential rotation, access-grant review — the eradication target is in the IAM plane, owned by platform and IAM.
Reasoning compromise: corpus quarantine (pull the poisoned RAG document, audit what else came from that source), prompt-provenance audit (which input channels can write into the context window, which validation gates fired), tool catalog review (was a tool description manipulated, when did it change, who pushed it). AI engineering owns the corpus and tool catalog; platform owns the input channels; security owns the threat model. There’s no version of this Phase 4 that one team finishes alone.
Phase 5: Recover. Recovery in standard IR means restoring service to its pre-incident state. For AI agents, the pre-incident behavioral baseline may itself be the problem: if reasoning compromise was the incident type, the baseline absorbed the compromised behavior as normal before the alert fired. Restoring to that baseline restores the breach.
The practice: rebuild the per-agent baseline from a clean staging environment with production-equivalent traffic before re-promoting the agent to enforcement mode. Per-agent guardrails — generated from observed behavior in a controlled environment — make this tractable. Re-entry runs through observe-to-enforce again; nothing snaps back.
Phase 6: Post-incident. Standard post-mortems produce IOCs and a Jira ticket. AI agent post-mortems produce updated behavioral baselines and updated detection signatures. The work that came out of the incident (the chain that the responder assembled, the type classification that drove the runbook, the indicators that fired late) feeds the per-agent Application Profile DNA (APD) so the next chain that looks like this one is caught earlier. The post-mortem is classifier training, not after-action documentation.
The deliverable artifact: how each phase’s action differs across the three incident types.
| Phase | Type 1: Runtime Escape | Type 2: Privilege Escape | Type 3: Reasoning Compromise |
|---|---|---|---|
| 1. Detect | Kernel signals (setns, mount, capability) | Auth-plane scope delta vs. baseline | Prompt-context anomaly + chain assembly |
| 2. Analyze | Process lineage, neighbor review | Identity-event trace, delegation graph | Corpus and tool-catalog provenance |
| 3. Contain | Workload isolation | Permission revocation + credential freeze | Soft Quarantine + corpus pin |
| 4. Eradicate | Image rebuild + sandbox tightening | Permission audit + credential rotation | Corpus quarantine + prompt-provenance audit + tool catalog review |
| 5. Recover | Restore from clean image | Re-grant least-privilege scope | Rebuild baseline from clean staging |
| 6. Post-Incident | Sandbox policy update | Access-grant policy update | APD update + corpus governance |
Classification gets sharper with use. Every reasoning-compromise incident sharpens the next reasoning-compromise detection, because the chain you assembled this time is the signature that fires earlier next time. Which runbook fits depends on what Phase 2 returned — and the classifier improves only when SOC, platform, and AI engineering all feed back into it.
ARMO’s cloud-native security platform for AI workloads is built around the forensic-chain preservation requirement — kernel-level capture that survives Soft Quarantine, runtime AI-BOM that scopes by prompt corpus, cross-layer correlation that joins events on agent identity. Book a demo to see what the chain looks like assembled.
Can I use my existing NIST 800-61 runbook for AI agents?
For runtime escape: mostly, because the IOCs and containment paths look like traditional escape. For privilege escape: partially; you need to extend IAM-plane analysis to cover delegation chains. For reasoning compromise: no. There’s no equivalent in 800-61 for an attack whose IOCs are prompts and retrieved documents and whose patient zero is a poisoned context source.
How fast must containment happen for reasoning compromise?
Speed of harm and reversibility govern containment timing, not incident type. A reasoning compromise on a payment agent demands sub-minute response because the harm is wired-and-gone. The same incident on a document-summarization agent might wait while you preserve more of the chain. Type drives runbook; criticality drives clock.
Who owns eradication for reasoning compromise?
Cross-team. Security investigates the chain and produces the threat model. AI engineering quarantines the corpus and audits the tool catalog. Platform rotates credentials and tightens input-channel governance. The post-mortem only closes when all three feed back into the per-agent baseline.
How do you scope blast radius across downstream agents?
Through the runtime AI-BOM and the agent execution graph. Static manifests miss multi-agent contagion entirely — the chain only resolves when you traverse delegations and replay the prompt context that crossed each boundary.
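Traversing delegations can be sketched as a breadth-first walk over the execution graph that records which prompt context crossed each boundary. The graph shape (agent mapped to a list of downstream agent and context ID pairs) and all names below are hypothetical.

```python
from collections import deque


def blast_radius(delegations, patient_zero):
    """Return {downstream_agent: [context IDs along the first path that reached it]}."""
    reached = {patient_zero: []}
    queue = deque([patient_zero])
    while queue:
        agent = queue.popleft()
        for downstream, context_id in delegations.get(agent, []):
            if downstream not in reached:  # also guards against delegation cycles
                reached[downstream] = reached[agent] + [context_id]
                queue.append(downstream)
    del reached[patient_zero]
    return reached


# Hypothetical delegation edges observed at runtime: (downstream agent, context ID).
delegations = {
    "refund-agent": [("ledger-agent", "ctx-1"), ("notify-agent", "ctx-2")],
    "ledger-agent": [("audit-agent", "ctx-3"), ("refund-agent", "ctx-4")],
}

radius = blast_radius(delegations, "refund-agent")
print(radius)
```

The recorded context IDs are the list of what would need to be replayed to decide whether each downstream agent actually received contaminated context or only a clean delegation.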
What metrics track AI agent IR maturity?
Time to incident classification, forensic-chain completeness rate, and APD-refinement-applied rate. The first measures speed; the second measures evidence; the third measures whether your program closes the loop.
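The article names these three metrics but does not define how to compute them, so the sketch below uses assumed definitions: median seconds from detection to classification, the share of incidents where all six artifacts were captured, and the share of incidents whose post-mortem resulted in an APD refinement. The `IncidentRecord` fields and sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import median

EXPECTED_ARTIFACTS = 6  # the six forensic artifacts to preserve before containment


@dataclass
class IncidentRecord:
    detected_at: float            # seconds
    classified_at: float          # seconds
    artifacts_captured: int       # how many of the six were preserved
    apd_refinement_applied: bool  # did the post-mortem update the baseline?


def maturity_metrics(records):
    """Compute the three maturity metrics under the assumed definitions above."""
    total = len(records)
    return {
        "median_time_to_classification_s": median(
            r.classified_at - r.detected_at for r in records
        ),
        "chain_completeness_rate": sum(
            r.artifacts_captured == EXPECTED_ARTIFACTS for r in records
        ) / total,
        "apd_refinement_applied_rate": sum(
            r.apd_refinement_applied for r in records
        ) / total,
    }


# Hypothetical incident history.
records = [
    IncidentRecord(0, 600, 6, True),
    IncidentRecord(0, 1800, 4, False),
    IncidentRecord(100, 1000, 6, True),
]
metrics = maturity_metrics(records)
print(metrics)
```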