AI Agents in the Cloud: A Risk Management Framework for Security Leaders
Your risk committee meets Thursday. The agenda has a new item: AI agent risk posture....
May 1, 2026
For six weeks, a mid-size hospital system’s clinical decision support (CDS) agent issued recommendations biased by a poisoned guideline summary, and no detection alert fired. The drift, a shift toward denial recommendations in cases sharing one specific clinical attribute, traced back to a guideline summary that an outside contributor had quietly reweighted and slipped through editorial review.
Every layer of the existing detection stack reported green. DLP: no PHI left the cluster. EHR audit log: agent reading and writing within scope. Network egress: normal traffic. CNAPP: compliant posture. Container runtime: normal syscalls. SIEM: nothing to aggregate.
This shape of silence is familiar from memory poisoning, one of the four attack chains we have broken down previously: weeks of conditioning, no alerts, then a single execution event that the runtime stack catches at the tail. Contamination opens the same way but has no tail. The “payload” is the agent doing its authorized job, writing biased recommendations into the EHR. The attack target was the agent’s own output; the network was never the channel.
Other AI agent threats exist in healthcare, including model supply chain compromise, agent hijacking, and denial of service against the inference endpoint. The two outcomes below are not exhaustive. They are where existing healthcare detection stacks have the largest structural blind spots, and where the HHS Office for Civil Rights (OCR) breach-notification clock is least forgiving. Both map to risk categories in the OWASP Top 10 for LLM Applications and the OWASP Agentic AI threats framework.
Outcome 1: PHI exfiltration through allowlisted channels. The agent leaks patient data through destinations the security architecture has already approved. The managed LLM endpoint has an active BAA. The embedding service is internal. The vector store sits inside the trust boundary. Destination-layer tools — DLP, network egress monitoring, CNAPP destination policies — see an allowlisted destination, valid permissions, and semantically transformed embeddings rather than raw PHI. The exfiltration completes inside the approved envelope.
Outcome 2: Contaminated clinical output. There is no destination at all. The attack target is the agent’s own write-back into the EHR, the orders workflow, or the shared clinical knowledge stores that subsequent cases retrieve from. Every destination-layer tool sees silence. The downstream consequence, clinical decisions made on poisoned data, surfaces weeks later, after the bias has affected case after case.
The structural distinction matters. Outcome 1 has a destination problem: the destination is real but allowlisted, so detection must operate at the agent’s behavior layer. Outcome 2 has no destination at all: detection must operate at the agent’s write-path layer.
| Stack component | Outcome 1: PHI exfiltration | Outcome 2: Output contamination |
|---|---|---|
| DLP | Partial — pattern-matches structured content; misses semantic transformation through embeddings | Blind |
| EHR audit log | Partial — shows access events, not causality | Blind — writes are within authorized scope |
| Network egress monitoring | Partial — flags new destinations; misses allowlisted endpoints | Blind — no egress event to observe |
| CNAPP | Posture-only — sees configuration, not behavior | Blind |
| EDR / container runtime | Process events, no AI context | Blind — agent writes look normal |
| SIEM aggregation | Aggregates partial signals; produces partial narrative | Aggregates nothing |
| Per-agent runtime behavioral detection | Channel-layer signal with per-agent baseline | Write-path content baselining + distribution drift |
Closing both gaps requires a per-agent runtime behavioral layer operating on the agent itself rather than the network it’s attached to.
The runtime detection signal that closes the Outcome 1 gap operates at the agent’s behavior layer — by the time the egress reaches the destination, the existing stack has already lost the chain. Three healthcare-specific baseline dimensions drive the signal.
FHIR resource type access pattern. Every clinical agent has a canonical access profile expressed in FHIR resource types. An ambient scribe reads Encounter, Observation, and DocumentReference and writes DocumentReference. A CDS agent reads Patient, Condition, MedicationStatement, and Observation and writes nothing. A prior auth agent reads Coverage, Claim, and ClaimResponse and writes ClaimResponse. The order in which the agent touches resource types per case is also part of the baseline. A scribe accessing Patient demographics for an encounter that doesn’t require demographic context is a baseline deviation — visible before any data leaves the cluster.
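The resource-type baseline can be sketched as a per-agent-class profile of allowed reads, allowed writes, and an optional canonical first-touch order. This is a minimal illustration, not ARMO’s implementation: the class keys, the `deviations` helper, and the scribe’s canonical order are hypothetical (the article says order is part of the baseline but does not specify one), while the read and write sets are taken from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessProfile:
    """Per-agent-class baseline expressed in FHIR resource types."""
    reads: frozenset
    writes: frozenset
    # Hypothetical canonical first-touch order; empty means order is not enforced.
    order: tuple = ()


PROFILES = {
    "ambient_scribe": AccessProfile(
        reads=frozenset({"Encounter", "Observation", "DocumentReference"}),
        writes=frozenset({"DocumentReference"}),
        order=("Encounter", "Observation", "DocumentReference"),
    ),
    "cds": AccessProfile(
        reads=frozenset({"Patient", "Condition", "MedicationStatement", "Observation"}),
        writes=frozenset(),  # read-only baseline
    ),
    "prior_auth": AccessProfile(
        reads=frozenset({"Coverage", "Claim", "ClaimResponse"}),
        writes=frozenset({"ClaimResponse"}),
    ),
}


def deviations(agent_class: str, events: list) -> list:
    """Return baseline deviations for one case's (operation, resource_type) events."""
    profile = PROFILES[agent_class]
    found = []
    first_touch = []
    for op, resource_type in events:
        allowed = profile.reads if op == "read" else profile.writes
        if resource_type not in allowed:
            found.append(f"unexpected {op} of {resource_type}")
        if resource_type not in first_touch:
            first_touch.append(resource_type)
    # Compare the order in which baseline types were first touched with the canonical order.
    if profile.order:
        observed = [r for r in first_touch if r in profile.order]
        expected = [r for r in profile.order if r in observed]
        if observed != expected:
            found.append(f"order {observed} differs from baseline {expected}")
    return found


# A scribe that pulls Patient demographics for a routine encounter.
print(deviations("ambient_scribe", [
    ("read", "Encounter"), ("read", "Patient"),
    ("read", "Observation"), ("write", "DocumentReference"),
]))  # ['unexpected read of Patient']
```

The check runs on access events alone, so it can fire before any data leaves the cluster.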
Embedding-profile envelope per case-class. Clinical AI workflows almost universally embed clinical content for retrieval. The embedding profile — token distribution, semantic density, length envelope — varies by case-class but is bounded. An agent embedding broader content than the case requires — the entire patient history rather than the relevant encounter — produces a deviation detectable at the embedding-service boundary, before the embedding reaches the LLM endpoint.
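One simple way to express a bounded envelope per case-class is a band of mean plus or minus k standard deviations over a few payload features, learned from baseline traffic. The sketch assumes the features are computed upstream at the embedding-service boundary; the feature names (`tokens`, and `distinct_ratio` as a crude stand-in for semantic density), the case-class key, the baseline values, and k = 3 are all hypothetical.

```python
import statistics


def fit_envelope(samples, k=3.0):
    """Learn a per-feature [mean - k*sd, mean + k*sd] band from baseline payloads."""
    envelope = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        envelope[name] = (mean - k * sd, mean + k * sd)
    return envelope


def outside_envelope(envelope, features):
    """Return the names of features that fall outside the learned band."""
    return [name for name, (lo, hi) in envelope.items() if not lo <= features[name] <= hi]


# Hypothetical features of payloads sent to the embedding service for one case-class.
baseline = [
    {"tokens": 480, "distinct_ratio": 0.52},
    {"tokens": 510, "distinct_ratio": 0.49},
    {"tokens": 530, "distinct_ratio": 0.51},
    {"tokens": 495, "distinct_ratio": 0.50},
    {"tokens": 505, "distinct_ratio": 0.48},
]
envelopes = {"primary_care_followup": fit_envelope(baseline)}

# Embedding the whole patient history instead of the relevant encounter.
print(outside_envelope(envelopes["primary_care_followup"], {"tokens": 6000, "distinct_ratio": 0.50}))
# ['tokens']
```

A production baseline would need more robust statistics than a Gaussian band over five samples, but the shape of the check is the same: the decision is made on the request to the embedding service, before anything reaches the LLM endpoint.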
Tool-invocation sequence per case-class. Each agent class has a canonical tool-call sequence per case-class. A prior auth agent invokes the eligibility-check tool, the policy-lookup tool, the medical-necessity check, and the audit-write tool, in roughly that order. Drift in this sequence (an extra retrieval call, a tool invoked outside the canonical case flow, a write event in an agent class with a read-only baseline) is the early signal of RAG reverse-write into clinical knowledge stores and of cross-patient context leakage. The tool misuse and API abuse patterns we have catalogued show up in this domain as deviations in the case-class tool sequence.
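Sequence drift can be surfaced by diffing each case’s observed tool calls against the canonical sequence for its case-class. This sketch uses Python’s standard `difflib.SequenceMatcher` to classify differences as extra, missing, or replaced calls. The tool names are hypothetical labels for the four tools named above, and an exact-order diff is stricter than the “roughly that order” baseline the text describes.

```python
from difflib import SequenceMatcher

# Canonical per-case-class sequence from the prior auth example; tool names are hypothetical.
CANONICAL = {
    "prior_auth": ["eligibility_check", "policy_lookup", "medical_necessity_check", "audit_write"],
}


def sequence_drift(agent_class, observed):
    """Diff an observed tool-call sequence against the canonical one for its case-class."""
    canonical = CANONICAL[agent_class]
    matcher = SequenceMatcher(a=canonical, b=observed, autojunk=False)
    drift = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            drift.append(("extra", observed[j1:j2]))
        elif tag == "delete":
            drift.append(("missing", canonical[i1:i2]))
        elif tag == "replace":
            drift.append(("replaced", canonical[i1:i2], observed[j1:j2]))
    return drift


# An extra retrieval call injected mid-case.
print(sequence_drift("prior_auth", [
    "eligibility_check", "policy_lookup", "vector_search",
    "medical_necessity_check", "audit_write",
]))  # [('extra', ['vector_search'])]
```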
ARMO’s Application Profile DNA baselines all three dimensions per agent class at the Deployment level; CADR cross-layer correlation ties deviations to the prompt, the records returned, and the destination.
| Exfiltration channel | Healthcare-specific instance | What the FHIR-aware baseline catches |
|---|---|---|
| A — Agent response through managed LLM endpoint | Ambient scribe transmits clinical text or embeddings to BAA-covered endpoint | Embedding-profile envelope deviation per encounter type |
| B — Tool-call outbound to embedding services | CDS agent embeds broader context than the case requires; cross-patient information leaks into shared embedding cache | Cross-patient FHIR resource access outside the case-class baseline |
| C — RAG reverse-write into clinical knowledge stores | Prior auth agent induced to write current-case PHI into shared retrieval store | Tool-invocation sequence anomaly: write event in an agent class with a read-only baseline |
| D — Cross-patient context leakage through shared retrieval | Prompt injection in Case X causes agent to retrieve and surface Case Y content | Retrieval-corpus access pattern deviation per case-class |
Outcome 2 is structurally harder, because the agent’s behavior at the destination layer looks identical with or without the attack.
The contamination outcome produces a different shape of silence than Outcome 1. Outcome 1 produces partial signals that destination-layer tools see but cannot connect into a narrative. Outcome 2 produces no destination-layer signals at all.
Walk the existing stack against a CDS agent whose recommendations have shifted because its retrieval corpus has been quietly poisoned. DLP sees no PHI in the agent’s writes — there is none. The EHR audit log encodes “did this principal access this resource,” not “is this the right recommendation for this case.” Network egress sees no egress to flag. CNAPP shows compliant posture. Container runtime sees normal syscalls. The SIEM has no signals to aggregate.
The contamination produces write events. Each is individually authorized, individually within scope, individually unremarkable. There is no IOC. There is no anomalous destination. There is a slow distribution shift in agent output that requires both per-agent baseline and cross-signal correlation to surface.
Closing the contamination gap requires two tiers working together: per-agent baseline signals that surface the agent-level anomaly, and cross-signal correlation that ties the anomaly to its upstream cause.
Tier 1: Per-agent baseline signals.
Write-path content baselining operates on the embedding profile of agent writes. The embedding profile of DocumentReference, MedicationRequest, or ClaimResponse writes forms a baseline distribution per case-class. An ambient scribe whose embedding profile shifts away from baseline — without a corresponding model-update event — is the early signal that something is biasing its output.
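A minimal version of write-path content baselining compares the centroid of recent write embeddings for a case-class with the baseline centroid, using cosine distance. The vectors here are toy three-dimensional stand-ins for real embeddings, and the 0.05 threshold is a hypothetical value that in practice would be calibrated from the baseline’s own variability. Checking for a corresponding model-update event is a separate step.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def write_path_shift(baseline, recent, threshold=0.05):
    """Distance between baseline and recent write-embedding centroids, and whether it exceeds threshold."""
    distance = cosine_distance(centroid(baseline), centroid(recent))
    return distance, distance > threshold


# Toy "embeddings" of DocumentReference writes for one case-class (hypothetical values).
baseline = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.05, 0.05]]
recent = [[0.2, 1.0, 0.0], [0.1, 0.9, 0.1]]
print(write_path_shift(baseline, recent))
```

Comparing centroids keeps the check cheap, at the cost of missing shifts that change the spread of outputs without moving their mean.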
Recommendation distribution drift operates on the per-case-class output distribution. A CDS agent recommending denial in 18% of cases meeting specific criteria has that 18% as its baseline. A statistically significant shift, decorrelated from any deployment event, is the contamination signal. ARMO’s intent drift detection methodology articulates the deployment-correlation pattern that disambiguates legitimate model evolution from compromise.
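Whether a shift away from the 18% baseline is statistically significant can be checked with a one-sample test of proportions. The sketch below uses the normal approximation to the binomial. The function name, the 200-case window, and the 0.01 significance level are hypothetical choices, and deployment correlation would run as a separate step.

```python
import math


def denial_rate_p_value(denials, cases, baseline_rate=0.18):
    """Two-sided p-value for an observed denial rate against the baseline rate.

    Uses the normal approximation to the binomial, which is reasonable when
    cases * baseline_rate and cases * (1 - baseline_rate) are both comfortably above ~10.
    """
    observed_rate = denials / cases
    standard_error = math.sqrt(baseline_rate * (1 - baseline_rate) / cases)
    z = (observed_rate - baseline_rate) / standard_error
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.01  # hypothetical significance level
for denials in (40, 60):
    p = denial_rate_p_value(denials, cases=200)
    print(denials, round(p, 4), "drift" if p < ALPHA else "within baseline")
```

With 200 qualifying cases, 40 denials (20%) is within normal variation, while 60 denials (30%) is a clear departure from the 18% baseline.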
Tool-call sequence anomalies in batch agents surface cross-patient leakage and retrieval reordering. A prior auth agent processing Case 47 of a batch normally invokes a specific tool sequence; drift — an extra retrieval call, a write event mid-batch — is the early signal of cross-patient context leakage. Detecting prompt injection in production AI agents covers the upstream trigger; the runtime behavioral signal is what surfaces the consequence.
Tier 2: Cross-signal correlation.
Retrieval-corpus-to-output causal correlation runs against retrieval corpus changes when a Tier 1 signal fires, joining index-time corpus events, query-time retrieval events, and context-assembly-time events into one narrative. The corruption surfaces as a chain: corpus change → retrievals against the changed corpus in N cases → agent outputs in those N cases shifted from baseline. ARMO’s CADR cross-layer correlation fuses both tiers into a single investigation surface. Adversarial techniques in this category map to MITRE ATLAS tactics around model and data poisoning, where detection has to be behavioral rather than signature-based.
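The core of the Tier 2 join can be sketched as follows: for each corpus change, find the cases that retrieved the changed document after the change, and intersect them with the cases Tier 1 flagged as drifted. This is an illustrative sketch, not ARMO’s CADR. The record types, field names, and identifiers are hypothetical, and the sketch assumes corpus-change and retrieval events already share a document identifier, which is often the hard part in practice.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CorpusChange:
    doc_id: str
    changed_at: datetime
    changed_by: str


@dataclass(frozen=True)
class Retrieval:
    case_id: str
    doc_id: str
    retrieved_at: datetime


def correlate(changes, retrievals, drifted_cases):
    """Rank corpus changes by the share of drifted cases that retrieved the changed document afterwards."""
    by_doc = defaultdict(list)
    for r in retrievals:
        by_doc[r.doc_id].append(r)
    chains = []
    for change in changes:
        # Cases that retrieved this document after it changed.
        exposed = {r.case_id for r in by_doc[change.doc_id] if r.retrieved_at >= change.changed_at}
        drifted_exposed = exposed & drifted_cases
        if drifted_exposed:
            chains.append({
                "change": change,
                "exposed_cases": exposed,
                "drifted_exposed": drifted_exposed,
                "share_of_drift": len(drifted_exposed) / len(drifted_cases),
            })
    return sorted(chains, key=lambda c: c["share_of_drift"], reverse=True)


t0 = datetime(2026, 3, 1)
changes = [CorpusChange("guideline-med-class-x", t0, "contributor-17"),
           CorpusChange("guideline-unrelated", t0, "editor-2")]
retrievals = [Retrieval("case-0", "guideline-med-class-x", datetime(2026, 2, 27)),  # before the change
              Retrieval("case-1", "guideline-med-class-x", datetime(2026, 3, 3)),
              Retrieval("case-2", "guideline-med-class-x", datetime(2026, 3, 9)),
              Retrieval("case-3", "guideline-unrelated", datetime(2026, 3, 4))]
top = correlate(changes, retrievals, drifted_cases={"case-1", "case-2"})[0]
print(top["change"].changed_by, sorted(top["drifted_exposed"]))  # contributor-17 ['case-1', 'case-2']
```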
A CDS recommendation drift attack via knowledge-base poisoning, walked stage by stage.
T-6w. An outside contributor with editorial access to the internal guidelines store submits a guideline summary update for a specific medication class. The update passes editorial review but contains subtly biased clinical reasoning: a few sentences that reweight the trade-off considerations toward denial in cases sharing one specific attribute. Existing detection stack: silent. The change is editorial content, not configuration; no detection tool is watching the guidelines repository for adversarial edits.
T-6w to T-3w. The CDS agent retrieves the updated guideline in cases meeting the criteria, and the recommendation distribution begins shifting. No single day’s shift reaches statistical significance, and the cumulative shift is still below the threshold the agent’s daily evaluation checks against. Existing stack: silent. There is no individual case where the agent did anything wrong; every recommendation is defensible against the (now-poisoned) guideline.
T-3w. The cumulative distribution shift crosses statistical significance for the affected case-class. The Tier 1 signal fires, decorrelated from any deployment event. Tier 2 correlation runs against retrieval-corpus changes. The T-6w guideline update surfaces as the only correlated upstream change.
T-0. The SOC analyst opens the alert. The investigation surface shows the agent identifier, the drift signal, the correlated upstream change (the T-6w guideline update by the specific identity), the affected case range, and the per-case audit trail tying each affected case to the corrupted retrieval.
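The T-6w to T-3w stretch, where no single day is significant but the accumulated shift eventually is, is the pattern a cumulative-sum (CUSUM) detector is designed for. The article does not name a specific statistic; the Bernoulli CUSUM below is one standard choice, shown under hypothetical parameters (baseline denial rate 0.18, a shifted rate of 0.24 worth detecting, threshold h = 5, ten qualifying cases per day). It aggregates evidence per day rather than per case.

```python
import math


def bernoulli_cusum(daily, p0=0.18, p1=0.24, h=5.0):
    """Index of the first day the one-sided CUSUM exceeds h, or None.

    daily: (denials, cases) per day for one case-class. p0 is the baseline denial
    rate, p1 the shifted rate worth detecting, h the alarm threshold.
    """
    per_denial = math.log(p1 / p0)              # positive evidence per denial
    per_other = math.log((1 - p1) / (1 - p0))   # negative evidence per non-denial
    s = 0.0
    for day, (denials, cases) in enumerate(daily):
        # Accumulate the day's log-likelihood ratio, never dropping below zero.
        s = max(0.0, s + denials * per_denial + (cases - denials) * per_other)
        if s > h:
            return day
    return None


baseline_like = [(2, 10), (1, 10), (3, 10), (2, 10), (1, 10)] * 6
shifted = [(3, 10)] * 30  # 3 denials in 10 cases is unremarkable on any single day
print(bernoulli_cusum(baseline_like), bernoulli_cusum(shifted))  # None 15
```

On this synthetic input the shifted series alarms on its sixteenth day (index 15), even though no individual day would pass a per-day significance test.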
Time-to-evidence: hours, not weeks. That is not a nice-to-have. The OCR clock makes it a detection-architecture requirement.
The HIPAA Breach Notification Rule requires covered entities to notify affected individuals without unreasonable delay and no later than 60 calendar days after discovery of a breach (45 CFR §164.404), and, for breaches affecting 500 or more individuals, to notify HHS within the same window (45 CFR §164.408). The window starts at discovery, not at investigation completion. The rule treats a breach as discovered on the first day it is known to the covered entity or, by exercising reasonable diligence, would have been known. An investigation that takes longer than 60 days to produce patient-record-level causality may therefore itself become evidence of inadequate controls.
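The window arithmetic is simple, but the start date is where investigations slip: discovery is the earlier of actual knowledge and the date the breach would have been known with reasonable diligence. In this sketch the reasonable-diligence date is an input supplied by whoever makes that determination; the function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def notification_deadline(known_on, should_have_known_on=None):
    """Latest permissible notification date: 60 calendar days after discovery.

    Discovery is the earlier of the date the breach was known and the date it
    would have been known with reasonable diligence. The 60 days is an outer
    limit; notice is still due without unreasonable delay.
    """
    discovered = known_on if should_have_known_on is None else min(known_on, should_have_known_on)
    return discovered + timedelta(days=60)


print(notification_deadline(date(2026, 3, 1)))                     # 2026-04-30
print(notification_deadline(date(2026, 4, 12), date(2026, 3, 1)))  # 2026-04-30
```

In the second call, six weeks of slow investigation do not move the deadline, because the clock already started on the earlier reasonable-diligence date.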
Manual log pivoting across DLP, EHR audit, container runtime, and prompt context routinely consumes more than 60 days for AI-mediated incidents. The chain — input prompt → tool invocation → records returned → write event or destination — spans four log sources that don’t share schemas, identifiers, or join keys.
The architectural requirement: produce the patient-record-level chain for any AI agent in the cluster, on demand, as a screen rather than a multi-week query. CADR-level cross-layer correlation produces this surface natively because the correlation runs continuously across the layers, not at investigation time.
Three decisions distinguish a healthcare AI detection architecture that meets the OCR clock from one that meets the slide-deck requirement.
Agent prioritization by PHI exposure × write surface. Not every clinical AI agent carries the same risk weight. Rank baseline rollout by PHI exposure (volume × sensitivity of resources accessed) multiplied by write-surface breadth. Ambient scribes and prior auth agents typically rank highest — high PHI exposure plus action-taking authority that materially affects patient care. The same logic that drives per-agent guardrails applies on the detection side: not every agent earns the same baseline depth.
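The ranking rule reduces to a small scoring function: PHI exposure (volume × sensitivity) multiplied by write-surface breadth. The agent inventory, volumes, sensitivity weights, and write-surface counts below are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class AgentExposure:
    name: str
    records_per_day: int   # volume of FHIR resources accessed
    sensitivity: float     # 0..1 weight for the resource types touched
    write_surfaces: int    # distinct write targets: EHR notes, orders, knowledge stores, ...

    def priority(self) -> float:
        """PHI exposure (volume x sensitivity) multiplied by write-surface breadth."""
        phi_exposure = self.records_per_day * self.sensitivity
        return phi_exposure * self.write_surfaces


agents = [
    AgentExposure("cds", 4000, 0.9, 1),
    AgentExposure("scheduling_assistant", 2000, 0.3, 1),
    AgentExposure("ambient_scribe", 5000, 0.9, 2),
    AgentExposure("prior_auth", 3000, 0.8, 3),
]
for agent in sorted(agents, key=AgentExposure.priority, reverse=True):
    print(f"{agent.name:22} {agent.priority():>8.0f}")
```

Because the score is a pure product, an agent with zero write surfaces scores zero; teams that want read-only agents ranked by their exfiltration exposure alone could apply a floor of one.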
OCR-clock-driven SLA on time-to-evidence. Set the internal SLA on patient-record-level causality production well inside 60 days. A common target is seven-day evidence-on-screen, leaving the rest of the OCR window for breach determination, legal review, and notification preparation. This SLA dictates which detection signals must be correlated automatically versus reconstructed from logs. Anything in the manual-reconstruction column for AI-mediated incidents fails this SLA at scale.
SOC alert-flow integration when the signal isn’t an IOC. Recommendation distribution drift events don’t look like traditional alerts — no bad IP, no malware hash, no lateral movement signature. The SOC’s alert flow has to accept distribution-shift events as first-class alerts, with deployment correlation as the auto-disambiguation step. Without that integration, runtime AI detection signals get triaged as “model performance issues” and routed to ML engineering, not security. The runtime observability foundation that makes per-agent behavior queryable is the precondition.
The healthcare detection question is changing.
For decades, the question has been: what is leaving the cluster, and is it authorized to leave? Every component of the existing healthcare detection stack — DLP, EHR audit, network egress monitoring, CNAPP, EDR, SIEM — answers some variant of that question. For traditional applications, that’s the right question. For clinical AI agents, half of the threat model has a different question: what is the agent writing, and is it what it should be writing? The contamination outcome lives entirely inside that second question. The existing stack’s silence on contamination is not a coverage gap — it is a category error.
What changes architecturally: per-agent runtime baselines that operate on the agent’s behavior rather than the network it’s attached to; cross-signal correlation that produces the patient-record-level chain natively; and an OCR-clock-driven SLA that disqualifies manual log pivoting. The existing stack keeps doing what it does — DLP, EHR audit, network egress all keep running — but the new layer produces the AI-specific narrative the existing stack was never built to produce.
ARMO’s cloud-native security for AI workloads platform implements this layer through CADR cross-layer correlation, eBPF-based runtime sensors, and Application Profile DNA at the Deployment level. Book a demo to see the contamination-detection signal and the OCR-defensible chain on a real workload.
Does write-path content baselining work for ambient scribes given highly variable encounter content?
Yes, because the baseline is per case-class, not per encounter. Ambient scribes have wide content variation across encounters, but the embedding-profile envelope per case-class — primary care follow-up, specialist consult, ED documentation — is bounded. A primary care follow-up note that produces an embedding profile matching specialist consult content is a baseline deviation regardless of literal content.
How does recommendation distribution drift detection distinguish a legitimate model update from compromise?
Through deployment correlation. Every drift event gets correlated against deployment timestamps for the agent — model updates, policy changes, retrieval corpus updates the operations team has logged. A drift that aligns with a logged deployment is legitimate evolution; a drift that does not align is the contamination signal. Without machine-readable deployment metadata at the same resolution as the drift signal, the drift signal alone produces too many false positives to be operationally useful.
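The disambiguation rule can be sketched as a lookup of the most recent logged deployment preceding the drift’s estimated onset within a window. The 72-hour window, the deployment-log format, and the function name are hypothetical; as the answer above notes, the approach only works if deployment metadata is machine-readable and timestamped at the resolution of the drift signal.

```python
from datetime import datetime, timedelta


def classify_drift(drift_onset, deployments, window=timedelta(hours=72)):
    """Label a drift event by whether a logged deployment precedes its estimated onset within window.

    deployments: (timestamp, description) pairs from machine-readable deployment logs
    (model updates, policy changes, logged retrieval-corpus updates).
    """
    # Newest deployments first, so the closest preceding one is reported.
    for deployed_at, description in sorted(deployments, reverse=True):
        if timedelta(0) <= drift_onset - deployed_at <= window:
            return "explained", description
    return "contamination_candidate", None


deployments = [
    (datetime(2026, 3, 1, 10, 0), "model v2.3 rollout"),
    (datetime(2026, 3, 20, 9, 0), "formulary policy update"),
]
print(classify_drift(datetime(2026, 3, 2, 9, 0), deployments))   # ('explained', 'model v2.3 rollout')
print(classify_drift(datetime(2026, 3, 10, 0, 0), deployments))  # ('contamination_candidate', None)
```

An edit pushed through editorial review, like the guideline change in the walkthrough, never appears in the deployment log, so drift it causes falls through to the contamination branch.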
Can a SIEM aggregate the per-agent runtime behavioral signal alongside DLP and EHR audit events?
A SIEM can ingest the per-agent runtime behavioral signal as another event source — the right architecture for organizations standardizing alert workflows on their existing SIEM. What it cannot do is reconstruct the per-agent baseline at SIEM-aggregation time. The baseline has to be computed at the agent layer, in continuous runtime, before the alert reaches the SIEM.
What’s the latency overhead of write-path embedding-profile capture in clinical workflows?
ARMO’s eBPF-based sensor architecture introduces approximately 1–2.5% CPU overhead and roughly 1% memory overhead at the workload level. For real-time CDS recommendations and ambient scribes producing post-encounter notes, that sits well inside the latency budget — the capture happens at the syscall layer, outside the inference-path latency that drives clinical workflow SLAs.