Your security team has done the work. On EKS, you’ve deployed GuardDuty with SageMaker coverage, configured IAM roles for service accounts, and built behavioral baselines for your agent workloads. On AKS, you’ve enabled Defender for AI, set up workload identity, and tuned your detection rules. On GKE, you’ve configured Workload Identity Federation and enabled Security Command Center for your Vertex AI integrations. Three clouds. Three implementations. Three sets of baselines, alerts, and enforcement policies.
And then a prompt injection attack on your EKS-hosted agent triggers a cross-cloud data exfiltration through an AKS service endpoint. You get three separate, low-severity alerts in three separate consoles that nobody connects into a single attack story.
This is the coordination problem that separates multi-cloud AI agent security from single-cloud implementations. It’s not about making each cloud’s setup good enough in isolation. It’s about what happens between them — the behavioral baseline gaps, the identity fragmentation, the detection blind spots, and the policy drift that emerge specifically because the agent’s behavior spans providers.
The Observe → Posture → Detect → Enforce framework works regardless of platform. But each stage faces a qualitatively different challenge when it has to operate across EKS, AKS, and GKE simultaneously. Running the same methodology on each cloud independently isn’t multi-cloud security — it’s three single-cloud programs that happen to coexist. The gaps between them are where attacks succeed.
This article maps what breaks at each framework stage in multi-cloud, walks through a cross-cloud attack chain to make the problem concrete, and shows how to build one unified AI agent security program instead of three disconnected ones.
The AI agent security framework uses a dependency chain: observation data feeds posture assessment, behavioral baselines feed detection, and confirmed threats feed enforcement. In single-cloud environments, every stage draws from a complete picture of agent behavior within that provider’s boundary. Multi-cloud breaks that completeness. Each stage operates on a partial view — and the partial views don’t automatically stitch together.
Each provider’s native discovery tools — GuardDuty for SageMaker on AWS, Defender for AI on Azure, Security Command Center for Vertex AI on Google Cloud — only discover workloads within their own environment. Shadow AI that spans providers doesn’t register in any single inventory. An agent framework deployed on EKS that calls an inference endpoint on GKE isn’t classified as an AI workload by either provider, because the cross-cloud interaction falls outside both discovery boundaries.
The Gravitee State of AI Agent Security 2026 report found that 47% of enterprise AI agents operate without any security oversight. In multi-cloud environments, that number is almost certainly higher, because agents that interact across providers slip through every provider’s discovery mechanisms. The AI-BOM generated per cloud is incomplete by definition — it captures what each provider can see within its own boundary, not the full cross-cloud dependency graph that represents the agent’s actual runtime behavior.
What’s needed is a single, provider-agnostic discovery layer that identifies AI agents regardless of where they run and maps their dependencies across cloud boundaries. That’s the foundation that every subsequent stage depends on — and it’s the foundation that per-cloud native tooling structurally cannot provide.
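To make the idea concrete, here is a minimal sketch of what a provider-agnostic inventory could look like: per-cloud discovery results folded into one record per logical agent, so cross-cloud dependencies land on the same entry. The schema and names ("support-agent", the Dependency fields) are illustrative assumptions, not any vendor's actual AI-BOM format.

```python
"""Minimal sketch of a provider-agnostic AI-BOM entry and the merge step that
folds per-cloud discovery results into one record per logical agent.
The schema and names are illustrative assumptions, not any vendor's format."""
from dataclasses import dataclass, field

@dataclass
class Dependency:
    kind: str        # "model_endpoint", "mcp_tool", "data_store", ...
    provider: str    # "aws", "azure", "gcp"
    target: str      # endpoint or resource identifier

@dataclass
class AgentEntry:
    name: str
    clusters: set = field(default_factory=set)          # every cluster the agent runs on
    dependencies: list = field(default_factory=list)    # runtime dependencies, across providers

def merge_inventories(per_cloud):
    """Fold per-cloud inventories into one entry per agent, so cross-cloud
    dependencies end up on the same record instead of being split or lost."""
    unified = {}
    for entries in per_cloud.values():
        for e in entries:
            merged = unified.setdefault(e.name, AgentEntry(e.name))
            merged.clusters |= e.clusters
            merged.dependencies.extend(e.dependencies)
    return unified

# An agent on EKS that calls a Vertex AI endpoint shows up as one entry,
# even though GKE-side discovery alone never sees the agent itself:
inventories = {
    "aws": [AgentEntry("support-agent", {"eks-prod"},
                       [Dependency("model_endpoint", "gcp", "vertex-ai:text-endpoint")])],
    "gcp": [],
}
print(merge_inventories(inventories)["support-agent"])
```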
Posture assessment compares declared permissions against observed behavior. The gap between what an agent can do and what it actually does is where runtime-informed posture management finds the real risks. In multi-cloud, each provider’s posture tool runs that comparison only within its own boundary.
IRSA roles on EKS are assessed against EKS-observed behavior. Managed identity scopes on AKS are assessed against AKS-observed behavior. But the agent’s actual permission chain spans both — an EKS agent assuming a role, calling an AKS endpoint through identity federation, and accessing resources that neither provider’s posture tool evaluates in the context of the full chain. You’re comparing permissions against behavior, but you’re only seeing half the behavior at each boundary.
A deployment manifest might declare access to 47 APIs across two providers. The agent uses 3 on EKS and 2 on AKS in normal operation. Without cross-cloud observability showing which 5, you’re either leaving all 47 open or guessing which to restrict — and you’re guessing separately on each cloud, with no way to validate that the two assessments are consistent.
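A rough sketch of that comparison, using hypothetical permission identifiers. The check only becomes meaningful when the observed set is built from cross-cloud runtime data rather than from either provider's view alone.

```python
"""Rough sketch of a runtime-informed posture check: permissions declared across
providers versus APIs the agent was actually observed exercising. All
permission identifiers are hypothetical."""

declared = {
    "aws":   {"s3:GetObject", "s3:PutObject", "bedrock:InvokeModel", "dynamodb:Query"},
    "azure": {"Microsoft.Storage/read", "Microsoft.CognitiveServices/invoke"},
}

# Built from cross-cloud runtime observation, not from deployment manifests.
observed = {
    "aws":   {"bedrock:InvokeModel", "s3:GetObject", "dynamodb:Query"},
    "azure": {"Microsoft.CognitiveServices/invoke", "Microsoft.Storage/read"},
}

def permission_gap(declared, observed):
    """Per provider, the permissions that were granted but never exercised."""
    return {p: sorted(declared[p] - observed.get(p, set())) for p in declared}

print(permission_gap(declared, observed))
# {'aws': ['s3:PutObject'], 'azure': []}
```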
This is where the operational cost hits hardest. A multi-stage attack — prompt injection on EKS, tool misuse through an AKS service endpoint, data exfiltration via GKE storage — produces alerts in three separate detection systems with different alert formats, different severity scoring, and different investigation workflows.
The SOC analyst who picks up the EKS alert doesn’t see the AKS and GKE signals. The AKS alert goes to a different queue, scored as informational because the access was technically authorized through identity federation. The GKE alert flags an external data transfer but lacks the upstream context to classify it as exfiltration rather than legitimate output. Without cross-cloud signal correlation, a coordinated attack looks like three unrelated low-priority events.
Within a single cloud, attack story correlation works because all signals come from the same provider’s telemetry pipeline. Multi-cloud breaks that pipeline. The attack “story” exists across three providers, and nobody’s native tooling is telling it. The investigation that should take minutes — reviewing a single correlated narrative — takes hours of manually pulling logs from three consoles and mentally reconstructing the chain.
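One way to picture the missing piece is a normalization step that maps each provider's findings into a single schema before any correlation happens. The field names, the input dict, and the severity rescaling below are simplified assumptions, not the real GuardDuty, Defender, or Security Command Center schemas.

```python
"""Sketch of normalizing provider-specific findings into one schema before
correlation. Field names, the input dict, and the severity rescaling are
simplified assumptions, not the real GuardDuty / Defender / SCC schemas."""
from dataclasses import dataclass
from datetime import datetime

@dataclass
class NormalizedAlert:
    provider: str       # "aws", "azure", "gcp"
    timestamp: datetime
    workload_id: str    # canonical agent identity after cross-cloud identity mapping
    category: str       # "prompt_anomaly", "unusual_access", "data_egress", ...
    severity: int       # 0-100, rescaled from each provider's native scale

def normalize_guardduty(finding):
    # Hypothetical mapping from a GuardDuty-style finding dict; Defender and SCC
    # findings would get their own adapters emitting the same NormalizedAlert.
    return NormalizedAlert(
        provider="aws",
        timestamp=datetime.fromisoformat(finding["time"]),
        workload_id=finding["resource"],
        category=finding["type"],
        severity=int(finding["severity"] * 10),  # assuming a 0-10 native numeric scale
    )

alert = normalize_guardduty({"time": "2026-04-01T02:14:00",
                             "resource": "eks-prod/support-agent",
                             "type": "unusual_outbound_api_call",
                             "severity": 2.0})
```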
A single security intent — “this agent should only call its designated API endpoints” — requires three separate policy implementations. On EKS, that’s a seccomp profile plus a Kubernetes NetworkPolicy plus an IRSA boundary. On AKS, it’s Azure Policy plus a managed identity scope plus an AKS network policy. On GKE, it’s VPC Service Controls plus a Workload Identity binding plus a GKE network policy. Different policy languages. Different enforcement mechanisms. Different update cycles.
Over time, these drift independently. A policy update on EKS — perhaps tightening a seccomp profile after observing new behavioral patterns — doesn’t automatically propagate to the AKS or GKE equivalents. Progressive enforcement, which promotes observed baselines into production-safe policies, has to happen per-cloud. Without a unifying enforcement layer, each cloud’s enforcement state diverges from the others. Six months in, you have three implementations that started identical and now enforce materially different security boundaries for the same logical agent.
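As a sketch of what "express the intent once" could look like, the snippet below renders one hypothetical intent into a standard Kubernetes NetworkPolicy, which is identical on all three platforms, plus placeholder identity boundaries per provider. The intent schema, agent name, and CIDRs are invented for illustration.

```python
"""Sketch of expressing one enforcement intent and rendering it per provider.
The intent schema, agent name, and CIDRs are invented for illustration; the
Kubernetes NetworkPolicy shape is standard and identical on EKS, AKS, and GKE."""

intent = {
    "agent": "support-agent",
    "allowed_endpoints": ["https://api.internal.example.com"],
    "allowed_egress_cidrs": ["10.20.0.0/16"],
}

def to_k8s_network_policy(intent):
    # The cross-cloud portion: a plain NetworkPolicy applies to any conformant cluster.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{intent['agent']}-egress"},
        "spec": {
            "podSelector": {"matchLabels": {"app": intent["agent"]}},
            "policyTypes": ["Egress"],
            "egress": [{"to": [{"ipBlock": {"cidr": c}} for c in intent["allowed_egress_cidrs"]]}],
        },
    }

def to_identity_boundary(intent, provider):
    # Placeholder for the provider-specific translation (IRSA boundary,
    # managed identity scope, Workload Identity binding).
    return {"provider": provider, "principal": intent["agent"], "allow": intent["allowed_endpoints"]}

rendered = {p: to_identity_boundary(intent, p) for p in ("aws", "azure", "gcp")}
rendered["kubernetes"] = to_k8s_network_policy(intent)
```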
Abstract coordination problems become concrete when you trace an actual attack. This scenario illustrates what three separate detection systems see versus what a unified runtime layer sees — and why the difference determines whether your SOC responds in minutes or days.
An AI agent running on EKS receives a manipulated context document through its RAG pipeline — a form of indirect prompt injection. The injected instruction triggers the agent to invoke an MCP tool that calls an AKS-hosted data service via federated identity. The AKS service returns sensitive customer records that the agent was never intended to access in normal operation. The agent then uses a GKE-hosted tool to write those records to an external endpoint controlled by the attacker.
EKS: An unusual prompt pattern, if prompt monitoring is enabled. Possibly a GuardDuty finding for an uncommon outbound API call. Severity: low or informational. The call was made using valid IRSA credentials, so it doesn’t trigger an identity-based alert.
AKS: A data service accessed by a federated identity from an external provider. Defender may flag the cross-tenant access pattern, but the access is technically authorized — the identity federation was configured intentionally. Severity: informational.
GKE: Outbound data transfer to an external endpoint. Security Command Center might flag the destination as unrecognized. Severity: medium — but only because of the external endpoint, not because the system understands the upstream context.
Three alerts. Three consoles. Three investigation workflows. The EKS analyst doesn’t know about the AKS access. The AKS analyst doesn’t know about the GKE exfiltration. The GKE analyst sees data leaving but doesn’t know where it came from or why.
A single attack story: prompt injection on EKS → cross-cloud MCP tool invocation → unauthorized data access on AKS → exfiltration via GKE. All correlated into one narrative with the full behavioral context. Which agent was targeted. Which prompt triggered it. Which tool was misused. Which identity chain was leveraged across providers. What data was accessed. Where it went. One high-severity incident with a complete chain of custody, instead of three low-to-medium events scattered across consoles.
The difference isn’t incremental. The three-alert version gets triaged over days as each cloud team investigates independently. The single-story version triggers immediate response with enough context to contain the exfiltration, revoke the compromised identity chain, and trace the injection point. That’s the operational gap that cross-layer signal correlation closes — and it’s the gap that widens with every cloud provider you add to your environment.
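A toy version of that correlation step, with invented timestamps and an agent identity assumed to be pre-resolved by a cross-cloud identity mapping: order the per-cloud alerts that share an identity, check that they fall inside a time window, and emit one chain instead of three unrelated events.

```python
"""Toy version of cross-cloud attack-story assembly for the scenario above.
Timestamps are invented and the agent identity is assumed to be pre-resolved
by a cross-cloud identity mapping."""
from datetime import datetime, timedelta

t0 = datetime(2026, 4, 1, 2, 14)
alerts = [
    {"provider": "gcp",   "time": t0 + timedelta(minutes=4), "identity": "support-agent", "event": "external data transfer"},
    {"provider": "aws",   "time": t0,                        "identity": "support-agent", "event": "anomalous prompt / unusual outbound API call"},
    {"provider": "azure", "time": t0 + timedelta(minutes=2), "identity": "support-agent", "event": "federated access to data service"},
]

def build_attack_story(alerts, identity, window=timedelta(minutes=15)):
    """Order alerts that share one correlated identity and fall within a time
    window into a single chain, instead of leaving them as unrelated events."""
    chain = sorted((a for a in alerts if a["identity"] == identity), key=lambda a: a["time"])
    if chain and chain[-1]["time"] - chain[0]["time"] <= window:
        return " -> ".join(f'{a["provider"]}: {a["event"]}' for a in chain)
    return None

print(build_attack_story(alerts, "support-agent"))
# aws: anomalous prompt / unusual outbound API call -> azure: federated access to data service -> gcp: external data transfer
```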
The attack walkthrough makes the problem visceral. The question is what to do about it. Four architectural principles separate a unified multi-cloud program from three disconnected ones.
The core requirement is a runtime observation and enforcement layer that operates identically regardless of cloud provider. eBPF-based kernel-level monitoring meets this requirement because it sits below the cloud abstraction — the Linux kernel interface is the same on EKS worker nodes, AKS nodes, and GKE nodes. Behavioral baselines, detection rules, and enforcement policies are expressed in a common format and produce comparable telemetry across providers.
This is structurally different from trying to normalize outputs from three provider-specific tools. GuardDuty, Defender, and SCC use different data models, different severity scales, and different detection methodologies. Normalization after the fact is lossy — you’re translating between incompatible formats rather than collecting from a single source. A cloud-native runtime layer built on eBPF produces the same behavioral data on every provider: the same Application Profile format, the same AI-BOM structure, the same enforcement primitives. That’s the cross-cloud constant that eliminates behavioral baseline fragmentation at the source.
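A tiny sketch of why that matters in practice: an event derived from kernel activity carries the same shape regardless of which managed Kubernetes service hosts the node. The field names are illustrative, not Kubescape's actual Application Profile format.

```python
"""Tiny sketch of the cross-cloud constant: a runtime event derived from kernel
activity has the same shape on every provider's nodes. Field names are
illustrative, not Kubescape's actual Application Profile format."""

def runtime_event(cluster, pod, syscall, destination):
    # Identical structure whether the node belongs to EKS, AKS, or GKE, because
    # the data comes from kernel-level observation rather than provider APIs.
    return {"cluster": cluster, "pod": pod, "syscall": syscall, "destination": destination}

events = [
    runtime_event("eks-prod", "support-agent-7f9c", "connect", "10.20.4.7:443"),
    runtime_event("aks-prod", "support-agent-2b1d", "connect", "10.30.1.9:443"),
]
```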
Behavioral baselines for multi-cloud agents must capture the full cross-cloud behavior graph, not per-provider slices. When an agent’s “normal” includes calling services on both EKS and AKS, the baseline must reflect that cross-cloud interaction pattern as a single behavioral profile. An anomaly is only detectable when you can see the full pattern and identify what’s changed.
This is where the runtime observability architecture matters most. The same sensor that discovers AI agents on each cluster also builds behavioral profiles that span cluster boundaries. An agent’s execution graph — which tools it calls, which APIs it reaches, which network destinations it contacts, which data it accesses — is mapped across providers in a single profile rather than fragmented into per-cloud slices. That unified view is what makes cross-cloud anomaly detection possible rather than theoretical.
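Here is a minimal sketch of such a profile and an anomaly check against it, with invented endpoints and tool names. The point is that the check spans providers, so a call that looks routine inside one cloud's slice still stands out against the unified baseline.

```python
"""Minimal sketch of a behavioral profile that spans clusters, and an anomaly
check against it. Endpoints, tools, and the agent name are invented."""

profile = {
    "agent": "support-agent",
    "tools": {"search_kb", "summarize"},
    "endpoints": {
        "aws":   {"bedrock-runtime.us-east-1.amazonaws.com"},
        "azure": {"customer-data.internal.example"},
    },
}

def is_anomalous(call, profile):
    """A call is anomalous when its tool or destination falls outside the unified
    profile -- even if each per-cloud slice of the behavior would look normal."""
    provider, endpoint, tool = call
    known_endpoint = endpoint in profile["endpoints"].get(provider, set())
    known_tool = tool in profile["tools"]
    return not (known_endpoint and known_tool)

# A prompt-injection-driven write through a new GCP-hosted tool stands out:
print(is_anomalous(("gcp", "attacker-bucket.storage.example", "write_external"), profile))  # True
```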
Signals from multiple providers must feed into a single correlation engine that constructs multi-cloud attack narratives. This requires three components working together: normalized telemetry in a common format across providers, identity correlation that maps IRSA identities to managed identities to Workload Identity bindings, and temporal correlation that assembles events across providers into chronological attack chains.
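The identity-correlation component can be pictured as a mapping from each provider-native principal back to one canonical agent, so signals can be joined on a single key. All ARNs, client IDs, and service account names below are made up.

```python
"""Sketch of the identity-correlation piece: map each provider-native principal
back to one canonical agent so signals can be joined on a single key.
All ARNs, client IDs, and service account names below are made up."""
from typing import Optional

identity_map = {
    "arn:aws:iam::111122223333:role/support-agent-irsa": "support-agent",         # IRSA role
    "b7e2c1d4-0000-0000-0000-aadworkloadid": "support-agent",                      # Azure AD workload identity client ID
    "support-agent@example-project.iam.gserviceaccount.com": "support-agent",      # GCP Workload Identity binding
}

def canonical_identity(native_id: str) -> Optional[str]:
    """Resolve a provider-specific principal to the logical agent it belongs to."""
    return identity_map.get(native_id)

print(canonical_identity("arn:aws:iam::111122223333:role/support-agent-irsa"))  # support-agent
```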
The ARMO platform’s LLM-powered attack story generation ingests signals from all clusters regardless of provider, correlates identities across cloud boundaries, and constructs unified narratives that span the full attack chain. The 90%+ reduction in investigation time applies specifically to multi-cloud scenarios because the alternative — manually correlating alerts across three separate consoles with incompatible formats — is where investigation time doesn’t just increase linearly. It explodes. Each additional provider multiplies the number of manual correlation steps, the number of consoles to check, and the number of identity translations to perform.
Security teams should express enforcement intent once — “this agent can only call these endpoints and access these data sources” — and have it translated into provider-specific enforcement primitives automatically. The eBPF enforcement layer handles kernel-level behavioral constraints consistently across providers. Cloud-specific IAM enforcement sits above that as the provider-specific translation layer: IRSA boundaries on EKS, managed identity scopes on AKS, and Workload Identity bindings on GKE.
The critical benefit is drift prevention. When the behavioral baseline for an agent updates — because the agent legitimately started using a new tool or calling a new API — the enforcement policy updates propagate across all providers from a single source of truth. Without this, policy updates happen manually per-cloud, and the window between “policy updated on EKS” and “policy updated on AKS” is a window where enforcement is inconsistent. In multi-cloud environments with dozens of agents, those windows accumulate into permanent drift.
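A simple way to picture drift monitoring: fingerprint the enforcement state rendered on each provider and flag any cloud that has fallen behind the current intent revision. The revisions and policy contents below are illustrative.

```python
"""Sketch of drift monitoring: fingerprint the enforcement state rendered on each
provider and flag any cloud that has fallen behind the current intent revision.
Revisions and policy contents are illustrative."""
import hashlib
import json

def fingerprint(policy):
    """Stable hash of a rendered policy document."""
    return hashlib.sha256(json.dumps(policy, sort_keys=True).encode()).hexdigest()[:12]

current = fingerprint({"agent": "support-agent", "rev": 7})

deployed = {
    "aws":   fingerprint({"agent": "support-agent", "rev": 7}),
    "azure": fingerprint({"agent": "support-agent", "rev": 7}),
    "gcp":   fingerprint({"agent": "support-agent", "rev": 5}),  # missed the last two updates
}

drifted = [cloud for cloud, fp in deployed.items() if fp != current]
print(drifted)  # ['gcp']
```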
Each stage of the Observe → Posture → Detect → Enforce framework requires specific multi-cloud actions that go beyond what any single-cloud implementation addresses. This checklist maps those actions and their outcomes.
| Stage | Multi-Cloud Specific Actions | What It Solves |
| --- | --- | --- |
| Observe | Deploy provider-agnostic eBPF sensors across all clusters. Generate unified AI-BOM with cross-cloud dependency mapping. Auto-discover shadow AI agents spanning provider boundaries. Map cross-cloud agent execution graphs. | Eliminates per-cloud discovery blind spots. Maps the full agent interaction graph across providers. Catches shadow AI that spans cloud boundaries. |
| Posture | Compare cross-cloud permission chains against observed cross-cloud behavior. Assess identity federation gaps (IRSA ↔ managed identity ↔ Workload Identity). Run supply chain risk scans that follow dependencies across providers. | Catches permission gaps invisible to single-cloud posture tools. Identifies the cross-cloud permission chains agents actually exercise versus what’s declared. |
| Detect | Normalize alert telemetry across providers into a common format. Configure cross-cloud signal correlation rules. Build identity mapping for investigation continuity. Enable LLM-powered attack story generation across provider boundaries. | Turns three separate alert streams into one correlated attack story. Eliminates the “three low-severity events” blind spot that lets coordinated attacks go undetected. |
| Enforce | Express enforcement intent in a provider-agnostic format. Let the runtime layer translate to provider-specific primitives. Monitor for policy drift across providers. Run progressive enforcement in parallel across all clouds from behavioral evidence. | Maintains enforcement consistency across providers. Prevents the independent policy drift that degrades multi-cloud security posture over time. |
The phased implementation timeline from the parent framework applies here with one important modification: Phase 1 (Discovery Sprint, Weeks 1–2) should deploy sensors across all clouds simultaneously rather than sequentially. If you observe EKS for two weeks, then AKS for two weeks, then GKE for two weeks, you’ve lost six weeks of cross-cloud behavioral correlation data. The cross-cloud baseline starts building only when all clusters are instrumented.
Multi-cloud Kubernetes security content already exists and covers real issues: consistent RBAC, cross-cluster network policies, unified identity management, image scanning across registries, encryption in transit between providers. Those are necessary foundations. But AI agents introduce properties that make multi-cloud security qualitatively harder than securing traditional workloads across the same infrastructure.
Non-deterministic behavior. Traditional workloads make the same calls every time, so cross-cloud policies can be written from static configuration analysis. AI agents generate different behavior per interaction based on prompts, context windows, and available tools. Cross-cloud policies for agents require behavioral observation across providers, not static declaration.
Dynamic tool discovery. Agents discover and invoke tools at runtime, especially through MCP connections. A cross-cloud tool invocation that didn’t exist yesterday can appear today because a developer added an MCP server on a different cluster. Static cross-cloud policies can’t anticipate tool usage patterns that the agent itself generates dynamically.
Agent-to-agent chains. Multi-agent orchestration systems can create cross-cloud agent chains where one agent on EKS invokes another on AKS. The orchestration-level behavior is invisible to either provider’s native monitoring. An agent escape that starts on one cloud and propagates through agent-to-agent communication to another cloud produces a cross-cloud attack chain that no single-cloud detection system is designed to catch.
These properties don’t invalidate existing multi-cloud Kubernetes security practices — they layer on top. You still need consistent RBAC and network policies. But you also need the runtime behavioral layer that understands agent-specific behavior patterns across providers, because the Kubernetes primitives alone can’t distinguish a legitimate cross-cloud agent interaction from an attack exploiting the same identity federation path.
Multi-cloud AI agent security requires a unifying runtime layer that sits below the cloud abstraction. Without it, you’re running three separate security programs with three separate behavioral baselines, three separate alert streams, and three separate enforcement states. The gaps between those programs — the cross-cloud interactions that no single provider monitors, the identity chains that span federation boundaries, the attack stories that exist across consoles nobody’s correlating — are where attacks succeed.
The ARMO platform is built around the principle that the runtime layer should be the cross-cloud constant. The same eBPF sensor produces the same behavioral data on EKS, AKS, and GKE. The same Cloud Application Detection & Response engine correlates signals across all providers into unified attack stories. The same progressive enforcement workflow promotes observed baselines into production-safe policies regardless of which cloud the agent runs on. The quantified outcomes — 90%+ CVE noise reduction through runtime reachability, 90%+ faster investigation through attack story correlation, 80%+ reduction in actionable findings — apply across your entire multi-cloud fleet, not per-provider.
Built on Kubescape, one of the most widely adopted open-source Kubernetes security projects, the platform lets you start Phase 1 with free runtime observability across all your clusters before committing to the full platform. Deploy the sensor on EKS, AKS, and GKE. See what’s running. Watch cross-cloud baselines form. Then decide whether to expand into posture, detection, and enforcement based on what the data shows you.
To see how a single runtime layer provides consistent AI agent security across your multi-cloud environment: book a demo or start Phase 1 with Kubescape.
Native tools like GuardDuty, Defender for AI, and Security Command Center are valuable within their own provider boundary. But they don’t correlate signals across providers, can’t build unified behavioral baselines for agents spanning clouds, and use incompatible policy formats. For multi-cloud, you need a provider-agnostic runtime layer underneath the native tools — complementary, not replacement.
Behavioral baselines built per-cloud capture only that provider’s portion of an agent’s behavior. If an agent normally calls services across EKS and AKS, each cloud’s baseline reflects only half the pattern. Cross-cloud anomalies — like a new cross-provider call triggered by prompt injection — appear normal in each isolated view because each system evaluates only what it can see.
A multi-stage attack spanning providers produces separate alerts in each cloud’s detection system. Without cross-cloud signal correlation, coordinated attacks look like unrelated low-severity events. Teams triage them individually — often over days — instead of recognizing the unified attack chain that should trigger an immediate response.
The methodology stays the same. You still observe before you posture, posture before you detect, detect before you enforce. But each stage requires a cross-cloud dimension: unified discovery across providers, cross-cloud permission chain assessment, multi-provider signal correlation, and intent-based enforcement with provider-specific translation. The dependency chain between stages is the same; the data each stage operates on must span all providers.
AWS uses IRSA or EKS Pod Identity, Azure uses Azure AD Workload Identity, and GKE uses Workload Identity Federation. Each maps Kubernetes service accounts to cloud IAM differently. An agent calling services across providers creates an identity chain that spans all three models, and no single provider’s audit trail captures the full chain from the originating agent to the final resource access.
The phased implementation timeline applies with one key adjustment: deploy sensors across all clouds simultaneously in Phase 1 rather than sequentially. Cross-cloud baselines start building only when all clusters are instrumented. Expect discovery value in weeks 1–2, posture assessment in weeks 3–4, detection tuning in month 2, and progressive enforcement in month 3. eBPF-based sensors run at 1–2.5% CPU and 1% memory overhead, within the performance budget most platform teams already accept.