You’re forty-five minutes into a vendor demo for AI workload security. The dashboard looks polished—posture scores, misconfiguration findings, vulnerability counts, all tagged with an “AI workload” label that wasn’t there last quarter. You ask the obvious question: “Show me how this detects a prompt injection attack on our production agent.” Long pause. The sales engineer pulls up a generic process anomaly rule. You realize the tool has no concept of what a prompt is, what a tool invocation looks like, or what “normal” means for an agent that behaves differently every time it runs.
This is how most security teams discover the gap between AI security marketing and AI security reality. Your CSPM flags misconfigurations in training jobs. Your image scanner catches CVEs in deployment. But neither watches what your AI agents actually do when they start calling tools, accessing APIs, and handling real user prompts. The tools you already own cover important ground—they just cover the wrong ground for the threats you’re most worried about.
The root cause is not the tool category. It’s observation gaps—threats that exist but remain invisible because you lack the right telemetry at the right lifecycle stage. This article walks through the AI Security Detection Matrix, a framework that maps each stage of the AI lifecycle—training, deployment, and inference—against what declarative-only tools can see versus what runtime-first tools detect. By the end, you’ll know exactly where your coverage gaps are, why inference-stage threats require behavioral monitoring rather than policy checks, and how to build layered coverage that uses each approach where it actually fits.
The AI Security Detection Matrix maps security tool capabilities against the three stages of the AI workload lifecycle: training, deployment, and inference.
When we say a tool can “see” something, we mean it collects observable signals with enough context to act on. When we say it “misses” something, the threat exists but remains invisible without the right telemetry.
| Lifecycle Stage | Declarative-Only Tools | Runtime-First Tools |
| --- | --- | --- |
| Training | Static config checks, dependency scans | Process and network behavior in training jobs |
| Deployment | Policy enforcement, image scanning | Validation of what actually loaded and executed |
| Inference | Limited to pre-defined rules | Real-time behavioral detection |
Each section that follows uses the same structure: what declarative tools see, what they miss, what runtime tools see, and what they miss. This makes it straightforward to compare approaches stage by stage and map your own stack against the matrix.
Training is where you build and tune models. It runs on cloud infrastructure, pulls in large datasets, and depends on many libraries. Declarative tools have real value here—but they cannot see what happens while training jobs actually run.
Declarative tools check the blueprints of your training environment before anything executes. They scan code, templates, and policies for known mistakes: infrastructure-as-code misconfigurations, dependency vulnerabilities with known CVEs, license compliance violations in ML libraries, dataset storage bucket permissions and exposure, and policy violations in training job configurations. These checks fall under IaC scanning, dependency scanning, and policy enforcement. They harden training infrastructure and reduce obvious mistakes before jobs start.
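To make the pre-run nature of these checks concrete, here is a minimal sketch assuming a hypothetical training-job config structure and illustrative policy rules, not any particular scanner’s schema or vulnerability feed. Everything it flags is visible before the job ever executes.

```python
# A minimal sketch of a declarative pre-run check; the config structure,
# policy rules, and the "known vulnerable" list are illustrative only.
KNOWN_VULNERABLE = {("torch", "1.13.0"), ("pillow", "9.0.0")}  # placeholder entries

def check_training_config(config: dict) -> list[str]:
    findings = []
    # Flag dataset buckets exposed to the public internet.
    if config.get("dataset_bucket", {}).get("public_access", False):
        findings.append("dataset bucket allows public access")
    # Flag dependencies with known CVEs (static lookup, no execution needed).
    for name, version in config.get("dependencies", {}).items():
        if (name, version) in KNOWN_VULNERABLE:
            findings.append(f"dependency {name}=={version} has a known CVE")
    # Flag jobs configured with more privilege than policy allows.
    if config.get("run_as_root", False):
        findings.append("training job configured to run as root")
    return findings

job = {
    "dataset_bucket": {"public_access": True},
    "dependencies": {"torch": "1.13.0", "numpy": "1.26.4"},
    "run_as_root": False,
}
print(check_training_config(job))
```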
Once a training job starts, many risks show up only as behavior. Declarative tools do not see inside ephemeral containers or training processes as they run.
Consider data poisoning: anomalous data distributions or unexpected label shifts during ingestion require behavioral monitoring to detect. Research shows that manipulating as little as 0.1% of training data is sufficient to compromise a model—but that manipulation happens during data ingestion, not in a static config file. Similarly, ephemeral training jobs may access secrets inappropriately, make outbound connections to suspicious endpoints, or exhibit malicious behavior in training frameworks that only manifests at runtime. These are behavioral problems that require anomaly detection based on live signals, not pre-run templates.
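As an illustration of what “behavioral monitoring during ingestion” means in practice, here is a minimal sketch that compares the label distribution of an incoming batch against a learned baseline. The distance metric and threshold are assumptions for illustration, not a production poisoning detector.

```python
# A minimal sketch of label-shift monitoring on a training data stream.
from collections import Counter

def label_distribution(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def total_variation(baseline: dict[str, float], observed: dict[str, float]) -> float:
    # Half the L1 distance between two label distributions.
    keys = set(baseline) | set(observed)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - observed.get(k, 0.0)) for k in keys)

baseline = label_distribution(["benign"] * 950 + ["fraud"] * 50)
incoming = label_distribution(["benign"] * 800 + ["fraud"] * 200)  # suspicious shift

drift = total_variation(baseline, incoming)
if drift > 0.05:  # illustrative threshold
    print(f"label distribution shifted by {drift:.2f}; investigate ingestion batch")
```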
Runtime-first tools focus on what your training workloads actually do. They install sensors—often using eBPF in Kubernetes environments—to watch process, network, and file activity for each job. ARMO’s eBPF-based sensor, for example, operates at the kernel level with 1–2.5% CPU and approximately 1% memory overhead, making it production-safe even for resource-intensive training workloads.
With this visibility, you can see new processes starting inside a training container that do not match the expected command. You can spot a training job that normally talks only to internal storage suddenly making outbound calls to unknown domains. You can detect a job reading sensitive dataset files it never touched in previous runs. Over time, these signals form behavioral baselines. When a new run deviates sharply, you can investigate or block it.
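A minimal sketch of that baselining idea follows, assuming a sensor emits simple process and egress events per job run; the event fields and learning window are assumptions, not a specific sensor’s output format.

```python
# A minimal sketch of baselining a training job's observed behavior and
# flagging deviations in later runs.
from dataclasses import dataclass, field

@dataclass
class JobBaseline:
    processes: set[str] = field(default_factory=set)
    egress_domains: set[str] = field(default_factory=set)

    def learn(self, event: dict) -> None:
        # During the learning window, record what the job normally does.
        self.processes.add(event["process"])
        self.egress_domains.update(event.get("egress", []))

    def deviations(self, event: dict) -> list[str]:
        findings = []
        if event["process"] not in self.processes:
            findings.append(f"unexpected process: {event['process']}")
        for domain in event.get("egress", []):
            if domain not in self.egress_domains:
                findings.append(f"new outbound destination: {domain}")
        return findings

baseline = JobBaseline()
baseline.learn({"process": "python train.py", "egress": ["internal-storage.local"]})

suspicious = {"process": "curl", "egress": ["paste.example.net"]}
print(baseline.deviations(suspicious))
```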
Runtime monitoring does not replace basic hygiene. Runtime-first tools may miss static misconfigurations that never get exercised—like a misconfigured IAM policy attached to a training role that no job actually uses. They may also miss policy drift in code repositories and IaC templates if they are not integrated. This is why you still need posture management and configuration scanning for your training stack. The right approach is not “runtime or declarative,” but how they work together—a principle that becomes even more important at the next stage.
Deployment is where models and agents leave the lab and enter real environments like Kubernetes clusters. Declarative controls are strong here for pre-deploy checks, but they stop at the point where the model starts serving real traffic—and this is precisely where the gap between “what we think we deployed” and “what is really running” opens up.
Declarative tools examine what you are about to deploy: container image vulnerabilities and misconfigurations, insecure API endpoints and missing authentication, overly permissive IAM roles and service accounts, exposed model artifacts and weights, policy violations in Kubernetes manifests, and AI-BOM completeness requirements covering frameworks, models, and dependencies.
An AI-BOM (AI Bill of Materials) is an inventory of all AI components in your workload—frameworks, models, tools, and dependencies. Declarative tools build this from manifests. But here’s the problem: manifests describe what you planned, not what actually loaded into memory. An AI-BOM built from runtime observation captures what the workload actually uses—the models it loads, the RAG sources it connects to, the external tools and APIs it calls—and surfaces discrepancies between declared and actual state.
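A minimal sketch of that declared-versus-observed comparison, using hypothetical component inventories; the categories and names are placeholders rather than a specific tool’s AI-BOM format.

```python
# A minimal sketch of diffing a manifest-declared AI-BOM against one built
# from runtime observation; component names are illustrative.
declared_bom = {
    "models": {"llama-3-8b-instruct"},
    "frameworks": {"vllm==0.4.2"},
    "tools": {"search_api"},
}
observed_bom = {
    "models": {"llama-3-8b-instruct"},
    "frameworks": {"vllm==0.4.1"},           # version in memory differs
    "tools": {"search_api", "billing_api"},  # capability nobody declared
}

for category in declared_bom:
    undeclared = observed_bom[category] - declared_bom[category]
    missing = declared_bom[category] - observed_bom[category]
    if undeclared:
        print(f"{category}: running but never declared -> {sorted(undeclared)}")
    if missing:
        print(f"{category}: declared but never observed -> {sorted(missing)}")
```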
Once the model is live, declarative tools cannot verify how it behaves under real-world conditions. Runtime behavior validation—confirming that models respond safely to actual prompts and inputs—is invisible. Tool and action drift, where new capabilities emerge through prompt engineering or configuration changes, goes undetected. Dynamic agent routing decisions that only emerge during execution are missed entirely. And actual data flows that differ from documented architectures remain unknown.
An AI agent that has write access to a database might need it for its actual workflow, or that permission might be left over from a development sprint and represent real exploitable surface. Declarative tools cannot distinguish between these two scenarios because they never watch the workload operate.
Runtime-first tools close that gap between declared and actual state. They track processes, libraries, and network behavior as the system comes up and serves traffic. ARMO’s runtime-derived AI-BOM, for instance, discovers AI frameworks, models, tools, RAG sources, and dependencies based on observed behavior—confirming that versions in memory match what manifests claim, detecting unexpected helper binaries spawned by serving containers, and spotting model-serving pods that start calling internal admin APIs they never used before.
This is runtime validation of your deployment configuration. It helps you catch drift early, before it becomes an incident—and it feeds directly into the posture layer by comparing declared permissions against observed behavior to create a meaningful risk gap analysis.
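A minimal sketch of that risk gap analysis, assuming you have a set of declared permissions and a set of permissions actually observed in use; the permission names are hypothetical.

```python
# A minimal sketch of a permission risk-gap analysis: declared versus
# exercised permissions for one agent workload.
declared = {"db:read", "db:write", "secrets:get", "queue:publish"}
observed = {"db:read", "queue:publish"}  # what the agent actually used

unused = declared - observed
if unused:
    print("declared but never exercised (candidates for removal):")
    for permission in sorted(unused):
        print(f"  - {permission}")
```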
Runtime visibility cannot tell you everything about your pre-deploy posture. Some misconfigurations are only visible in repos and pipelines—like a dangerous Helm chart stored in a repo but not yet applied. You still need static analysis and configuration compliance to harden deployment before runtime begins.
Inference is where real users, systems, and attackers interact with your models and agents. This is where AI-specific threats live: prompt injection, tool misuse, model extraction, and agent escape. If training and deployment are “building and shipping,” inference is “driving on a busy road.” You may pass all safety inspections, but if you never watch how the car is driven, you will miss the actual accident.
Declarative approaches are structurally limited at inference because they only see static artifacts—code, prompts, policies—not the live exchanges between users, models, and tools.
Prompt injection is a runtime manipulation where an attacker embeds instructions inside seemingly normal input that trick the model into performing unintended actions: calling a privileged API, accessing restricted data, or exfiltrating information through an external tool call. No static rule can predict the downstream outcome because the exploit is contextual, adapting to whatever the model has access to. Model extraction involves behavioral patterns—request rate, sequence, entropy—that only emerge during active abuse. Agent escape and privilege escalation require observing process behavior, syscalls, network connections, and API usage as they happen. Tool and API misuse turns legitimate capabilities into attack vectors where context determines malice, not the tool itself.
These are behavioral patterns in runtime context. They manifest as sequences of actions over time, which declarative tools are not designed to see.
Runtime-first tools watch the whole chain: prompts, agent choices, tool calls, API traffic, and system behavior. Instead of guessing from configs, they observe what actually happens. The difference becomes concrete through three scenarios that regularly appear in production AI environments.
Scenario 1: Prompt injection leading to privileged API call. An attacker submits a crafted prompt to a customer-facing AI agent—embedding instructions that look like context data but actually instruct the model to call an internal admin API with elevated permissions. The injected prompt might reference what appears to be a support ticket ID but contains a payload that manipulates the agent’s reasoning chain. Within seconds, the agent has accessed a billing database it has technical permission to reach but has never queried in normal operation. Declarative tools see nothing—the prompt content is not a configuration violation, the API endpoint is an allowed destination, and the agent’s IAM permissions technically permit the call. Runtime detection catches the anomaly at multiple points: the API call pattern deviates from the agent’s behavioral baseline, the privilege usage is unexpected, and the sequence of prompt-to-action does not match any observed behavior from the past weeks of operation. ARMO’s application-layer monitoring sees not just that a process ran, but what it did, why, and with what data—detecting the manipulation at the L7 traffic level before data leaves the environment.
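A minimal sketch of one of those detection points, flagging an agent’s first-ever call to a privileged endpoint; the endpoint names and the notion of a “privileged” set are assumptions for illustration, not ARMO’s detection logic.

```python
# A minimal sketch of flagging API calls that fall outside an agent's
# behavioral baseline; endpoint names are hypothetical.
privileged_endpoints = {"/internal/admin", "/billing/export"}
baseline_endpoints = {"/tickets/search", "/tickets/update", "/kb/lookup"}

def score_call(endpoint: str):
    if endpoint in privileged_endpoints and endpoint not in baseline_endpoints:
        return f"privileged endpoint {endpoint} never seen in this agent's baseline"
    if endpoint not in baseline_endpoints:
        return f"first-seen endpoint {endpoint}; outside behavioral baseline"
    return None

for call in ["/tickets/search", "/billing/export"]:
    finding = score_call(call)
    if finding:
        print(finding)
```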
Scenario 2: Model extraction attempt. An attacker sends a high volume of carefully crafted queries designed to map model behavior. Each individual query looks like a normal inference request—properly authenticated, matching the expected schema, hitting an approved endpoint. Viewed in isolation, there is nothing to flag. But the pattern across hundreds of requests reveals systematic probing: varying inputs designed to extract decision boundaries, response patterns that reveal the model’s training distribution, and entropy signatures consistent with extraction rather than legitimate use. Declarative tools see valid configs and approved request schemas. Runtime detection identifies the pattern through behavioral analysis—unusual request volumes from a single session, entropy anomalies in response sequences, and rate patterns that do not match any legitimate user behavior.
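A minimal sketch of those session-level signals, aggregating request volume and response entropy per session; the thresholds are illustrative and would need tuning against real traffic.

```python
# A minimal sketch of session-level extraction signals: request volume and
# mean response entropy; thresholds are placeholders.
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def session_flags(requests: list[dict], max_requests: int = 200,
                  entropy_band: tuple = (3.5, 4.5)) -> list[str]:
    flags = []
    if len(requests) > max_requests:
        flags.append(f"{len(requests)} requests in one session exceeds normal volume")
    entropies = [shannon_entropy(r["response"]) for r in requests]
    mean_entropy = sum(entropies) / len(entropies)
    low, high = entropy_band
    if not (low <= mean_entropy <= high):
        flags.append(f"mean response entropy {mean_entropy:.2f} outside normal band")
    return flags

session = [{"response": f"probe-output-{i:04d}"} for i in range(500)]
print(session_flags(session))
```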
Scenario 3: Agent uses credentials to pivot. A compromised AI agent—whether through prompt injection, supply chain attack, or a malicious skill—accesses credentials stored in its environment and attempts to move laterally. The credentials were properly stored and permissioned according to policy. The agent’s service account legitimately has access to them. From a declarative standpoint, everything is compliant. But at runtime, the behavior is anomalous: the agent is accessing credentials it has never touched in weeks of observed operation, initiating network connections to internal services outside its normal communication pattern, and executing processes that deviate from its established behavioral baseline. eBPF-based syscall monitoring at the kernel level catches container breakout attempts and privilege escalation in real time—providing visibility that operates below the layer where the compromise occurred.
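A minimal sketch of the runtime signals described here, checking for first-seen credential reads and novel internal connections; the file paths and service names are hypothetical, and a real sensor would derive these events from kernel-level telemetry rather than a dictionary.

```python
# A minimal sketch of detecting first-seen credential access and novel
# internal connections from an agent workload.
baseline_files = {"/app/config.yaml", "/var/cache/embeddings.db"}
baseline_connections = {"vector-db.internal", "llm-gateway.internal"}

event = {
    "file_reads": ["/var/run/secrets/cloud-creds.json"],
    "connections": ["payments-admin.internal"],
}

for path in event["file_reads"]:
    if path not in baseline_files and "secret" in path.lower():
        print(f"first-seen credential read: {path}")
for dest in event["connections"]:
    if dest not in baseline_connections:
        print(f"lateral-movement candidate: new connection to {dest}")
```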
A typical AI agent attack chain moves through five stages: the malicious prompt is submitted, the agent processes it and decides on an action, a tool or API is invoked, a privileged resource is accessed, and data exfiltration is attempted. Declarative tools lose visibility after the first stage—they can tell you “this agent is configured to call tools X and Y” but cannot observe the decision path, the specific invocation context, or the data access patterns. Runtime tools pick up the chain at stage three and carry through: detecting anomalous invocations, unexpected access patterns to databases or secrets, and abnormal egress to external endpoints.
ARMO’s LLM-powered attack story correlation connects these signals into a single narrative—replacing the manual correlation work a SOC engineer would otherwise need to perform across multiple disconnected alert streams. Instead of five separate alerts from five different tools, you get one attack story showing the full chain from prompt to exfiltration attempt.
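A minimal sketch of that correlation step, grouping separate alerts by a shared session identifier and ordering them by time into one chain; the alert fields are illustrative, not ARMO’s alert schema.

```python
# A minimal sketch of stitching related alerts into one attack story.
from itertools import groupby
from operator import itemgetter

alerts = [
    {"session": "s-42", "ts": 3, "signal": "privileged API call to /billing/export"},
    {"session": "s-42", "ts": 1, "signal": "suspicious prompt pattern detected"},
    {"session": "s-42", "ts": 5, "signal": "large egress to unknown external host"},
    {"session": "s-17", "ts": 2, "signal": "first-seen endpoint /kb/export"},
]

alerts.sort(key=itemgetter("session", "ts"))
for session, group in groupby(alerts, key=itemgetter("session")):
    chain = [a["signal"] for a in group]
    if len(chain) > 1:  # only multi-step sequences become attack stories
        print(f"attack story for {session}:")
        for step, signal in enumerate(chain, 1):
            print(f"  {step}. {signal}")
```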
Now we can put everything together. The full AI Security Detection Matrix shows which threats each approach covers and how significant the gap is at each lifecycle stage.
| Stage | Threat Type | Declarative | Runtime | Coverage Gap |
| --- | --- | --- | --- | --- |
| Training | Data poisoning | ❌ Cannot observe | ✅ Behavioral detection | Critical |
| Training | Credential misuse | ❌ Static only | ✅ Process monitoring | High |
| Training | Config drift | ✅ Policy checks | ⚠️ Requires integration | Low |
| Deployment | Image vulnerabilities | ✅ Scanning | ⚠️ Validates what loaded | Low |
| Deployment | IAM misconfig | ✅ Policy enforcement | ⚠️ Requires integration | Low |
| Deployment | Runtime drift | ❌ Cannot observe | ✅ Continuous validation | High |
| Inference | Prompt injection | ❌ Structurally blind | ✅ Behavioral detection | Critical |
| Inference | Model extraction | ❌ Cannot observe | ✅ Pattern analysis | Critical |
| Inference | Agent escape | ❌ Cannot observe | ✅ Syscall/API monitoring | Critical |
| Inference | Tool misuse | ❌ Cannot observe | ✅ Context-aware detection | Critical |
The pattern is unmistakable: declarative tools cover training and deployment posture effectively, but every inference-stage threat—the threats most specific to AI workloads—requires runtime telemetry. You can use this matrix directly in your own environment: map your existing tools to each cell, highlight the cells marked ❌ for your current coverage, and prioritize inference-stage gaps where the impact is highest and coverage is weakest.
Track progress with three metrics: coverage (percentage of inference endpoints with runtime monitoring), detection (mean time to detect inference-stage attacks), and quality (reduction in false positives after behavioral baselining).
Knowing the differences is one thing. Making a buying decision is another. Here’s how to translate the matrix into action.
List every security tool that touches your AI workloads. For each tool, mark which cells in the matrix it covers. Identify where you have gaps—especially at inference. If you’re running a CNAPP like Prisma Cloud or Wiz, you’ll likely find your deployment column well-covered but your inference-stage cells almost entirely blank. If you have Falco or a basic runtime tool, you may have some container-level detection but lack the application-layer visibility needed to distinguish normal tool use from prompt-induced misuse.
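A minimal sketch of this mapping exercise, with example tool names and coverage that stand in for whatever your inventory actually shows:

```python
# A minimal sketch of mapping tools to matrix cells and listing the cells
# nothing covers; tool names and coverage are examples, not an assessment
# of any vendor.
matrix_cells = {
    ("training", "posture"), ("training", "runtime"),
    ("deployment", "posture"), ("deployment", "runtime"),
    ("inference", "posture"), ("inference", "runtime"),
}
tool_coverage = {
    "CSPM": {("training", "posture"), ("deployment", "posture")},
    "Image scanner": {("deployment", "posture")},
    "Container runtime tool": {("deployment", "runtime")},
}

covered = set().union(*tool_coverage.values())
for stage, layer in sorted(matrix_cells - covered):
    print(f"gap: {stage} / {layer}")
```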
This exercise regularly reveals that teams are strong on training and deployment posture and weak on inference runtime detection—which is precisely where AI-specific threats concentrate.
Focus on where runtime gaps hurt you the most. Customer-facing AI agents and models carry the highest inference risk—a successful prompt injection on a customer-facing agent can mean data exfiltration, reputational damage, or regulatory exposure. Internal tools with access to sensitive data require runtime monitoring because a compromised internal agent with database access is a lateral movement opportunity. Training pipelines with external data sources need behavioral controls to catch poisoning attempts early.
By tying matrix gaps to real business systems, you can prioritize runtime controls where they matter most and justify the investment in concrete terms.
Design a stack that uses each approach where it fits best: declarative tools for training and deployment posture, runtime-first tools for inference detection and response, and correlation between layers for full attack chain visibility. The 4-Pillar Evaluation Framework provides the evaluation structure for each vendor—this matrix shows you which lifecycle stages each vendor’s architecture can actually cover.
ARMO’s CADR platform is purpose-built for this layered approach: runtime-derived AI-BOM for observability, behavioral baselines for posture validation, AI-aware detection for inference-stage threats, and progressive observe-to-enforce sandboxing that starts in visibility mode and promotes to enforcement as behavioral baselines mature—so you get security guardrails without blocking engineering velocity.
Plan a 90-day rollout using this scorecard:
| Milestone | Day 30 | Day 60 | Day 90 |
| --- | --- | --- | --- |
| Coverage | Inference endpoints identified | Runtime monitoring deployed | Behavioral baselines established |
| Detection | Initial alerts triaged | False positive reduction measured | Detection rules tuned |
| Response | Response playbooks drafted | Automated responses tested | Full incident workflow validated |
Can declarative tools detect prompt injection attacks?
No. Prompt injection is a runtime manipulation that produces downstream effects—tool calls, data access, network egress. Declarative tools operate on static configurations and cannot observe these behavioral outcomes.
Do I need both runtime and declarative security for AI workloads?
Yes. The matrix shows each approach has distinct strengths: declarative tools handle training and deployment posture, while runtime tools cover inference-stage threats that declarative tools are structurally blind to.
What signals prove a prompt injection became a security incident?
Downstream behavioral signals: anomalous tool invocation, unexpected API calls with elevated privileges, suspicious file access, or abnormal network egress. These require runtime telemetry to observe.
How does runtime security affect AI workload performance?
Modern eBPF-based approaches operate at the kernel level with minimal overhead—1–2.5% CPU and approximately 1% memory—avoiding the performance impact of traditional agent-based or sidecar approaches.
What is an AI-BOM and why does it matter for security?
An AI Bill of Materials inventories all AI components—frameworks, models, tools, RAG sources, dependencies. A runtime-derived AI-BOM reflects what actually runs in production, not just what manifests declare.
The main message of the AI Security Detection Matrix: tools are not the problem on their own—blind spots are. If your stack cannot see what happens at runtime, especially at inference, it cannot protect you from how AI systems fail in the real world.
Start by mapping your current tools to the matrix. Highlight where declarative-only coverage leaves you exposed. Focus on runtime-first visibility for your highest-impact AI agents and models. And walk through each vendor against the 4-Pillar Evaluation Framework to confirm they can actually deliver at the lifecycle stages that matter most.
See how ARMO detects inference-stage threats your current tools miss →