Runtime-Informed Posture: What AI Agents Can Do vs What They Actually Do

Apr 26, 2026

Shauli Rozen
CEO & Co-founder

Key takeaways

  • Why is runtime-informed posture not a single capability? Runtime-informed AI-SPM operates across three independent disciplines — model and artifact, identity and access, and behavioral. Each has its own instrumentation requirement and its own convergence period. A team can be runtime-informed in one and static in the others, which is the most common real-world configuration.
  • What does the gap between configured and operational posture actually look like? It splits into three structurally distinct types: latent capability (configured permits something the agent never does), hidden effective scope (operational reveals authorization the agent's own immediate config does not enumerate), and in-scope anomaly (configured and operational agree, but the pattern of exercise has shifted). Each requires different evidence to surface and different remediation when found.

A platform engineer pulls the AI-SPM dashboard for an agent that has been running in production for six weeks. The static dashboard shows several dozen findings, severity-sorted by configuration weight. The runtime-informed dashboard shows a smaller, prioritized list — but a few of those findings do not appear on the static view at all, and most of the static findings are demoted to a tier the static view does not have. Same agent. Same window. Same underlying configuration. The prioritization is unrecognizable between the two views.

The question is not which dashboard is right. Both are. They present different artifacts. The static dashboard reads the agent’s configured posture. The runtime-informed dashboard reads two artifacts side by side — configured posture and operational posture — and reports the gap between them as the finding.

The two-posture artifact

The cleanest way to think about runtime-informed AI-SPM is to stop treating static and runtime as competing measurements of the same thing. They are two different artifacts a security team maintains side by side, and a runtime-informed finding is the structured reading of the gap between them.

Configured posture is the union of every configuration artifact applied to the agent — IAM scope, RBAC bindings, NetworkPolicy egress rules, container security context, mounted secrets, admission-controller policies, Service Mesh authorization, and the agent framework’s own configuration declaring which models, MCP servers, RAG indexes, and tool catalogs it intends to use. It is fully specified at deploy time and changes only when configuration changes. A static AI-SPM audit walks this artifact.

Operational posture is the union of every runtime signal observed from the agent — API calls, IAM events authenticated against cloud APIs, tool invocations and their parameters, network egress destinations, file access patterns, syscall behavior, identity hops, the MCP tool catalog as advertised at runtime by each connected MCP server, and models actually loaded into memory. It is built up over time and continues evolving. A runtime sensor produces this artifact.

A runtime-informed AI-SPM practice maintains both. The work is not measuring one or the other — it is reading them against each other. Without configured posture, you cannot see latent capability — the agent’s authorized but unused surface that is still part of its blast radius. Without operational posture, you cannot distinguish exercised from unexercised authorization, and you cannot see effective scope that resolves through inheritance or runtime loading. The reconciliation requires both as first-class outputs of the AI-SPM stack.
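
The reconciliation can be sketched as plain set arithmetic over the two artifacts. This is an illustrative simplification — real posture artifacts are structured graphs, not flat permission sets, and the capability names below are hypothetical:

```python
# Illustrative sketch: reconciling configured vs. operational posture as
# sets of capability identifiers. The bucket names mirror the three gap
# types this article walks; all capability strings are invented examples.

def reconcile(configured: set[str], operational: set[str]) -> dict[str, set[str]]:
    """Read the two artifacts against each other."""
    return {
        # Authorized but never exercised: still part of the blast radius.
        "latent_capability": configured - operational,
        # Exercised but not enumerated in the agent's immediate config:
        # candidate hidden effective scope (inheritance, runtime loading).
        "hidden_effective_scope": operational - configured,
        # Exercised and configured: in scope, subject to pattern analysis.
        "exercised_in_scope": configured & operational,
    }

configured = {"s3:GetObject", "s3:PutObject", "sqs:SendMessage"}
operational = {"s3:GetObject", "dynamodb:Query"}  # observed at runtime

gaps = reconcile(configured, operational)
```

Note that neither artifact alone produces all three buckets — which is the point of maintaining both as first-class outputs.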

For AI agents specifically, operational posture requires an eBPF substrate at the kernel layer — ARMO’s instrumentation runs at roughly 1-2.5% CPU and 1% memory — plus application-layer correlation that links kernel events to the tool calls and prompt-context events that produced them. The kernel layer alone gets you behavioral signal; the application-layer correlation is what makes that signal interpretable as agent operation rather than process activity.
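
One way to picture the application-layer correlation is a join between kernel events and tool calls on process id within a short attribution window. This is a hypothetical sketch — the event shapes, field names, and window size are all invented for illustration:

```python
# Hypothetical sketch: attributing kernel-layer events to the application-
# layer tool call that produced them, by pid and a small time window.

WINDOW_NS = 50_000_000  # 50 ms attribution window (assumption)

def correlate(kernel_events, tool_calls):
    """Attach each kernel event to the most recent tool call on the same pid."""
    correlated = []
    for ev in kernel_events:
        candidates = [
            tc for tc in tool_calls
            if tc["pid"] == ev["pid"] and 0 <= ev["ts"] - tc["ts"] <= WINDOW_NS
        ]
        # Most recent qualifying tool call wins; None means process
        # activity that no agent operation explains.
        match = max(candidates, key=lambda tc: tc["ts"], default=None)
        correlated.append({**ev, "tool_call": match["tool"] if match else None})
    return correlated

kernel_events = [{"pid": 101, "ts": 1_010_000_000, "syscall": "connect"}]
tool_calls = [{"pid": 101, "ts": 1_000_000_000, "tool": "web_fetch"}]
result = correlate(kernel_events, tool_calls)
```

An uncorrelated event (`tool_call: None`) is exactly the "process activity rather than agent operation" case the paragraph above distinguishes.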

The three gaps runtime-informed AI-SPM surfaces

The reconciliation produces three structurally distinct gap types. Each describes a particular relationship between configured and operational posture; each requires different evidence to identify; each has a different reduction path; each has a specific failure mode when misclassified. The taxonomy applies across all three AI-SPM disciplines. For the identity and access discipline specifically, we have previously walked the three-category operational playbook in detail; what follows generalizes that taxonomy across the cross-discipline view.

Gap type comparison across the three disciplines:

Latent capability — configured > operational; authorized but unexercised.
  • Model & artifact: framework version installed but never imported into a process; model declared as a dependency, never loaded into memory.
  • Identity & access: permission granted to the service account but never exercised across the observation window. Classical CIEM territory generalized to AI workloads.
  • Behavioral: NetworkPolicy egress to a CIDR the agent never reaches; tool in the agent’s catalog never invoked; data source connected but never read.

Hidden effective scope — operational reveals authorization the agent’s own immediate config does not enumerate.
  • Model & artifact: MCP server tool catalog has shifted between reconnects; tag-referenced model adapter version diverged from manifest intent; RAG index loaded different content than declared.
  • Identity & access: inherited binding broadens effective IAM scope; node service account fallback when Workload Identity or IRSA is misconfigured; federated credential chain crosses providers.
  • Behavioral: cross-namespace traffic permitted by an inherited NetworkPolicy the agent’s own namespace policy does not show; Service Mesh authorization bypassing namespace isolation.

In-scope anomaly — configured ≈ operational, but the pattern of exercise has shifted.
  • Model & artifact: same models loaded as last week, but loading order or memory residency duration has shifted; same inferences, different call distribution.
  • Identity & access: permission exercised within authorized scope, but the calling pattern (frequency, sequence, prompt-context association) has shifted.
  • Behavioral: tool-call sequence with new shape; identity hop authorized but unprecedented; egress to an authorized destination on a new schedule.

Gap 1 — Latent capability

Configured posture permits something. Operational posture shows the agent never does it. The gap is unused capability that is still part of the agent’s blast radius.

The reduction path is configured-side scope-down — replacement policies generated from observed activity. This is what existing CIEM and runtime-informed scope-down tooling handle well; the operational pattern is mature. The failure mode worth flagging is when a Gap 1 fix is applied to a Gap 2 or Gap 3 finding. The configured-side scope-down does not address the actual cause and often breaks the agent’s legitimate work; the finding recurs on the next reconciliation cycle.
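
The scope-down move can be sketched as generating a replacement policy containing only the permissions actually exercised. The policy shape loosely follows AWS IAM JSON; the action names and single-statement structure are illustrative, not a production generator:

```python
# Sketch of Gap 1 reduction: emit a replacement policy from observed
# activity, keeping only permissions actually exercised in the window.
# Shape loosely follows AWS IAM JSON; details are illustrative.

def scope_down(configured_actions, observed_actions, resource="*"):
    exercised = sorted(set(configured_actions) & set(observed_actions))
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": exercised, "Resource": resource}
        ],
    }

policy = scope_down(
    configured_actions=["s3:GetObject", "s3:PutObject", "sqs:SendMessage"],
    observed_actions=["s3:GetObject"],
)
```

The misclassification warning above applies here: this generator only ever narrows the agent's directly attached policy, so applying it to a Gap 2 or Gap 3 finding changes nothing upstream.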

Gap 2 — Hidden effective scope

Operational posture reveals authorization paths that are not surfaced in the agent’s own immediate configuration. The agent is not operating outside its authorization — the cloud and Kubernetes enforcement planes deny anything not authorized. What operational posture is showing is that the effective authorization, once inheritance and runtime resolution are accounted for, is broader than the immediately attached configuration suggests.

This is structurally the most overlooked gap because the diagnostic intuition for AI-SPM imports from CIEM, where authorization was assumed to live in the agent’s directly attached policies. That assumption misses three patterns specific to AI agents: dynamically advertised MCP tool catalogs that shift between reconnects without any manifest update; transitive IAM scope from inherited bindings or federated credential chains across providers; and Service Mesh or inherited NetworkPolicy that bypasses what the agent’s own namespace policy reveals.

The reduction path is structurally different from Gap 1. It is not configured-side scope-down on the agent’s own policy — that leaves the upstream binding, MCP server, or inherited policy untouched, and effective scope recovers within hours as the inheritance chain re-propagates. The reduction path requires per-agent identity binding, explicit MCP tool catalog pinning or revalidation on each reconnect, and identity-chain audits that walk every contributing policy across the full graph. The failure mode when Gap 2 is misclassified as Gap 1 is the most expensive of the three — the team edits the policy directly attached to the agent’s service account, feels satisfied, and misses that the effective scope is still broader because of an inherited binding they did not look at.
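
The identity-chain audit amounts to a graph walk: effective scope is the union of permissions contributed by every policy reachable through the binding graph, not just the directly attached one. A minimal sketch, with an invented binding-graph shape:

```python
# Sketch of a Gap 2 identity-chain audit: walk every contributing policy
# across the full graph. The graph representation and permission names
# are hypothetical.

def effective_scope(identity, bindings, policies):
    """bindings: identity -> inherited identities/groups (the chain).
    policies: identity -> permissions directly attached to that node."""
    scope, stack, seen = set(), [identity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        scope |= policies.get(node, set())
        stack.extend(bindings.get(node, []))
    return scope

bindings = {"agent-sa": ["team-role"], "team-role": ["org-baseline"]}
policies = {
    "agent-sa": {"s3:GetObject"},
    "team-role": {"sqs:SendMessage"},
    "org-baseline": {"ec2:DescribeInstances"},
}
scope = effective_scope("agent-sa", bindings, policies)
```

Editing only `policies["agent-sa"]` — the directly attached policy — leaves two-thirds of this effective scope untouched, which is precisely the expensive misclassification described above.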

Gap 3 — In-scope anomaly

Configured posture and operational posture agree on what is authorized and what is being exercised. The pattern of exercise has shifted. The agent is not doing anything new — it is doing what it always did, in a way that is measurably different. In the IAM discipline this is what we have previously called unjustified use: permission exercised within authorized scope, but the calling pattern (frequency, sequence, prompt-context association) has shifted. Identifying it requires cross-layer correlation that links kernel-level API events to application-layer tool calls to the prompt-context that produced them. Without all three signals correlated, the gap is invisible. This is the categorical pattern catalogued by OWASP’s agentic threat taxonomy as excessive agency and by MITRE ATLAS as adversarial tool use through legitimate interfaces.

The reduction path for Gap 3 is never an IAM edit and never a configured-posture change. The configured surface is correct; the agent’s authorization is correct; what is wrong is the operational pattern. The right intervention is at the per-agent enforcement and prompt-context layers. The boundary between Gap 3 and detection sits inside this gap. Posture says the pattern has shifted; detection says the shift is malicious. The same signal stream feeds both. We have written separately about why traditional behavioral baselining breaks down for ephemeral AI workloads, and that piece walks the convergence-impossibility math in detail.
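
One concrete form of the "pattern has shifted" signal is sequence shape: every individual tool call is authorized and previously seen, but a transition between calls is new. A minimal sketch using call bigrams, with invented tool names and an intentionally simplified baseline:

```python
# Sketch of one Gap 3 signal: flag tool-call transitions (bigrams) never
# seen in the converged baseline, even though every individual call is
# in scope. Baseline and observed sequences are illustrative.

def bigrams(seq):
    return set(zip(seq, seq[1:]))

def novel_transitions(baseline_sequences, observed_sequence):
    baseline = set()
    for seq in baseline_sequences:
        baseline |= bigrams(seq)
    return bigrams(observed_sequence) - baseline

baseline = [["search", "read", "summarize"]]
observed = ["read", "search", "summarize"]  # same calls, new order
novel = novel_transitions(baseline, observed)
```

A nonempty result is a posture finding, not a verdict — whether the new shape is benign drift or compromise is the detection question fed by the same signal stream.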

The L2-to-L3 transition, operationally

The AI-SPM maturity model identifies the L2-to-L3 jump — from posture-managed to runtime-informed — as where the most meaningful risk reduction happens. That framing is correct, but it implies a single jump. Operationally it is three independent jumps with different instrumentation requirements, different convergence periods, and different practical difficulty. The model and artifact transition is fastest: a runtime-derived AI-BOM built from eBPF-based memory introspection plus uprobes on the language runtime’s import handlers. Convergence is near-deterministic — components loaded into memory at agent startup are visible from the first inference.

The IAM transition is more nuanced than it appears. CIEM tooling extends to AI workloads cleanly for Gap 1 (declared-vs-used analysis on cloud IAM), and that is what most AI-SPM dashboards deliver as their runtime-informed IAM capability. It does not, by itself, handle Gap 2 (which requires identity-chain tracing) or Gap 3 (which requires cross-layer correlation linking IAM events to tool calls to prompt context). A team that added AI-SPM by extending their CIEM is at L3 for Gap 1 in IAM and at L2 for Gaps 2 and 3 in IAM.

The behavioral transition is hardest. Per-pod baselines never converge for AI agents — pod lifetime is shorter than the convergence period. Per-Deployment baselines work because the Deployment is the persistent identity across pod cycling. Convergence runs 7 to 14 days at the Deployment level for routine workloads, longer for agents with rare-but-legitimate work patterns like monthly reports or quarterly batch jobs. The transition is not a cutover — it is a 7-to-14-day ramp per agent.
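
The pod-vs-Deployment distinction can be shown in a few lines: key the baseline by the persistent identity so observations survive pod cycling. The event shape (a pre-resolved `deployment` field) is an assumption for illustration:

```python
# Sketch: keying behavioral baselines by Deployment rather than pod, so
# the baseline accumulates across pod cycling. Event fields are invented;
# assumes the sensor has already resolved each pod's owning Deployment.

from collections import defaultdict

baselines = defaultdict(set)  # deployment -> observed behaviors

def record(event):
    baselines[event["deployment"]].add(event["behavior"])

# Two different pods of the same Deployment contribute to one baseline.
for ev in [
    {"pod": "agent-7f9c4-abcde", "deployment": "agent",
     "behavior": "egress:api.example.com"},
    {"pod": "agent-7f9c4-zzzzz", "deployment": "agent",
     "behavior": "tool:web_fetch"},
]:
    record(ev)
```

Keyed per pod, each of those observations would start a fresh baseline that dies with its pod — the convergence failure described above.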

The findings queue changes shape when the transition lands. At L2, the queue is severity-sorted by configuration weight, and the analyst investigates which findings actually matter. At L3, the queue is gap-typed by reconciliation outcome, and each finding carries the gap classification, the discipline, and the evidence that reconciled the configured and operational sides. The cadence shifts too — L2 is the periodic audit; L3 is continuous reconciliation. The NIST AI Risk Management Framework names continuous monitoring as a core governance practice for production AI systems for exactly this reason.

The asymmetric-maturity failure mode

The maturity model treats L2-to-L3 as a single dimension. Operationally it is three dimensions, and the failure mode worth naming is what happens when a team makes the three transitions at uneven speeds — which is the natural progression for almost every team that does not deliberately plan for uniform advancement.

The progression has a predictable shape. The IAM transition comes first because existing CIEM tooling extends to it most easily. The model and artifact transition requires runtime memory introspection that few products do well, so most teams stay at L2 unless they deliberately invest. The behavioral transition is operationally hardest. The natural endpoint, absent deliberate planning, is runtime-informed in IAM and static in the other two.

This is worse than uniform L2 in a specific way. Not in absolute risk reduction — the IAM coverage is genuinely better than what uniform L2 produces. The failure mode is that the team’s perception of their own AI-SPM maturity advances faster than their actual coverage does. The IAM discipline produces a small, gap-typed, prioritized findings queue that looks like mature AI-SPM output. Two-thirds of the AI-SPM territory still produces uncalibrated severity-sorted findings in different dashboards, but the visible “we did AI-SPM” output is the IAM dashboard. The gap between perceived maturity and actual coverage is itself the failure mode. Asymmetric maturity calibrates worse than acknowledged blindness because it does not trigger the conversations that uniform L2 still triggers.

A second-order failure mode is cross-discipline correlation. A worked example: an MCP server’s tool catalog shifts (a Gap 2 in model and artifact, since the MCP endpoint is in config but the catalog is not). The new tool’s data path uses an IAM scope that was already granted to the agent, which on a runtime-informed IAM dashboard reads as Gap 1 closing — a previously latent permission is now in use, which the dashboard logic treats as good news. The behavioral baseline has not converged on the new tool yet because it has been three days since the catalog shift, so Gap 3 is invisible by convergence rather than absence. A team mature in IAM only sees “Gap 1 closing” and treats it as healthy. A team mature across all three sees the cross-discipline correlation: a catalog shift in model and artifact produced an unprecedented permission exercise pattern in IAM that the behavioral discipline cannot yet score. Same evidence, different reads, depending on which discipline is the maturity ceiling.
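
The worked example reduces to a correlation rule over three per-discipline reads. This sketch is a deliberate caricature — real correlation operates over event streams, not three booleans — but it shows why the IAM-only read and the cross-discipline read diverge on the same evidence:

```python
# Sketch of the cross-discipline read from the worked example: a catalog
# shift (model & artifact), a previously latent permission now exercised
# (IAM), and an unconverged behavioral baseline are individually benign
# reads, but together they warrant escalation. Inputs are illustrative.

def cross_discipline_read(catalog_shifted: bool,
                          latent_permission_now_used: bool,
                          baseline_converged: bool) -> str:
    if catalog_shifted and latent_permission_now_used and not baseline_converged:
        return "escalate: catalog shift drove unprecedented permission use"
    if latent_permission_now_used:
        return "gap-1 closing"  # the IAM-only read: looks like good news
    return "healthy"
```

A team mature in IAM only never supplies the first and third inputs, so it can only ever compute the second branch.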

Uniform L3 is what makes cross-discipline reads visible. The instrumentation that supports it is shared across the disciplines as a single substrate. ARMO’s Application Profile DNA at the Deployment level is the per-agent representation that ties the three signals into a single operational posture artifact. When that artifact is the input to the reconciliation across all three disciplines, the cross-discipline gap reads become possible. When it is the input to one discipline only, they are structurally invisible.

Closing the loop

Runtime-informed AI-SPM in operational practice is the reconciliation discipline. Two artifacts maintained side by side. Three structurally distinct gap types. Three independent maturity transitions, with the asymmetric-maturity failure mode as the predictable trap.

The practical move for a team starting this work: pull the last 10 AI-SPM findings out of your stack. Tag each one Gap 1, Gap 2, Gap 3, or “can’t tell.” The “can’t tell” count is your asymmetric-maturity exposure. A team mature across all three disciplines has zero “can’t tells.” Most teams have a majority. The gap between what your stack produces today and what gap-typed findings would tell you is the work the L2-to-L3 transition exists to do.

ARMO’s platform for cloud-native AI workload security maintains both posture artifacts across all three disciplines on a single eBPF substrate, with the cross-discipline reconciliation that surfaces the gap types this article walks. The platform runs alongside existing CIEM and CSPM rather than replacing it. If you want a walkthrough of how the three-gap taxonomy applies to your environment specifically, book a demo.

Frequently asked questions

How do I know which gap type a specific finding falls into?

Triage by direction. If configured posture permits something the agent never does, it is Gap 1 — standard scope-down territory. If operational posture reveals authorization paths the agent’s own immediate configuration does not enumerate, it is Gap 2 — this requires identity-chain or runtime-load instrumentation, not a policy edit on the agent’s own service account. If configured and operational agree on what is allowed and exercised but the pattern has shifted, it is Gap 3 — this requires per-agent behavioral guardrails, never an IAM edit.
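
The triage-by-direction logic above can be written down directly. The three boolean inputs are illustrative simplifications of the real evidence each gap type requires:

```python
# Sketch of the triage-by-direction decision described above; inputs are
# simplified stand-ins for reconciliation evidence.

def classify(configured_permits: bool, exercised: bool,
             pattern_shifted: bool = False) -> str:
    if configured_permits and not exercised:
        return "Gap 1: latent capability"        # scope-down territory
    if exercised and not configured_permits:
        return "Gap 2: hidden effective scope"   # identity-chain / runtime-load audit
    if configured_permits and exercised and pattern_shifted:
        return "Gap 3: in-scope anomaly"         # behavioral guardrails, never an IAM edit
    return "no gap"
```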

What is the convergence period for operational posture on a new agent?

It is discipline-specific. For model and artifact, convergence is near-deterministic — components loaded into memory at agent startup are visible from the first inference. For IAM and behavioral, convergence runs 7 to 14 days at the Deployment level for routine workloads, longer for agents with rare-but-legitimate work patterns like monthly reports or quarterly batch jobs. Per-pod baselines never converge because pod lifetime is shorter than the convergence period.

Can a single tool cover all three disciplines?

Some can; most cannot. The instrumentation overlap is real — the same eBPF substrate that produces behavioral signal also feeds IAM correlation and runtime-derived AI-BOM. But the analytical work differs significantly across disciplines, and many tools cover one discipline strongly and the others derivatively. The evaluation question is what the tool’s core instrumentation produces, and how much of each discipline that instrumentation covers with real depth.

How does this interact with our compliance evidence requirements?

Operational posture is increasingly the better evidence form for AI workloads under the NIST AI Risk Management Framework, the EU AI Act, and the NIST CSF Profile for AI. Static configuration-only evidence proves what an agent was permitted to do; auditors increasingly want evidence of what the agent actually did. A reconciled posture artifact satisfies both requirements without separate evidence-collection workflows.

What is the relationship between the three gap types and threat detection?

Gap 3 sits closest to detection territory. Posture surfaces the gap — the operational pattern has shifted — and detection investigates whether the shift is benign drift or a compromise indicator. The signal stream feeds both pipelines; what differs is the analytical question. A mature AI-SPM practice uses OWASP’s LLM threat taxonomy and the agentic-application catalog as the standard reference for the threat categories each layer is responsible for.
