AI Agent Governance: From Policy Framework to Runtime Enforcement

May 25, 2026

Ben Hirschberg
CTO & Co-founder

Key takeaways

Why do most AI agent governance programs fail to enforce what they publish? The enforcement mechanism most programs adopt — middleware that intercepts an agent’s tool calls inside the agent’s own process — shares a boundary with the workload it’s meant to constrain. The agent (or whoever has injected instructions into it) sits between the policy engine and the verification, so the policy engine cannot independently attest that enforcement held. The trust anchor lives outside the agent’s process, or it doesn’t exist.
What is the Enforceability Ladder and what does it measure? The ladder is a five-rung scoring rubric for governance clauses — Aspirational, Documentary, Attested, Application-plane Enforced, and Two-Plane Verified. Most enterprise programs cluster at rungs 2 and 3, claim rung 4 by virtue of having an application-layer policy engine deployed, and almost never reach rung 5 because rung 5 requires independent runtime observation captured outside the agent’s process boundary.
How does a security leader use the ladder to drive vendor evaluation? Score the program clause by clause to produce a rung distribution that exposes where the program actually lives versus where it claims to live. Each rung gap maps to a specific capability category and a structured demo question to bring into the next vendor evaluation. The decisive question — show me how the platform verifies enforcement held when the agent’s middleware was bypassed — separates governance products from runtime security platforms.

Most enterprise AI agent governance programs publish policies at the bottom three rungs of a runtime enforceability ladder while their architecture diagrams claim rung four. Almost no program reaches rung five, the only rung that produces evidence an auditor cannot dispute.

The mismatch shows up in the audit committee meeting. The CISO walks in with the NIST AI RMF mapping, the AUP, the model cards, and the vendor risk assessments for every third-party API the agents call. The chair asks the question the deck wasn’t built to answer: “How do we know any of this is actually enforced?” The honest answer is that it hasn’t been verified.

Not because the security team has been negligent. The policy engine most programs deployed lives inside the agent it’s supposed to police. It shares the agent’s process, memory, and identity. Independent attestation is impossible by construction.

What follows is the rubric to find where your program actually lives: the Enforceability Ladder, a clause-by-clause scoring exercise, and the rung-by-rung questions to bring into your next vendor evaluation.

The Five Rungs of AI Policy Enforceability

Every clause in your governance program lives somewhere on a five-rung ladder. The rung determines what evidence you can produce if a regulator, an auditor, or your own incident response process asks you to demonstrate enforcement. The lower the rung, the more your control depends on the workload reporting honestly about itself.

Rung 1 — Aspirational: stated principles, no runtime signal

“We will use AI responsibly.” “Agents will operate within ethical boundaries.” Statements that name a value but specify no behavior. They populate AI ethics charters and board communications, and they are unenforceable by construction. A reasonable program contains some rung-1 language, since it sets the cultural floor, but no clause should live only here.

Rung 2 — Documentary: specific commitments, no verification

“Agents shall not exfiltrate customer PII to unapproved destinations.” “Agent tool invocations shall remain within the scope of their declared purpose.” Specific. Committable. Auditable in the abstract. But the policy document does not specify how the org would know whether the commitment was kept or broken. Rung-2 clauses populate most NIST AI RMF mappings and AUP documents. They are necessary inputs to enforcement, not enforcement themselves.

Rung 3 — Attested: configuration claims, point-in-time

“The agent’s service account is scoped to read-only access to the customer database.” “DLP scanning is enabled on the agent’s egress traffic.” A configuration claim attested at deploy time. Rung-3 controls are real, since they constrain the configuration surface, but they verify the setting, not the behavior. An agent operating within a correctly configured role can still misuse the access the role grants. Configuration drift between scans is invisible. Every rung-3 control assumes the configuration at attestation time still holds at runtime.
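To make the drift point concrete, here is a minimal Python sketch, assuming a hypothetical timeline of configuration changes and a scanner that samples only at fixed scan times. The setting name `egress_dlp_enabled` and the scan times are illustrative assumptions, not any product's schema. Both scans pass even though the control was off for four time steps in between.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigChange:
    """A hypothetical configuration event: at time `t`, `key` was set to `value`."""
    t: int
    key: str
    value: bool


def config_at(changes: list[ConfigChange], key: str, t: int) -> Optional[bool]:
    """Replay changes in time order and return the value of `key` at time `t`."""
    value = None
    for change in sorted(changes, key=lambda c: c.t):
        if change.t > t:
            break
        if change.key == key:
            value = change.value
    return value


def point_in_time_attestation(changes, key, expected, scan_times):
    """Rung-3 style check: compare the setting only at the scan instants."""
    return {t: config_at(changes, key, t) == expected for t in scan_times}


def drift_windows(changes, key, expected, horizon):
    """What continuous observation would reveal: every time step where the setting was wrong."""
    return [t for t in range(horizon + 1) if config_at(changes, key, t) != expected]


# Hypothetical timeline: DLP enabled at deploy, disabled at t=3, re-enabled at t=7.
timeline = [
    ConfigChange(0, "egress_dlp_enabled", True),
    ConfigChange(3, "egress_dlp_enabled", False),
    ConfigChange(7, "egress_dlp_enabled", True),
]

scans = point_in_time_attestation(timeline, "egress_dlp_enabled", True, scan_times=[0, 10])
missed = drift_windows(timeline, "egress_dlp_enabled", True, horizon=10)
print(scans)   # {0: True, 10: True}
print(missed)  # [3, 4, 5, 6]
```

The attestation report is accurate at every instant it samples, which is exactly why it says nothing about the window in between.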

Rung 4 — Application-plane Enforced: active blocking, but inside the agent’s process

A policy engine intercepts the agent’s tool calls before execution. It evaluates the call against the active policy and blocks anything outside scope. Application-layer policy engines, identity-broker scoped credentials, and LLM gateways that filter responses all live here. Rung-4 enforcement is real — it stops policy-violating actions that flow through the instrumented code path. It is also the rung where most AI agent governance products in the current market terminate.
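A rung-4 interception point can be sketched as a wrapper that every tool call is expected to pass through. This is a minimal illustration, assuming a hypothetical per-agent allowlist and hypothetical tool names (`search_docs`, `send_email`); real policy engines evaluate much richer policy. Note that the decision log is written by code running inside the agent's own process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when a tool call falls outside the agent's declared scope."""


@dataclass
class ToolCallGuard:
    agent_id: str
    allowed_tools: set[str]
    # Decision log produced by the guard itself, inside the agent's process.
    decisions: list[tuple[str, str, str]] = field(default_factory=list)

    def call(self, tool_name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Evaluate the call against the allowlist before executing it."""
        if tool_name not in self.allowed_tools:
            self.decisions.append((self.agent_id, tool_name, "block"))
            raise PolicyViolation(f"{self.agent_id} may not call {tool_name}")
        self.decisions.append((self.agent_id, tool_name, "allow"))
        return fn(*args, **kwargs)


guard = ToolCallGuard("support-agent", allowed_tools={"search_docs"})

result = guard.call("search_docs", lambda q: f"results for {q}", "refund policy")
try:
    guard.call("send_email", lambda to: f"sent to {to}", "attacker@example.com")
except PolicyViolation as exc:
    print("blocked:", exc)

print(guard.decisions)
# [('support-agent', 'search_docs', 'allow'), ('support-agent', 'send_email', 'block')]
```

The guard only sees what is routed through `guard.call`. An action the agent performs any other way, for example generated code it runs directly, never appears in `guard.decisions`.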

The limit of rung 4 is architectural. The policy engine runs inside the agent’s process boundary, sharing memory, identity, and execution context with the workload it constrains. If the agent executes generated code outside the instrumented middleware, spawns a child process that opens its own connection, or has its reasoning loop compromised by indirect prompt injection that bypasses the interception points, the policy engine reports that enforcement held when it did not. The agent, or whoever has injected instructions into it, sits between the policy and the verification.

Rung 5 — Two-Plane Verified: independent observation closes the loop

Rung 5 keeps rung-4 application-plane enforcement in place and adds a second plane: independent runtime observation captured outside the agent’s process boundary, at a layer the agent cannot influence or selectively disclose. Kernel-level behavioral instrumentation produces this signal — syscalls, network destinations, file access, child processes — and compares it against a per-agent behavioral baseline of what this specific agent does in normal operation. When the application plane reports enforcement, the infrastructure plane independently confirms it held. When the two planes disagree, the gap is the incident.
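The comparison rung 5 describes reduces to set arithmetic over two independent streams. The sketch below assumes hypothetical inputs: `app_plane_allowed` stands for destinations the in-process policy engine reports having allowed, `infra_observed` for destinations seen by an out-of-process observer (kernel-level in practice, simulated here as a plain set), and `baseline` for the agent's learned behavioral envelope. Hostnames are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reconciliation:
    # Observed by the infrastructure plane, never reported by the application plane.
    unreported: frozenset[str]
    # Observed, but not part of this agent's normal behavior.
    outside_baseline: frozenset[str]

    @property
    def enforcement_verified(self) -> bool:
        return not self.unreported and not self.outside_baseline


def reconcile(app_plane_allowed: set[str], infra_observed: set[str], baseline: set[str]) -> Reconciliation:
    """Compare what the agent's middleware reported against what the infrastructure plane saw."""
    return Reconciliation(
        unreported=frozenset(infra_observed - app_plane_allowed),
        outside_baseline=frozenset(infra_observed - baseline),
    )


baseline = {"api.internal.example", "docs.internal.example"}
app_plane_allowed = {"api.internal.example"}

# Normal run: everything observed was reported and sits within the baseline.
ok = reconcile(app_plane_allowed, {"api.internal.example"}, baseline)

# Bypass: a child process opened its own connection the middleware never saw.
gap = reconcile(app_plane_allowed, {"api.internal.example", "paste.attacker.example"}, baseline)

print(ok.enforcement_verified)   # True
print(gap.enforcement_verified)  # False
print(sorted(gap.unreported))    # ['paste.attacker.example']
```

The non-empty `unreported` set is the disagreement between planes; in this framing it is the incident.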

Most enterprise programs cluster at rungs 2 and 3. Programs with an application-layer policy engine deployed claim rung 4. Almost none reach rung 5 — and the reason is not effort or sophistication. The runtime instrumentation that supports independent verification is recent. The architectural distinction between application-plane and infrastructure-plane wasn’t named in the first wave of AI governance frameworks. Most teams inherited their program structure from a CSPM-era playbook where attestation was the highest available rung. Climbing to rung 5 requires a capability category that has only recently entered the buyer’s evaluation universe.

Score Your Own Governance Program Against the Ladder

The scoring exercise takes a single working session. Pull six representative clauses from your active program — one from your NIST AI RMF mapping, one from your AUP, one from a model card, one from your vendor risk assessment, one from your data classification policy covering agent I/O, one from your incident response playbook for AI-driven events. Score each against the ladder.

The example below works through six common clauses pulled from a typical mid-sized enterprise program. The pattern that emerges is not unusual: published commitments at rungs 1–2, configuration controls at rung 3, and one or two clauses where an application-layer policy engine has been deployed and reaches rung 4 in narrow scope. No clause sits at rung 5, because the instrumentation that supports rung 5 is not in the stack.

Clause	Typical rung	Evidence required at each higher rung
“AI agents shall not exfiltrate customer PII to unapproved destinations.”	Rung 2 — documentary	Rung 3: DLP egress scanning enabled and attested at deploy time. Rung 4: middleware intercepts agent output and blocks PII payloads. Rung 5: independent runtime observation confirms no PII left the pod through any path, including child processes and shared volumes.
“Agent service accounts are scoped to least privilege.”	Rung 3 — attested	Rung 4: per-agent runtime policy enforces the scope at call time. Rung 5: behavioral observation confirms the agent never exercised permissions outside its declared scope across the observation period.
“Agent tool invocations remain within declared purpose.”	Rung 2 — documentary	Rung 3: tool allowlist configured in agent framework. Rung 4: middleware blocks invocations outside the allowlist. Rung 5: per-agent baseline confirms the actual invocation pattern matches declared purpose, including invocations made through paths the middleware doesn’t see.
“Egress destinations from agent namespaces are restricted to approved domains.”	Rung 3 — attested	Rung 4: NetworkPolicy or service mesh enforces the restriction at the pod layer. Rung 5: kernel-level observation confirms no agent process opened a connection to a destination outside the approved set, regardless of how the connection was initiated.
“Multi-agent delegation occurs only between agents in the same trust domain.”	Rung 1 — aspirational	Rung 2: trust domains defined in the architecture document. Rung 3: identity scopes enforced at the orchestrator. Rung 4: middleware blocks cross-domain delegation calls. Rung 5: runtime delegation graph confirms no out-of-domain agent-to-agent traffic occurred.
“Model and prompt template updates follow change management.”	Rung 2 — documentary	Rung 3: change tickets required and audited. Rung 4: signed model registry with deploy-time verification. Rung 5: runtime baseline detects behavioral drift correlated with the model or template change, independent of whether the change went through CM.

Run the exercise across the program and plot the rung distribution. The histogram you produce is the diagnostic: it shows where the program actually lives versus where it claims to live. The gap is the work. Done honestly, the exercise tends to surface the same pattern: clauses credited with a higher rung because a configuration control exists drop a rung once scrutinized, and the rungs the program needs to reach require capabilities the stack has never been evaluated for.
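The histogram and the gap list can be produced from a simple clause inventory. A minimal sketch, assuming each clause is recorded with the rung the program claims and the rung the evidence supports. The `evidenced` values follow the "Typical rung" column of the table above; the `claimed` values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

RUNGS = {
    1: "Aspirational",
    2: "Documentary",
    3: "Attested",
    4: "Application-plane Enforced",
    5: "Two-Plane Verified",
}


@dataclass
class ScoredClause:
    text: str
    claimed: int    # rung the program or its architecture diagram claims
    evidenced: int  # rung the available evidence actually supports


def rung_histogram(clauses: list[ScoredClause]) -> dict[int, int]:
    """Count clauses per evidenced rung, including rungs with zero clauses."""
    counts = Counter(c.evidenced for c in clauses)
    return {rung: counts.get(rung, 0) for rung in RUNGS}


def rung_gaps(clauses: list[ScoredClause]) -> list[tuple[str, int, int]]:
    """Clauses whose claimed rung exceeds the evidenced rung, largest gap first."""
    gaps = [(c.text, c.claimed, c.evidenced) for c in clauses if c.claimed > c.evidenced]
    return sorted(gaps, key=lambda g: g[1] - g[2], reverse=True)


# Evidenced rungs follow the "Typical rung" column; claimed rungs are hypothetical.
program = [
    ScoredClause("No PII exfiltration to unapproved destinations", claimed=4, evidenced=2),
    ScoredClause("Service accounts scoped to least privilege", claimed=4, evidenced=3),
    ScoredClause("Tool invocations within declared purpose", claimed=4, evidenced=2),
    ScoredClause("Egress restricted to approved domains", claimed=3, evidenced=3),
    ScoredClause("Delegation only within a trust domain", claimed=2, evidenced=1),
    ScoredClause("Model/prompt updates follow change management", claimed=3, evidenced=2),
]

for rung, count in rung_histogram(program).items():
    print(f"Rung {rung} {RUNGS[rung]:<27} {'#' * count}")
print(rung_gaps(program)[0])
```

Each entry in the gap list is a candidate capability requirement for the vendor evaluation.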

What to Demand from Vendors at Each Rung

The scoring exercise earns its keep when it translates into vendor demands. Each rung gap in your program corresponds to a specific capability category and a structured demo question. Bring the questions into the vendor evaluation. The passing and failing answer patterns are the diagnostic.

Rung 2 → 3: configuration attestation. “Show me the platform’s attestation report mapping each control declared in our policy document to the configuration in production.” A passing answer produces an artifact correlating policy clauses to deployed configurations with timestamps and drift detection. A failing answer pulls up a generic posture dashboard with no clause-level traceability.
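The difference between a clause-level attestation report and a generic posture dashboard is mostly the shape of the artifact. A minimal sketch, assuming hypothetical clause IDs, configuration keys, and a production snapshot: one row per clause, the expected and observed setting, a timestamp, and a drift flag.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ClauseControl:
    clause_id: str   # hypothetical identifier for a clause in the policy document
    config_key: str  # hypothetical configuration setting that implements the clause
    expected: object


@dataclass
class AttestationRow:
    clause_id: str
    config_key: str
    expected: object
    observed: object
    checked_at: str
    drifted: bool


def attest(controls: list[ClauseControl], snapshot: dict[str, object]) -> list[AttestationRow]:
    """Map each declared control to the configuration observed in production."""
    now = datetime.now(timezone.utc).isoformat()
    rows = []
    for control in controls:
        observed = snapshot.get(control.config_key)  # None if the setting is missing
        rows.append(AttestationRow(
            clause_id=control.clause_id,
            config_key=control.config_key,
            expected=control.expected,
            observed=observed,
            checked_at=now,
            drifted=observed != control.expected,
        ))
    return rows


controls = [
    ClauseControl("DATA-PII-01", "egress.dlp.enabled", True),
    ClauseControl("IAM-LP-02", "agent.db.access", "read-only"),
]
production_snapshot = {"egress.dlp.enabled": True, "agent.db.access": "read-write"}

for row in attest(controls, production_snapshot):
    print(row.clause_id, row.config_key, "DRIFT" if row.drifted else "ok", row.checked_at)
```

Traceability is what a passing answer should show: the `IAM-LP-02` row points back to the exact clause that the drifted setting violates.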

Rung 3 → 4: application-plane enforcement. “Show me the platform intercepting a real tool call from one of our agents and blocking a policy-violating action in front of me.” A passing answer demonstrates active interception with policy-mapped block decisions. A failing answer shows audit-mode logging with no enforcement, or enforcement that the vendor admits requires the customer to write their own policy engine.

Rung 4 → 5: two-plane verification. “Show me how the platform verifies enforcement held when the agent’s own middleware was bypassed, compromised, or simply didn’t see the action.” This is the decisive question. A passing answer demonstrates independent runtime observation — kernel-level visibility into syscalls, network destinations, file access, and child processes — compared against a per-agent behavioral baseline built through progressive enforcement, producing evidence the agent’s own self-reporting cannot fabricate.
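The source does not define "progressive enforcement" in detail. One plausible reading, assumed here rather than taken from any vendor's documentation, is a learning phase that records an agent's behavior followed by an enforcement phase that flags anything outside what was learned. The event tuples and agent name below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBaseline:
    agent_id: str
    learning: bool = True
    # Learned behavior as (kind, target) pairs, e.g. ("connect", host) or ("exec", path).
    profile: set[tuple[str, str]] = field(default_factory=set)

    def observe(self, event: tuple[str, str]) -> str:
        """Learn during the learning phase; afterwards, classify against the learned profile."""
        if self.learning:
            self.profile.add(event)
            return "learned"
        return "normal" if event in self.profile else "deviation"

    def enforce(self) -> None:
        """Freeze the profile and switch from learning to enforcement."""
        self.learning = False


baseline = AgentBaseline("billing-agent")
for event in [
    ("exec", "/usr/bin/python3"),
    ("connect", "billing-api.internal"),
    ("open", "/data/invoices"),
]:
    baseline.observe(event)

baseline.enforce()
print(baseline.observe(("connect", "billing-api.internal")))  # normal
print(baseline.observe(("exec", "/bin/sh")))                  # deviation
```

A real baseline would need to tolerate legitimate variation; this sketch only shows the two-phase structure.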

A failing answer falls into one of three patterns. The vendor explains that their policy engine is the enforcement — which places them entirely at rung 4. The vendor offers a SIEM integration as the answer — which is rung-3 evidence at best, not rung-5 verification. Or the vendor demonstrates generic anomaly detection without a per-agent baseline — which generates noise but does not verify policy adherence.
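Why a fleet-wide model falls short can be shown with a toy example. The sketch assumes two hypothetical agents with hypothetical baselines. Behavior that is normal for one agent is normal for the fleet, so a generic detector accepts it even when this particular agent has no business doing it.

```python
# Hypothetical per-agent baselines: the behavior each agent showed in normal operation.
baselines: dict[str, set[tuple[str, str]]] = {
    "support-agent": {("connect", "docs.internal"), ("connect", "tickets.internal")},
    "billing-agent": {("connect", "billing-api.internal"), ("open", "/data/invoices")},
}

# A fleet-wide model collapses every agent's behavior into one profile.
fleet_profile: set[tuple[str, str]] = set().union(*baselines.values())


def fleet_check(event: tuple[str, str]) -> bool:
    """Generic anomaly detection: is this behavior normal for any agent?"""
    return event in fleet_profile


def per_agent_check(agent_id: str, event: tuple[str, str]) -> bool:
    """Per-agent baseline: is this behavior normal for this agent?"""
    return event in baselines.get(agent_id, set())


# The support agent opens billing data, which only the billing agent should touch.
event = ("open", "/data/invoices")
print(fleet_check(event))                       # True: looks normal fleet-wide
print(per_agent_check("support-agent", event))  # False: outside this agent's baseline
```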

The rung 4 → 5 question separates governance products from runtime security platforms. Both have a place. The buyer’s job is to know which category they’re evaluating and what role it plays in the program. A rung-4 product without rung-5 instrumentation closes part of the gap and creates the impression that the rest is closed. That impression is the failure mode this article was built to surface.

From Policy Document to Runtime Evidence

The CISO walks back into the next audit committee meeting with two artifacts the first meeting could not produce: the rung-scored program with its honest distribution, and the vendor evaluation tied to the specific gaps the stack can’t currently close. The answer to “are we enforcing this?” is no longer a gesture at a NIST mapping. It is a specific statement: the program sits at this rung for these clauses, here is the capability needed to climb, and here is the evaluation underway.

Governance ends in runtime evidence, not in policy documents. The rubric is how you find out where yours ends today. ARMO’s runtime platform for AI workloads is built around the infrastructure-plane instrumentation that produces rung-5 evidence — see how the verification layer runs in production.

Frequently Asked Questions

How do I score a clause that mentions a specific configuration setting?

Configuration claims default to rung 3 — attested. The setting is verified at deploy time, but the runtime behavior the setting is supposed to produce is not independently verified. To climb to rung 4, the clause needs an active enforcement mechanism that fires when the agent attempts to operate outside the configured boundary. To climb to rung 5, you need independent observation confirming the enforcement held — including when the configuration drifts, the agent bypasses the configured code path, or the configured control is disabled at runtime by some other process.
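The scoring rules in this answer can be written as a small decision function. The evidence flags are hypothetical names for the questions a reviewer asks about each clause, and the sketch assumes the ladder is cumulative, so a missing lower rung caps the score.

```python
from dataclasses import dataclass


@dataclass
class ClauseEvidence:
    specifies_behavior: bool       # names concrete behavior, not just a value
    configuration_attested: bool   # setting verified at deploy time
    active_enforcement: bool       # a control fires when the agent leaves the boundary
    independent_observation: bool  # captured outside the agent's process boundary


def score(e: ClauseEvidence) -> int:
    """Return the highest rung whose evidence is present, climbing one rung at a time."""
    if not e.specifies_behavior:
        return 1
    if not e.configuration_attested:
        return 2
    if not e.active_enforcement:
        return 3
    if not e.independent_observation:
        return 4
    return 5


config_claim = ClauseEvidence(True, True, False, False)
observed_but_unenforced = ClauseEvidence(True, True, False, True)
two_plane = ClauseEvidence(True, True, True, True)

print(score(config_claim))             # 3
print(score(observed_but_unenforced))  # 3
print(score(two_plane))                # 5
```

Under this cumulative reading, independent observation alone does not lift a clause past rung 3: rung 5 keeps rung-4 enforcement in place and adds the second plane.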

Which audit artifacts does a rung 5 verification produce that a rung 4 enforcement cannot?

Rung 4 produces a log of policy decisions made by the agent’s own instrumentation — the same instrumentation that an auditor knows the workload could influence. Rung 5 produces a runtime evidence chain captured outside the agent’s process: the syscalls the agent’s pod actually executed, the network destinations it actually reached, the files it actually touched, the child processes it actually spawned, all correlated to a per-agent behavioral baseline. The two artifacts together — agent-side policy decisions and infrastructure-side behavioral evidence — let the auditor confirm enforcement held even if the agent’s reporting was compromised.

What’s the practical difference between rung 3 attestation and rung 4 enforcement?

Attestation says the control is configured. Enforcement says the control fired and blocked an action. A scanner that verifies the agent’s service account has the correct IAM scope is attestation. A policy engine that intercepts an out-of-scope API call and refuses to issue it is enforcement. Attestation is a snapshot; enforcement is a continuous active control. A program with strong attestation and no enforcement has documentation of intent, not evidence of result.

How long does a program-scoring exercise take?

Scoring six representative clauses takes a single working session — two to four hours with the policy document, the security architect, and someone who knows the deployed configuration. Auditing the entire program across all policy domains typically runs one to two weeks, depending on program size. The output is the rung distribution histogram and a prioritized list of rung gaps mapped to capability requirements.

Where do runtime-derived AI-BOM and per-agent behavioral baselines sit on the ladder?

Both are infrastructure-plane instrumentation supporting rung 5. The runtime-derived AI-BOM produces the inventory of what each agent is actually loading and calling, independent of what the deployment manifest declared. The per-agent baseline produces the behavioral envelope each agent operates within during normal use, independent of what the policy document claimed. Together they are the substrate the two-plane verification compares against.
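The AI-BOM comparison can be sketched as a diff between what the manifest declares and what runtime observation recorded. The component identifiers below are hypothetical; in practice the observed set would come from infrastructure-plane instrumentation rather than a literal.

```python
def diff_ai_bom(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare a manifest-declared inventory with what was observed at runtime."""
    return {
        # Loaded or called at runtime but absent from the manifest.
        "undeclared": observed - declared,
        # Declared in the manifest but never seen during the observation window.
        "unused": declared - observed,
    }


declared = {"model:support-llm-v3", "tool:search_docs", "tool:create_ticket"}
observed = {"model:support-llm-v3", "tool:search_docs", "tool:shell_exec"}

report = diff_ai_bom(declared, observed)
print(report["undeclared"])  # {'tool:shell_exec'}
print(report["unused"])      # {'tool:create_ticket'}
```

The `undeclared` set is the part a manifest-only inventory cannot produce.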
