Last Tuesday, your security architect opened a pull request to add network policies to the payments namespace. The PR sat for six days. Three engineers commented with variations of “how do we know this won’t break checkout?” Nobody could answer. The PR got marked “needs discussion” and moved to a backlog column where it joined the fourteen other security hardening tickets nobody will touch.
This is what policy paralysis looks like in practice. Not a dramatic breach or a compliance failure—just a slow accumulation of risk because every enforcement change feels like a bet between causing an outage and leaving the door open. The JIRA board says “implement microsegmentation.” The Slack threads say “not this sprint.” Most organizations delay or slow development due to Kubernetes security concerns—and the reason is almost always this same deadlock: the team knows what needs to happen but can’t prove it won’t break production.
The core problem is simple: your Kubernetes cluster defaults to allow-all pod-to-pod communication. You know that’s wrong. You have the tools to fix it—Kubernetes network policies, OPA Gatekeeper, Kyverno, seccomp profiles, RBAC scoping. What you don’t have is the behavioral data to write accurate rules. Without knowing which services your workloads actually call, which syscalls they actually make, and which resources their service accounts actually touch, every policy is a guess.
This article walks through the observe-to-enforce methodology—how to watch real workload behavior, turn those observations into least-privilege policies, and roll out enforcement in stages that shrink blast radius without breaking production. Not a new set of tools. A methodology layer that sits on top of whatever policy engine you already use.
P.S. If your clusters are running AI agents specifically, the same methodology applies with additional constraints for non-deterministic workloads—that’s covered in the complete guide to AI agent sandboxing and progressive enforcement.
Blast radius is the scope of systems, data, and services an attacker can reach after compromising a single workload. In Kubernetes: if one pod gets owned, how far can the attacker move?
Kubernetes clusters expand blast radius in ways traditional infrastructure didn’t. Pods communicate laterally across namespaces with no default segmentation. Service accounts accumulate permissions nobody audits—93% of organizations have at least one overly privileged service account. Network policies, if they exist at all, cover a fraction of workloads. These are the kinds of Kubernetes security fundamentals that compound into real exposure when left unaddressed.
**Workload-level blast radius** is what a single compromised pod can directly reach: services it calls, secrets it reads, APIs it accesses. Tight network policies and scoped RBAC shrink this. **Cluster-level blast radius** is what an attacker reaches with escalated access—node compromise or cluster-admin credentials—which exposes the control plane, all workloads, and often cloud provider APIs through service account bindings. The MITRE ATT&CK framework for containers maps these escalation paths in detail.
The distinction tells you where to focus. Internet-facing services and workloads with elevated permissions have the widest workload-level blast radius, and their compromise most often enables cluster-level escalation. Those workloads enter the observe-to-enforce loop first.
Kubernetes has a mature policy ecosystem. The question isn’t whether the tools exist—it’s whether they’re the actual bottleneck.
Pod Security Admission (PSA) enforces predefined security profiles—Privileged, Baseline, and Restricted—at the namespace level, as defined by the Kubernetes Pod Security Standards. It replaced Pod Security Policies and handles admission-time controls: preventing privilege escalation, restricting host namespaces, enforcing read-only root filesystems. PSA is effective for broad guardrails but doesn’t address per-workload behavioral enforcement.
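As a concrete illustration, PSA is driven entirely by namespace labels; a minimal sketch (the namespace name is a hypothetical example):

```yaml
# Hypothetical namespace: enforce the Restricted profile at admission time,
# and also audit and warn on violations of the same profile.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Setting `audit` and `warn` alongside `enforce` gives you visibility into violations of stricter profiles before you tighten the `enforce` level.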
OPA Gatekeeper and Kyverno validate or mutate resources at admission time. Gatekeeper uses Rego; Kyverno uses Kubernetes-native YAML. Both are flexible enough to enforce almost any configuration rule—but they enforce rules you write. They don’t observe runtime behavior or suggest policies based on what workloads actually do. For a deeper comparison of how these tools fit alongside runtime-aware platforms, see this breakdown of open-source Kubernetes security tools.
Kubernetes NetworkPolicies control pod-to-pod and pod-to-external traffic at the network layer. Correctly written, they provide microsegmentation. The challenge is knowing what to allow: you need every legitimate connection mapped before you can write a policy that blocks everything else.
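The conventional starting point is a per-namespace default-deny policy; a minimal sketch (namespace name hypothetical):

```yaml
# Deny all ingress and egress for every pod in the namespace;
# each legitimate flow must then be explicitly allowed by a further policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}   # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Note that denying all egress also blocks DNS lookups, which is precisely why you need the full map of legitimate connections before enforcing.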
All three assume you already know what workloads should be allowed to do. That assumption is where the methodology breaks down.
To write a correct network policy for a payments service, you need to know: which internal services does it call? Which external endpoints does it contact? Does it connect to anything different during monthly batch processing? Does its service account access secrets in other namespaces? Which syscalls does the container actually use versus which the image includes but never invokes?
Static analysis and developer documentation can’t answer these completely. Dependencies shift with every release. Batch jobs hit endpoints that daily traffic doesn’t. The payment service’s README says it talks to three backends, but runtime shows it also calls a metrics collector, an external fraud scoring API, and a cache layer nobody documented.
This is the gap observe-to-enforce fills. Not replacing your policy engine—feeding it the runtime context to generate accurate rules.
Observe-to-enforce inverts the traditional policy workflow. Instead of writing rules from assumptions, you watch actual runtime behavior, generate policies from evidence, and enforce progressively.
The methodology runs in four stages. Each builds on the last.
Deploy eBPF-based sensors across your clusters to build a complete inventory: pods and deployments per namespace, services and ingresses, service accounts and their role bindings, outbound connections to databases and external APIs, inbound connections from users and other services.
Discovery is where teams find their first surprises. The staging namespace with production database credentials. The monitoring daemonset running cluster-admin because someone copied a Helm chart without reviewing its RBAC. The three-year-old CronJob nobody owns that still runs nightly with broad permissions. ARMO’s platform automates this through runtime-driven workload discovery across clusters—including workloads nobody documented.
Move forward when you trust the visibility data and can identify your highest-risk workloads.
A baseline describes what “normal” looks like for each workload: network connections, file access patterns, system calls, and API interactions. Run observation across one to two deployment cycles—long enough to capture regular operations, scaling events, batch jobs, and maintenance windows.
Consider what this reveals for a real workload:
| Attribute | What You Assumed | What Runtime Shows |
| --- | --- | --- |
| Network connections | Talks to 3 backend services (per docs) | Talks to 5: the 3 backends + metrics collector + external fraud scoring API |
| External endpoints | Payment gateway only | Payment gateway + CDN for static assets + analytics endpoint nobody documented |
| System calls | Standard web server syscalls | Uses 47 of 330+ syscalls; never touches ptrace, mount, or process_vm_writev |
| Secret access | Reads own namespace secrets | Reads own namespace + shared config secret in the platform namespace |
| Service account usage | Namespace-scoped read access | SA has cluster-wide list/watch on pods and services it never exercises |
This is the data your policy engine needs. ARMO calls these baselines “Application Profile DNA”—a runtime representation of every container’s actual behavior that becomes the foundation for policy generation and anomaly detection.
With baselines established, enforce selectively. Focus on internet-facing services, workloads handling sensitive data, and components with elevated permissions first.
Here’s what enforcement looks like for that payment service:
Network policies: Observation shows the service talks to exactly 5 internal services and 3 external endpoints. ARMO’s platform auto-generates Kubernetes network policies allowing those 8 connections and denying everything else. Before: the pod could reach every service in the cluster. After: it reaches 8. An attacker who compromises this pod inherits access to 8 destinations instead of 120.
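A generated allow-list policy takes roughly this shape; in the sketch below the labels, port numbers, and external IP are hypothetical stand-ins for observed values:

```yaml
# Allow only the egress flows observed at runtime; with a default-deny
# policy in place in the namespace, everything else is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service      # hypothetical workload label
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: metrics-collector  # one of the observed internal destinations
      ports:
        - protocol: TCP
          port: 8443
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32     # observed external payment gateway (example IP)
      ports:
        - protocol: TCP
          port: 443
```

One `to` block per observed destination keeps the policy readable and makes diffs reviewable when a new dependency appears.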
Seccomp profiles: Runtime profiling shows 47 syscalls in use. You generate a seccomp profile blocking the remaining 280+. An attacker with code execution can’t use ptrace to inspect other processes, can’t use mount to access the host filesystem, and can’t reach for syscalls the container was never meant to invoke. This is eBPF-based enforcement operating at the kernel level without touching application code.
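Wiring a generated profile into a workload is a small spec change. A sketch, with the pod name, image, and profile path all hypothetical:

```yaml
# Pod fragment: load a seccomp profile generated from observed syscalls.
# The referenced JSON file must exist under /var/lib/kubelet/seccomp/
# on each node that can schedule this pod.
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
  namespace: payments
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/payment-service.json  # hypothetical path
  containers:
    - name: app
      image: registry.example.com/payment-service:1.0  # hypothetical image
```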
RBAC scoping: Observation reveals the service account has cluster-wide pod list/watch it never exercises. You scope it to the two namespaces it actually accesses. Privilege escalation paths relying on broad SA permissions are now closed.
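The scoped replacement is an ordinary Role and RoleBinding; a sketch with hypothetical names, limited to the verbs and resources actually observed:

```yaml
# Replace the cluster-wide binding with namespace-scoped read access.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: payment-service-read
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payment-service-read
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: payment-service
    namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: payment-service-read
```

Because Roles are namespaced, you repeat the pair in the second observed namespace rather than falling back to a ClusterRole.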
The critical discipline: apply in audit mode first. ARMO’s platform logs what would have been blocked without actually blocking it. You see violations, tune for legitimate traffic you missed (month-end reconciliation, quarterly compliance scans), then move to enforcement once audit logs are clean.
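The same audit-first discipline applies to admission-time engines. As one example (a sketch, not a prescribed policy), a Kyverno rule can run in Audit mode, logging violations until you flip `validationFailureAction` to `Enforce`:

```yaml
# Kyverno policy in Audit mode: violations are reported, nothing is blocked.
# Switching validationFailureAction to Enforce turns on blocking.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-readonly-rootfs
spec:
  validationFailureAction: Audit
  rules:
    - name: readonly-root
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Containers must run with a read-only root filesystem."
        pattern:
          spec:
            containers:
              - securityContext:
                  readOnlyRootFilesystem: true
```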
This is where progressive enforcement earns its name. You’re not choosing between “open” and “locked down.” You’re building confidence one workload at a time, with evidence at every step.
At maturity, observe-to-enforce becomes the default. Every workload runs under evidence-based policies. New deployments automatically enter the observe-then-enforce cycle. Drift detection alerts when behavior deviates—a service connecting to an endpoint it’s never touched, or a container making syscalls outside its baseline.
Because this runs as a continuous feedback loop, policies stay current. The alternative—static YAML someone wrote six months ago—drifts further from reality with every deployment. ARMO’s platform supports this continuous cycle: Application Profile DNA updates as behavior changes, smart remediation shows which policy changes are safe to make without breaking normal operations, and automated response options (alert, soft quarantine, hard quarantine) contain anomalies based on deviation severity.
Here’s what observe-to-enforce changes for a mid-sized environment running 120 microservices across 8 namespaces:
| Metric | Before | After 90 Days |
| --- | --- | --- |
| Avg. reachable services per pod | 87 of 120 (flat network) | 6.2 (only observed connections) |
| Workloads with enforced network policies | 12 of 120 (10%, manual) | 96 of 120 (80%, runtime-generated) |
| SAs with unused cluster-wide permissions | 34 (28%) | 3 (2.5%, scoped to observed access) |
| Containers without seccomp profiles | 108 of 120 (90%) | 18 of 120 (15%, generated from syscall data) |
| Mean time to contain anomalies | 45 min (manual) | Under 4 min (automated soft quarantine) |
| Policy-caused production incidents | 3 per quarter (guessed policies) | 0 first quarter (audit mode caught false positives) |
The last row matters most. Zero policy-related production incidents—because every enforced policy was validated through observation and audit mode before it blocked anything. That’s what turns “needs discussion” into “approved and merged.”
eBPF (extended Berkeley Packet Filter) is the technology layer that makes runtime observation viable at scale. eBPF programs run inside the Linux kernel, watching events—network packets, system calls, file access, process creation—without modifying application code or injecting sidecars.
Two practical consequences. First, security teams deploy observation and enforcement independently—no coordination with development teams, no application changes. Second, overhead is low enough for continuous production use: ARMO’s eBPF sensor runs at 1–2.5% CPU and 1% memory.
Because the same sensor handles both observation and enforcement, the transition from “watching” to “blocking” is a configuration change, not an architecture change. You’re promoting behavioral data from monitoring mode into enforcement mode.
The observe-to-enforce loop works identically across managed Kubernetes services. Each cloud adds platform-specific controls you can integrate:
AWS EKS: Runtime observation generates both Kubernetes network policies and VPC security group rules. For IRSA workloads, observation shows which AWS APIs each pod actually calls—so you scope IAM permissions to match actual usage.
Azure AKS: Observation drives Azure Network Policy generation and scopes Azure AD Workload Identity to only the resources each service accesses.
Google GKE: Feeds into Binary Authorization and restricts Workload Identity Federation to APIs workloads actually call. GKE Sandbox (gVisor) provides additional isolation; the eBPF enforcement layer provides behavioral enforcement on top. ARMO operates as the cross-cloud constant—same profiling and workflow regardless of provider.
Three metrics tell the story:
Blast radius size: Reachable services and resources per workload. Before-and-after snapshots show measurable reduction—the payment service dropping from 87 reachable services to 8 is a data point for your board report.
Policy coverage: Percentage of workloads under enforced, evidence-based policies. Runtime context also shifts vulnerability management from counts to exploitable risk—instead of “1,000 open CVEs,” you report “20 in executing code paths, 5 with dangerous permissions.” ARMO’s runtime reachability analysis powers this prioritization, routinely eliminating over 90% of CVE noise.
Mean time to containment (MTTC): Time from anomaly detection to containment. Drops from 30–60 minutes (manual) to under 5 minutes with behavioral baselines and automated response. When incidents occur, ARMO connects signals across application, container, Kubernetes, and cloud layers into a full attack story—replacing hours of log correlation with a single narrative of how the attack progressed.
These metrics also map directly to compliance requirements. Frameworks like the CIS Kubernetes Benchmark and the OWASP Kubernetes Security Cheat Sheet recommend least-privilege network policies, restricted seccomp profiles, and scoped RBAC—exactly what observe-to-enforce generates from runtime evidence. ARMO’s platform includes automated compliance checks against CIS, NSA/CISA, SOC2, PCI-DSS, HIPAA, and GDPR frameworks.
Deploy runtime sensors in one to two key clusters. Inventory workloads, namespaces, and service accounts. Identify your top 5–10 high-risk workloads. Share findings with security and platform teams. Goal: answer “what do we have, and what is it doing?”
Observe behavior across one to two deployment cycles for high-risk workloads. Review network flows, file access, and syscall patterns. Generate initial policies and store in version control. Review with application owners. This surfaces the surprises—undocumented dependencies, unused permissions, secrets nobody knew were being accessed.
Apply policies in audit mode. Tune based on violations and feedback—month-end batch jobs, quarterly compliance scans, DR tests that run different code paths. Move stable workloads to enforcement. Start tracking blast radius, coverage, and MTTC. By day 90, critical workloads run under evidence-based policies with a clear expansion path.
The Kubernetes ecosystem doesn’t lack policy tools. OPA Gatekeeper, Kyverno, Pod Security Admission, native network policies, seccomp, RBAC—enforcement mechanisms exist. What’s been missing is the methodology to feed them accurate data.
Static policies drift with every deployment. Manual policy writing doesn’t scale. And leaving clusters open because you fear outages is a risk that compounds silently.
Observe-to-enforce fills the gap by grounding decisions in runtime evidence. You achieve least privilege without guessing, cut off lateral movement, and continuously verify workloads do only what they’re supposed to. The architect’s PR doesn’t sit for six days when the policy was generated from two weeks of observed behavior and validated through audit mode.
See how this works in practice: watch a demo of the ARMO platform.
One to two deployment cycles, typically one to two weeks. The period must capture normal traffic, batch jobs, and maintenance windows. Workloads with more varied behavior (data pipelines, reconciliation jobs) need longer observation than stable API services.
ARMO’s sensor adds 1–2.5% CPU and 1% memory overhead. Starting in audit mode means nothing is blocked during observation—so there’s no availability risk while baselines are built.
CSPM and KSPM identify configuration risks at rest—misconfigurations, permissive roles, missing policies. They tell you what’s wrong with your declared posture. Observe-to-enforce captures actual runtime behavior to generate enforcement policies based on what workloads do, not what they’re configured to do. They’re complementary: posture scanning shows gaps, observe-to-enforce fills them with evidence-based policies. For a deeper look at how these approaches fit together, see runtime-first Kubernetes security tools compared.
Yes. Observe-to-enforce doesn’t replace your policy engine—it feeds it better data. Existing admission-time policies continue enforcing configuration standards. Runtime-generated policies add the behavioral layer: network policies from observed traffic, seccomp profiles from observed syscalls, RBAC scoping from observed resource access.
Continuous observation detects drift and triggers alerts or policy update suggestions. A service calling a new dependency after a legitimate release gets flagged, validated, and the policy updated. An unexpected connection to an unknown endpoint gets flagged with higher severity for investigation.