Prompt and Tool Call Visibility: What Your AI Agents Are Actually Doing
Apr 22, 2026
Your platform team deployed eBPF-based runtime sensors on AKS last week. Defender for Containers is enabled. Azure Policy is enforcing pod security standards across your AI workload namespaces. And your Observe pillar is still blind — because nobody enabled the Diagnostic Setting that routes kube-audit logs to the Log Analytics workspace where your tooling can actually consume them.
It’s not a misconfiguration. It’s a handoff failure. The platform team that manages your AKS cluster assumed audit logging was “on” because the cluster is running. The SecOps team that manages the Log Analytics workspace assumed they were receiving logs because the workspace exists. Neither team checked the Diagnostic Setting that connects the two. This is the Azure implementation problem: not missing tools, but missing handoffs between tools owned by different teams.
The Observe → Posture → Detect → Enforce framework is cloud-agnostic. The parent guide covers the methodology and the dependency chain between stages. This article covers how that framework wires into AKS specifically — which Azure services contribute to which pillar, which Defender plans you need licensed, and which handoffs between services typically break. For teams running on AWS, the EKS implementation guide covers the same framework against AWS-native primitives. For cross-cloud environments, the multi-cloud framework addresses the coordination problems that span providers.
Before deploying anything new, map what AKS already gives you. The table below organizes Azure services by framework pillar, identifies the Defender plan required where applicable, names the team that typically owns the configuration, and flags the handoff failure mode — the specific step that breaks in most “fully configured” environments.
| Pillar | Azure Service | Defender Plan | Owning Team | What It Contributes | Where It Stops | Handoff Failure Mode |
| --- | --- | --- | --- | --- | --- | --- |
| Observe | Diagnostic Settings (kube-audit), Container Insights, Defender CSPM | CSPM (for AI-BOM) | Platform team + SecOps | Control-plane audit trail, container metrics, AI workload inventory | Logs the request, not the consequence. No behavioral profiling. | Diagnostic Setting not enabled. Audit logs never reach the workspace. |
| Posture | Defender CSPM, Azure Policy + Gatekeeper, Workload Identity scopes | CSPM | SecOps + IAM team | Posture findings, admission-time governance, declared permission scopes | Surfaces theoretical risk. Cannot compare declared vs. observed behavior. | Azure Policy initiatives don’t cover AI-specific pod configurations. IAM team doesn’t know which scopes agents actually exercise. |
| Detect | Defender for Containers, Defender for AI Services, Sentinel analytics rules | Containers + AI Services | SecOps + AI engineering | Known-threat signatures, prompt-level inspection, alert correlation | Separate correlation domains. Cannot trace prompt to agent behavior to data access as a causal chain. | Defender for AI Services not licensed. Sentinel data connectors not wired for runtime sources. |
| Enforce | Azure Policy, NetworkPolicy (Azure CNI Cilium), Workload Identity scoping | N/A (native K8s + Azure) | Platform team + IAM team | Admission-time governance, network segmentation, identity boundaries | Static rules only. Cannot adapt to non-deterministic agent behavior at runtime. | NetworkPolicies written from assumptions, not observed traffic. Workload Identity scopes never tightened post-deployment. |
The pattern across every row is the same: Azure services handle identity, audit, and infrastructure-level detection well. The gap is behavioral — understanding what AI agents actually do at runtime, distinguishing normal tool invocations from malicious ones, and deriving enforcement policies from observed behavior rather than assumptions. That gap is where the runtime behavioral layer enters at each pillar.
AKS control-plane logging is opt-in. The kube-audit and kube-audit-admin log categories require an explicit Diagnostic Setting to route events to a Log Analytics workspace, Event Hub, or Storage Account. Unlike some platforms where control-plane logging is a single toggle, AKS separates the cluster (owned by the platform team) from the log destination (owned by SecOps or the observability team). If these are different teams — and they usually are — the handoff between them is where Phase 1 observation gaps originate.
Verify: navigate to your AKS cluster in the Azure portal, open Diagnostic settings, and confirm that kube-audit (or kube-audit-admin for higher-volume logging) is routing to a workspace your security tooling can query. If this setting is missing, your Observe pillar has a foundational gap that no amount of sensor deployment can close.
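The same check can be scripted against the JSON that `az monitor diagnostic-settings list --resource <aks-resource-id>` returns. A minimal sketch, assuming a simplified version of that output shape (a list of settings, each with `workspaceId` and `logs` fields — the exact schema varies by CLI version):

```python
AUDIT_CATEGORIES = {"kube-audit", "kube-audit-admin"}

def audit_logging_routed(settings: list[dict]) -> bool:
    """Return True if any diagnostic setting routes a kube-audit
    category to a Log Analytics workspace."""
    for setting in settings:
        # workspaceId is populated when the destination is Log Analytics
        if not setting.get("workspaceId"):
            continue
        for log in setting.get("logs", []):
            if log.get("category") in AUDIT_CATEGORIES and log.get("enabled"):
                return True
    return False

# Illustrative output of `az monitor diagnostic-settings list --resource <aks-id>`
settings = [
    {
        "name": "aks-audit-to-law",
        "workspaceId": "/subscriptions/.../workspaces/sec-law",
        "logs": [
            {"category": "kube-audit-admin", "enabled": True},
            {"category": "kube-apiserver", "enabled": False},
        ],
    }
]
print(audit_logging_routed(settings))  # True
```

Running a check like this in CI makes the platform-team/SecOps handoff testable instead of assumed.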
Deploy ARMO’s eBPF sensor as a DaemonSet on AKS user node pools hosting AI workloads. The sensor auto-discovers AI agents, inference servers, and MCP tool runtimes running in the cluster. Within the first 48 hours, it generates a runtime AI-BOM — an inventory built from observed execution, not deployment manifests. This catches components that static scanning misses: the adapter model your agent downloads from Hugging Face at startup, the MCP server connection a developer added last week, the Python package that loads a transitive dependency nobody audited.
Container Insights gives you CPU, memory, and network metrics plus stdout/stderr logs. Useful for operational health, but it cannot attribute network connections to specific tool invocations, cannot build behavioral profiles per agent, and cannot distinguish legitimate from anomalous tool calls. Think of Container Insights as the operational baseline and the eBPF sensor as the security baseline — they answer different questions and both feed the Observe pillar.
Workload Identity Federation on AKS uses OIDC federated credentials with subject-claim filters — a structurally different mechanism from AWS’s IRSA or STS-based assumed roles. The declared-vs-observed comparison happens against federated credential configurations: which Azure AD app registrations are bound, which subject claims are permitted, which audience values are set. Your agent’s federated credential may permit access to Key Vault, Storage, Cosmos DB, and Azure OpenAI Service. Phase 1 observation data shows it actually calls two of those four services in normal operation.
That gap — declared permissions versus observed behavior — is the posture finding that matters. ARMO’s Application Profile DNA captures this per agent, creating a behavioral profile that includes which Azure APIs are called, the call patterns, frequencies, and data access volumes that constitute “normal” for each workload. The profile is the artifact the Posture pillar produces and the Detect pillar consumes.
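The declared-vs-observed comparison reduces to a set difference. A minimal sketch (the service names are illustrative, not tied to any specific federated credential configuration):

```python
def permission_gap(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare an identity's declared scopes against the services
    its workload was observed calling at runtime."""
    return {
        "unused": declared - observed,      # candidates for scope removal
        "undeclared": observed - declared,  # should be empty; investigate if not
    }

# Hypothetical agent: four services declared, two exercised in practice
declared = {"keyvault", "storage", "cosmosdb", "azure-openai"}
observed = {"keyvault", "azure-openai"}
gap = permission_gap(declared, observed)
print(sorted(gap["unused"]))  # ['cosmosdb', 'storage']
```

The `unused` set is the posture finding: scopes that exist but are never exercised are attack surface with no operational justification.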
Defender CSPM surfaces posture findings — overprivileged identities, exposed endpoints, vulnerable AI framework versions, and attack path analysis showing how weak links connect into broader risk. These findings are valuable, but without runtime context, most are theoretical. Runtime behavioral data overlaid on CSPM findings reveals which findings represent actual risk (permissions actively exercised, code paths that execute, workloads that are externally reachable) and which are theoretical (permissions that exist but are never used). This is how runtime-informed posture management reduces noise — surfacing only findings that represent actual risk in your environment.
Azure Policy enforces admission-time constraints through Gatekeeper — blocking privileged containers, requiring resource limits on AI workload pods, enforcing labels, mandating network policies. It governs what can be deployed. It cannot observe or restrict what happens inside a running container. The Posture pillar uses Azure Policy for the declarative baseline and runtime behavioral data for the observed-behavior overlay.
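To make the admission-time/runtime split concrete, here is a toy sketch of the kind of checks a Gatekeeper constraint performs against a pod spec. The real mechanism is Rego constraint templates assigned through Azure Policy; this Python version only illustrates the category of rule:

```python
def admission_violations(pod: dict) -> list[str]:
    """Flag the admission-time issues a typical Azure Policy /
    Gatekeeper constraint would block: privileged containers and
    missing resource limits. (Illustrative, not actual policy logic.)"""
    violations = []
    for c in pod.get("spec", {}).get("containers", []):
        name = c.get("name", "<unnamed>")
        if c.get("securityContext", {}).get("privileged"):
            violations.append(f"{name}: privileged container")
        if "limits" not in c.get("resources", {}):
            violations.append(f"{name}: no resource limits")
    return violations

pod = {"spec": {"containers": [
    {"name": "agent", "securityContext": {"privileged": True}, "resources": {}},
]}}
print(admission_violations(pod))
# ['agent: privileged container', 'agent: no resource limits']
```

Everything this function can check is knowable before the container starts — which is exactly why it says nothing about what the agent does afterward.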
Defender for Containers deploys a DaemonSet sensor on AKS nodes collecting Kubernetes events, process telemetry, and network data. It also provides agentless capabilities — vulnerability assessment of container images and Kubernetes control-plane monitoring — that add posture-level value. But for the Detect pillar specifically, neither the agent-based nor agentless components build behavioral baselines specific to your AI workloads. Defender for Containers detects suspicious process execution, anomalous API calls, and known threat signatures — reverse shells, crypto miners, credential exfiltration to known malicious IPs. It cannot answer “is this tool invocation normal for this agent?” because it has no concept of per-agent behavioral profiles.
Defender for AI Services inspects prompts and model responses at the Azure OpenAI API boundary through Content Safety Prompt Shields. It covers direct jailbreak attempts, indirect prompt injection, data leakage patterns, and credential theft signals. Alerts route into Defender XDR and Microsoft Sentinel for centralized investigation. This is genuinely AI-specific detection capability with no AWS equivalent at the API boundary.
Where it stops: Defender for AI Services operates at the content plane — the API call into Azure OpenAI. It does not see what the agent does with the model’s response inside the cluster. If an indirect prompt injection succeeds and the agent begins making anomalous tool calls, spawning unexpected processes, or accessing data outside its normal pattern, Defender for AI Services has no visibility into that behavioral cascade. The OWASP Top 10 for LLM Applications ranks prompt injection as the number-one risk precisely because what happens after a successful injection is where the real damage occurs.
For the Detect pillar, this means Defender for AI Services covers one threat category: prompt-level attacks at the API gateway. Agent escape, tool misuse, and behavioral drift need runtime behavioral detection operating inside the cluster at the kernel level. For the full attack-chain walkthrough showing exactly where Defender’s correlation stops during a multi-stage AI agent attack on AKS, see the Azure evaluation in the buyer’s guide series.
The runtime behavioral layer addresses the three threat categories Defender doesn’t cover for AI agents. Application Profile DNA builds per-agent behavioral baselines from observed execution — which tools each agent calls, which APIs it reaches, which network destinations it contacts, which system calls are normal. eBPF-level process lineage tracking connects prompts to tool calls to process execution to data access into a single causal chain. CADR’s cross-layer correlation turns what would otherwise be scattered alerts across Defender plans into unified attack stories your SOC can act on in minutes.
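The core idea behind a per-agent behavioral baseline can be sketched in a few lines. This is a deliberately simplified model — a set of observed tool invocations per agent — not ARMO's actual profiling implementation, which operates on syscalls, network flows, and process lineage:

```python
from collections import Counter

class AgentBaseline:
    """Toy per-agent behavioral profile: the tools an agent invoked
    during the observation window, with call counts."""

    def __init__(self, observed_calls: list[str]):
        self.normal = Counter(observed_calls)

    def deviations(self, new_calls: list[str]) -> list[str]:
        """Tool invocations never seen during baselining — the
        'abnormal-for-this-agent' signal a signature engine misses."""
        return [t for t in new_calls if t not in self.normal]

baseline = AgentBaseline(["search", "search", "summarize", "kv_read"])
print(baseline.deviations(["search", "shell_exec", "kv_read"]))  # ['shell_exec']
```

A `shell_exec` call carries no known-bad signature on its own; it is only suspicious because this agent has never made it before. That is the detection category signature-based tooling cannot express.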
On EKS, the runtime layer operates alongside GuardDuty as a peer detector — both generate alerts, both matter, and the SOC triages from two sources. On AKS, the operational model is different. The runtime layer's primary value is feeding Sentinel as a data connector so the SOC sees Defender plan alerts and runtime attack stories in the same incident queue. ARMO's pre-correlated attack stories arrive in Sentinel as structured narratives with process-level causality already established. The Sentinel incident now contains Defender's prompt-level alert, the full causal chain from the runtime layer, and whatever additional context your KQL analytics rules add — all in one view. The SOC isn't replacing Sentinel; it's feeding Sentinel better data.
Enforcement is where most AI agent security programs stall. You know you should restrict agent permissions, but writing policies for workloads whose behavior changes with every prompt feels like guessing. The framework solves this by making enforcement the output of observation and baselining, not the starting point. On AKS, enforcement maps to four concrete mechanisms.
Azure Policy initiative definitions enforce pod security standards, network policy requirements, and resource limits at admission time through Gatekeeper. This is the declarative enforcement layer — it prevents insecure configurations from being deployed. It cannot constrain what happens inside a running container. Use Azure Policy to set the floor; use runtime enforcement to govern what agents do above that floor.
AKS with Azure CNI Powered by Cilium supports CiliumNetworkPolicy resources for L7 filtering, giving you HTTP-method-level and path-level granularity on egress from AI agent namespaces. Auto-generated NetworkPolicies from observed agent traffic patterns make network segmentation practical for non-deterministic workloads: if the agent only reaches three Azure services and one internal API in normal operation, the policy allows exactly those destinations and denies everything else. The observation-mode-first workflow from the progressive enforcement methodology means you generate policies from evidence, not guesswork.
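The "generate policy from observed traffic" step can be sketched as a transformation from an observed-destination list to a CiliumNetworkPolicy-shaped document. The schema below is simplified for illustration (names and destinations are hypothetical); real generated policies carry more selectors and metadata:

```python
def egress_policy(agent: str, namespace: str, destinations: list[dict]) -> dict:
    """Build an egress allowlist policy from observed traffic
    (simplified CiliumNetworkPolicy-shaped dict)."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": f"{agent}-egress", "namespace": namespace},
        "spec": {
            "endpointSelector": {"matchLabels": {"app": agent}},
            # Enumerating only observed destinations implies
            # default-deny for everything else
            "egress": [
                {"toFQDNs": [{"matchName": d["fqdn"]}],
                 "toPorts": [{"ports": [{"port": str(d["port"]),
                                         "protocol": "TCP"}]}]}
                for d in destinations
            ],
        },
    }

# Hypothetical Phase 1 observation: the agent only ever reached these two
observed = [{"fqdn": "myvault.vault.azure.net", "port": 443},
            {"fqdn": "myaoai.openai.azure.com", "port": 443}]
policy = egress_policy("research-agent", "ai-agents", observed)
print(len(policy["spec"]["egress"]))  # 2
```

Because the allowlist is derived from evidence rather than written from assumptions, the policy is tight by construction and safe to trial in audit mode.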
Microsoft released the Agent Governance Toolkit in April 2026 as an open-source sidecar-based policy engine for runtime agent governance. It deploys alongside your agents on AKS and enforces policy decisions at the application layer — constraining what the agent is allowed to decide. This is complementary to kernel-level eBPF enforcement, which constrains what happens when those decisions execute. The Governance Toolkit addresses application-layer policy (tool access decisions, data routing rules). eBPF enforcement addresses infrastructure-layer behavior (syscalls, network connections, file access, process spawning). Two enforcement layers, two threat categories, no conflicts.
Workload Identity Federation per agent, IMDS blocking via NetworkPolicy, API server access restriction, and RBAC minimization are the infrastructure controls that close the enforcement gaps specific to AKS. The AKS sandboxing guide provides the per-control deep dive with YAML and Azure CLI commands — including the three approaches to IMDS blocking and their trade-offs. This section stays at the framework-wiring level: enforcement policies generated from behavioral evidence collected in Phases 1 and 2, deployed in audit mode first, validated against observed behavior, then graduated to enforcement after the platform team confirms zero production impact.
ARMO’s Cloud Application Detection & Response (CADR) platform, built on Kubescape, provides the integrated runtime layer that connects each framework phase to your AKS environment. For teams evaluating AI workload security across cloud platforms, the AI workload security platform overview covers the full capability set.
Phase 1 (Observe): Deploy ARMO’s eBPF sensor to AKS clusters via Helm. The sensor auto-discovers AI agents, inference servers, and MCP tool runtimes. Runtime AI-BOM generates within hours. Agent-to-tool-to-data-source interaction maps appear in the ARMO console alongside the Diagnostic Settings data you’re already collecting.
Phase 2 (Posture + Detect): Application Profile DNA builds behavioral baselines for each agent workload — capturing syscalls, network patterns, file access, Kubernetes API usage, and tool invocation sequences. Detection rules grounded in those baselines cut false positives by surfacing only genuine deviations. CADR correlates signals across cloud and cluster layers for full attack stories that route into Sentinel alongside Defender alerts.
Phase 3–4 (Enforce): Auto-generated seccomp profiles, network policies, and identity constraints deploy in audit mode first. The platform monitors for false positives, lets you adjust, then graduates to enforcement. Per-agent granularity means your high-risk autonomous agent gets stricter controls than your read-only chatbot. All without writing policies from scratch.
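The audit-to-enforce graduation decision reduces to a simple rule: enforce only after a full quiet window with zero would-be violations. A minimal sketch of that rule (a hypothetical graduation criterion, not ARMO's actual logic):

```python
from datetime import datetime, timedelta

def ready_to_enforce(audit_events: list[datetime],
                     now: datetime,
                     quiet_period: timedelta = timedelta(days=7)) -> bool:
    """Graduate a policy from audit mode to enforce mode only after
    the quiet period has passed with no would-be violations logged."""
    recent = [t for t in audit_events if now - t <= quiet_period]
    return len(recent) == 0

now = datetime(2026, 4, 22)
# Early false positives were tuned out; nothing fired in the last week
events = [datetime(2026, 4, 1), datetime(2026, 4, 5)]
print(ready_to_enforce(events, now))  # True
```

The quiet-period length is the knob the platform team controls: longer windows buy confidence for high-risk autonomous agents, shorter ones suffice for read-only workloads.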
The quantified outcomes from this workflow: 90%+ CVE noise reduction through runtime reachability analysis, 90%+ faster investigation through LLM-powered attack story generation, 80%+ reduction in issue overload through runtime-based prioritization. All at 1–2.5% CPU and 1% memory overhead. Kubernetes-native, eBPF-based, no sidecars, no code changes.
To see how the full framework maps to your AKS environment: book a demo today.
Defender for Containers deploys an eBPF-based sensor that detects known threat signatures — reverse shells, crypto miners, credential exfiltration. ARMO builds behavioral baselines per agent workload and detects deviations from observed normal behavior, including AI-specific threats like anomalous tool invocation patterns and prompt-driven behavioral shifts. Defender catches known-bad. ARMO catches abnormal-for-this-agent. Both are valuable, and they’re complementary — run both.
Not all at once. Minimum viable: Defender for Containers (Detect pillar) plus Defender CSPM (Observe and Posture pillars). For full coverage, add Defender for AI Services if your agents call Azure OpenAI, Defender for Key Vault if agents access secrets, and Defender for Storage if agents interact with blob data. The Pillar Ownership Map above shows which plans contribute to which phases.
Defender for AI Services only covers Azure AI services (Azure OpenAI and Azure AI Model Inference). Agents calling self-hosted models, external APIs, or open-source LLMs running on AKS nodes need runtime behavioral detection regardless. The runtime layer’s behavioral baselines and detection work independently of which model provider the agent calls — they observe what happens inside the cluster at the kernel level.
The framework is the same. The wiring is structurally different in five ways: Azure fragments security capabilities across more than a dozen separately licensed Defender plans (AWS has GuardDuty as one SKU); AKS routes audit logs through Diagnostic Settings that aren't enabled by default; Sentinel is the SIEM integration target (not a peer detector); Azure Policy with Gatekeeper is the admission-time enforcement substrate; and Workload Identity Federation uses OIDC federated credentials with subject-claim filters rather than STS-style assumed roles. The ownership map — which team configures which service for which pillar — is the dimension unique to Azure's more fragmented service architecture.