AI Workload Baseline and Drift Detection: Defining “Normal” Agent Behavior
Security teams deploying AI agents into Kubernetes know they need behavioral baselines. The concept is...
Apr 10, 2026
Your engineering lead is in your office Thursday morning. They want to push an AI agent to production next Tuesday. It’s a LangChain-based workflow agent, connected through MCP to three internal tools and one external API, with access to a customer database. The framework posters are on the wall. Your team has spent two quarters standing up runtime observability. And sitting in that chair, you still don’t know whether to say yes.
This is the gap every current article on safely deploying AI agents in production walks around. They tell you how to build a program. They explain the threat model. They walk you through the methodology. What none of them hand you is the artifact you actually need in the Thursday meeting: a gate checklist with concrete evidence standards that turns “do we have a program” into “is this specific agent, on this specific day, ready.”
This piece is not an evaluation framework for picking a tool—that’s what the AI workload security buyer’s guide covers. It’s not the program-building methodology for sequencing observability, posture, detection, and enforcement across your whole environment—that’s what the AI agent security framework for cloud environments is for. This is the operational output those frameworks should produce: seven gates, each with a concrete evidence standard, each mapped to capabilities the framework told you to require. A one-page artifact at the end you can bring into tomorrow’s review.
One assumption before we walk the gates: you already have runtime behavioral visibility into your AI workloads. If you don’t, most of the gates below will fail—not because the agent is unsafe, but because you don’t yet have the evidence to judge. That’s a prerequisite, and closing it is the buyer’s-guide conversation, not this one.
The gates are universal, but the evidence bar scales with the agent. A read-only RAG chatbot and an autonomous workflow agent that books meetings, writes to databases, and orchestrates other agents cannot be reviewed against the same standard. Before walking any gate, the CISO needs to know which tier they’re approving.
Five tiers cover most of what a security team will see in production:
| Tier | What it does | Example | Blast radius |
|---|---|---|---|
| 0 | Read-only retrieval | Internal docs search RAG | Data leakage via retrieval surface |
| 1 | Read with summarization, no tool calls | Support chatbot on a knowledge base | Prompt-injected data exfiltration |
| 2 | Tool-calling, read-only external | Agent that queries APIs but doesn’t write | Unauthorized data surface expansion |
| 3 | Tool-calling with write access | CRM-updating agent, database-writing workflow | Data integrity compromise, fraudulent transactions |
| 4 | Autonomous orchestration, multi-agent | Agents that invoke other agents, spawn sub-tasks | Lateral movement, cascading unauthorized actions |
The tier rule that governs every gate below: as the tier increases, each gate’s evidence bar increases. The gates don’t change. The thresholds do. A Tier 1 agent passing Gate 2 on a week of observation is reasonable; a Tier 4 agent passing Gate 2 on a week of observation is not.
One failure mode to flag before moving on: most agents are misclassified on arrival because engineering describes them by intended use rather than by what the agent can reach if its tool set or prompt changes mid-session. An agent described as “read-only” that has write credentials in its environment is not Tier 0—it’s a Tier 3 agent waiting for the right prompt. Runtime discovery surfaces what an agent actually touches, which is the only reliable input into the tier classification. When in doubt, classify up.
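The "classify up" rule can be sketched as a function that takes what runtime discovery reports and returns the highest tier any reachable capability implies. This is a minimal illustration, not a definitive classifier: the `ReachableSurface` fields and the `classify_tier` name are assumptions made for the example, and a real inventory would carry far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReachableSurface:
    """What runtime discovery says the agent can reach (hypothetical shape)."""
    summarizes_content: bool = False
    calls_tools: bool = False
    has_write_credentials: bool = False
    invokes_other_agents: bool = False


def classify_tier(surface: ReachableSurface) -> int:
    """Return the highest tier implied by any reachable capability."""
    if surface.invokes_other_agents:
        return 4
    if surface.has_write_credentials:
        return 3
    if surface.calls_tools:
        return 2
    if surface.summarizes_content:
        return 1
    return 0


# An agent engineering describes as "read-only" that holds write credentials.
described_read_only = ReachableSurface(has_write_credentials=True)
print(classify_tier(described_read_only))  # 3
```

The checks run from the highest tier down, so a single reachable capability is enough to raise the classification regardless of the intended use.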
The question: Is this agent in our AI-BOM with a classified autonomy tier and a documented owner?
You cannot approve what you haven’t inventoried. But Gate 1 is as much about what it reveals as what it certifies. If this specific agent isn’t in a runtime inventory, the other agents your engineering team has quietly shipped almost certainly aren’t either. The first time this gate runs on any team is usually the day the shadow-AI problem gets quantified.
The standard: the agent appears in a runtime-derived AI Bill of Materials—not a manually maintained wiki page, not a ticket in Jira. The entry includes the models loaded, RAG sources connected, MCP tool runtimes attached, external APIs reachable, autonomy tier, named engineering owner, named security reviewer, and date added. When a developer runs a kubectl apply tomorrow, the inventory updates; the CISO does not.
What fails this gate: the agent is documented only in a Confluence page or a shared spreadsheet where updates rely on self-reporting. Or worse—the agent exists but is not documented anywhere the security team controls. This is the classic shadow-AI failure: a CrewAI agent connected to an internal database for a hackathon demo three months ago, forgotten, still running with production credentials. If you see it on this agent, you are seeing it on others you have not been told about.
The consequence chain is worth making explicit. If Gate 1 fails, Gate 7 cannot fire—you cannot detect that an agent has changed if you have no canonical record of what it was. Every downstream gate inherits that weakness. An AI bill of materials built from runtime discovery rather than developer self-reporting is the only version that clears Gate 1 at scale, because it regenerates the moment the agent’s reachable surface changes rather than waiting for a human to update a page. The evidence standard for this gate is simple: show me the inventory entry, and show me when it was last updated from runtime data, not from a human.
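As a rough sketch of what "show me the inventory entry, and show me when it was last updated from runtime data" could look like in code, the record below carries the fields the standard lists plus a runtime-sync timestamp. The `AIBOMEntry` shape, the `gate1_findings` helper, and the 24-hour staleness limit are all assumptions for illustration, not a schema from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class AIBOMEntry:
    agent_name: str
    models: List[str]
    rag_sources: List[str]
    mcp_tools: List[str]
    external_apis: List[str]
    autonomy_tier: int
    engineering_owner: str
    security_reviewer: str
    date_added: datetime
    # None means the entry has never been refreshed from runtime data.
    last_runtime_sync: Optional[datetime] = None


def gate1_findings(entry: AIBOMEntry, now: datetime,
                   max_staleness: timedelta = timedelta(hours=24)) -> List[str]:
    """Return the reasons this entry would not clear Gate 1 (empty list = clear)."""
    findings = []
    if not entry.engineering_owner:
        findings.append("no named engineering owner")
    if not entry.security_reviewer:
        findings.append("no named security reviewer")
    if entry.last_runtime_sync is None:
        findings.append("never refreshed from runtime data")
    elif now - entry.last_runtime_sync > max_staleness:
        findings.append("runtime data is stale")
    return findings


now = datetime(2026, 4, 10, 9, 0, tzinfo=timezone.utc)
entry = AIBOMEntry(
    agent_name="workflow-agent",          # all values below are placeholders
    models=["example-model"],
    rag_sources=["internal-docs"],
    mcp_tools=["tool-a", "tool-b", "tool-c"],
    external_apis=["api.example.com"],
    autonomy_tier=3,
    engineering_owner="eng-lead",
    security_reviewer="sec-reviewer",
    date_added=now - timedelta(days=30),
)
print(gate1_findings(entry, now))  # ['never refreshed from runtime data']

entry.last_runtime_sync = now - timedelta(minutes=5)
print(gate1_findings(entry, now))  # []
```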
The question: Have we observed this agent long enough, and across enough interaction patterns, to know what normal looks like for it?
Every subsequent gate depends on a baseline. You cannot verify declared-vs-observed access without one. You cannot tune detection without one. You cannot constrain enforcement without one. A CISO who approves an agent without a baseline is approving Gates 3 through 6 on faith.
The standard: a baseline captured in an environment that represents production traffic shape, covering tool invocation sequences, API call patterns, network destinations, process executions, file access, and data access patterns per interaction type. The observation window scales with tier.
Observation windows by tier, stated as floors rather than targets:
The coverage principle that matters more than the calendar: time is a proxy for coverage. When the proxy and the reality diverge, the reality wins. A Tier 3 agent observed for three weeks that only exercised four of seven declared tools has not cleared Gate 2. A Tier 3 agent observed for ten days that exercised all seven tools under production-representative load has. The question to ask at the gate is not “how long did you watch it?” It is “what did you see, and is there anything in the declared tool set or data set that the baseline doesn’t cover?”
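The coverage question reduces to a set difference between what was declared and what the baseline actually exercised. The sketch below replays the two Tier 3 cases from the paragraph above; the tool names are invented for the example.

```python
def coverage_gaps(declared: set, exercised: set) -> set:
    """Declared tools or data sources the baseline never saw in use."""
    return declared - exercised


# Seven declared tools (hypothetical names).
declared_tools = {
    "crm_read", "crm_write", "email_send", "calendar",
    "search", "billing", "tickets",
}

# Three weeks of observation that exercised only four of the seven.
three_week_run = {"crm_read", "search", "calendar", "tickets"}

# Ten days of observation that exercised all seven.
ten_day_run = set(declared_tools)

print(sorted(coverage_gaps(declared_tools, three_week_run)))
# ['billing', 'crm_write', 'email_send']
print(sorted(coverage_gaps(declared_tools, ten_day_run)))
# []
```

An empty result is what clears the coverage floor; the length of the window does not appear in the check at all.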
The subtler failure mode every Gate 2 review eventually hits: the agent’s behavior is not deterministic, so its baseline cannot be either. Run the same prompt twice and you may see different tool-call sequences. Sampling temperature, context-window state, and model non-determinism mean that “normal” for an agent is a probability distribution over behaviors, not a fixed fingerprint. A baseline that captures a single path through that distribution will flag legitimate variance as drift and miss actual drift that happens to stay inside the envelope. This is the mental-model failure that container-security instincts import into AI workload review: a microservice’s behavior is bounded by the finite state machine of its code, and you can enumerate its transitions. A tool-calling agent’s action space is effectively unbounded because the next tool call is a function of the prompt, the context window, and the model’s weights, none of which the security team has visibility into. The evidence standard therefore has to include interaction diversity: did the observation capture multiple runs of the same interaction type with non-trivial variance? If every run looks identical, the baseline is too narrow and will not survive contact with production.
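One way to make "a probability distribution over behaviors" concrete is to record, per interaction type, the relative frequency of each observed tool-call sequence, then flag interaction types where every run looked identical. This is a sketch under stated assumptions: the input shape, the function names, and the `min_distinct` threshold are illustrative choices, and a flagged type is a prompt for review rather than an automatic failure.

```python
from collections import Counter, defaultdict


def sequence_distributions(runs):
    """Map each interaction type to {tool_call_sequence: relative_frequency}.

    `runs` is an iterable of (interaction_type, sequence_of_tool_calls).
    """
    counts = defaultdict(Counter)
    for interaction_type, sequence in runs:
        counts[interaction_type][tuple(sequence)] += 1
    return {
        itype: {seq: n / sum(c.values()) for seq, n in c.items()}
        for itype, c in counts.items()
    }


def too_narrow(distributions, min_distinct=2):
    """Interaction types whose baseline holds fewer than min_distinct sequences."""
    return sorted(
        itype for itype, dist in distributions.items() if len(dist) < min_distinct
    )


# Hypothetical observations: the same interaction type can take different paths.
runs = [
    ("refund", ("lookup_order", "issue_refund")),
    ("refund", ("lookup_order", "check_policy", "issue_refund")),
    ("refund", ("lookup_order", "issue_refund")),
    ("status", ("lookup_order",)),
    ("status", ("lookup_order",)),
]

dist = sequence_distributions(runs)
print(len(dist["refund"]))  # 2 distinct sequences observed
print(too_narrow(dist))     # ['status']
```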
What fails this gate: a baseline captured in a staging environment that doesn’t represent production traffic shape, a window that missed tools or data sources that will be reached in production, or the worst version: “we’ll build the baseline once it’s in production.” That last one is the policy-paralysis trap in reverse. It defers the baseline to a moment when rollback is expensive, and it makes every downstream gate impossible to answer honestly. One architectural note that matters for the evidence standard: baselines built per pod fragment across replicas and produce N noisy profiles as the ReplicaSet scales. The syscall patterns don’t stabilize until you aggregate at the Deployment level, so a baseline that converges at the Deployment level is usable as an approval artifact and a per-pod baseline that never converges is not.
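The per-pod fragmentation problem can be illustrated by merging pod-level observations into one profile per Deployment. The sketch assumes the pod-to-Deployment mapping is already known (in a real cluster it would come from owner references) and uses a plain set union, which is a simplification of whatever aggregation a production tool performs. Pod and Deployment names are invented.

```python
from collections import defaultdict


def aggregate_by_deployment(pod_profiles, pod_to_deployment):
    """Merge per-pod observation sets into one set per Deployment."""
    merged = defaultdict(set)
    for pod, observed in pod_profiles.items():
        merged[pod_to_deployment[pod]] |= observed
    return dict(merged)


# Three replicas, each of which saw only part of the agent's behavior.
pod_profiles = {
    "agent-5c7d-aaaa": {"connect:crm.internal", "exec:python"},
    "agent-5c7d-bbbb": {"connect:crm.internal", "connect:api.example.com"},
    "agent-5c7d-cccc": {"exec:python", "open:/etc/agent/config.yaml"},
}
pod_to_deployment = {pod: "agent" for pod in pod_profiles}

profile = aggregate_by_deployment(pod_profiles, pod_to_deployment)
print(sorted(profile["agent"]))
# ['connect:api.example.com', 'connect:crm.internal',
#  'exec:python', 'open:/etc/agent/config.yaml']
```

No single replica observed all four behaviors, which is why any one pod's profile would be a noisy approval artifact.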
The question: Does the declared permission set match the observed permission use, or are we approving a gap we don’t understand?
This is the most commonly failed gate in the industry, and the reason is procedural rather than technical: nobody’s looking. Engineering files a change request listing the tools, APIs, and data sources the agent needs. Security reviews the declared list. Nobody compares the declared list to the actual baseline from Gate 2. The delta—sometimes a factor of ten, sometimes a factor of fifty—is the real approval artifact, and it is the one almost never produced.
The standard: a side-by-side view. On one side, the declared permissions: IAM roles, service accounts, network policies, tool bindings. On the other, the permissions actually exercised in the Gate 2 baseline. Every line falls into one of three cases: declared and exercised, which needs no action; declared but dormant, where the grant should be reduced or justified; or exercised but not declared, where the declaration is wrong. The last two are the deltas, and each one needs a written explanation. The explained deltas are what a CISO signs against.
The detail that turns this gate operational: for an agent declared to access 47 APIs, the runtime reachability view should show the three actually exercised, the 44 dormant, and a risk score on the dormant ones based on blast radius if invoked. A CISO approving this agent has three legitimate options: reduce the grant to the three observed, demand written justification for each of the 44, or accept the gap on the record with a dated re-review commitment. What a CISO should not do is approve the 47 without seeing the three.
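The side-by-side comparison is three set operations. The sketch below reproduces the 47-declared, 3-exercised example; the `api-NN` identifiers are placeholders, and the risk scoring of dormant permissions is deliberately left out because it depends on blast-radius data this example does not model.

```python
def reconcile(declared: set, observed: set) -> dict:
    """Split permissions into matched, dormant, and undeclared buckets."""
    return {
        "declared_and_exercised": sorted(declared & observed),
        "dormant": sorted(declared - observed),      # granted, never used
        "undeclared": sorted(observed - declared),   # used, never declared
    }


# 47 declared APIs, of which the baseline exercised three.
declared = {f"api-{i:02d}" for i in range(1, 48)}
observed = {"api-01", "api-02", "api-03"}

report = reconcile(declared, observed)
print(len(report["declared_and_exercised"]))  # 3
print(len(report["dormant"]))                 # 44
print(len(report["undeclared"]))              # 0
```

A non-empty `undeclared` bucket means the declaration is wrong; a non-empty `dormant` bucket is the list the CISO reduces, demands justification for, or accepts on the record.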
What fails this gate: the approval packet contains declared permissions only, with no comparison against observed behavior. Or engineering argues the excess declared permissions are for future-proofing—an argument the CISO’s job is to refuse. Future-proofed permissions are dormant attack surface. The agent will not fail closed if a prompt-injection scenario reaches them, and every unused declared permission is a capability the blast radius analysis in Gate 4 has to account for, whether or not engineering intended it to be reachable today.
Every other approval process in the category treats declared permissions as the source of truth. This piece treats the observed set as the source of truth and requires the declared set to justify itself against it. The mechanism that makes this operational is runtime reachability analysis—the same technique that reduces CVE noise by surfacing only vulnerabilities in code paths that actually execute, applied to permissions instead of packages.
The question: If this agent is compromised or acts on a prompt-injected instruction, what is the worst case—and is that worst case within our risk tolerance for its autonomy tier?
Gates 1 through 3 tell you what the agent is and what it does. Gate 4 asks what it could do if it goes wrong. This is the only gate that is primarily a judgment call rather than an evidence check—but the evidence still has to be on the table when the judgment is made.
The standard: a documented worst-case scenario specific to this agent’s tool set and data access, co-signed by engineering and security. The scenario specifies what data could leave, which systems could be written to, which downstream systems could be invoked, and what containment would cost if the scenario fires. Generic phrases like “data exfiltration risk” do not satisfy the gate. A scenario that names the tables, the endpoints, and the dependent services does.
The distinction that matters most at this gate: agent coercion versus agent compromise. Traditional threat modeling asks how an attacker could exploit a vulnerability in the agent. That framing misses the dominant AI-specific risk, which is how an attacker could use the agent’s intended capabilities against its intended purpose without exploiting anything. A CRM-updating agent that writes arbitrary SQL in response to a cleverly constructed prompt is not a vulnerability—it is the product working as designed. Gate 4’s scenario has to cover the coerced case explicitly, because the compromised case is a subset of threats the security team is already prepared for and the coerced case is the one they haven’t internalized yet.
The confused deputy problem at the MCP boundary: when an AI agent invokes an MCP tool, the tool executes with the agent’s credentials, not the end user’s. Any user who can reach the prompt can, in principle, reach anything the agent is authorized to reach—because a successful indirect injection turns the user’s request into the agent’s request. This is why blast radius for agents has to be computed against the agent’s full privilege set, not the calling user’s. A Tier 3 agent with write access to production databases is a Tier 3 agent for every user who can invoke it, regardless of what those users are individually authorized to do in the non-agent paths of your application.
How tier interacts with this gate: a Tier 1 agent with a narrow blast radius passes on a short scenario paragraph. A Tier 4 agent requires a scenario that walks through a full compromise chain, including lateral movement between agents if the architecture supports it. The scenario should assume the attacker reaches every capability the agent has, declared or dormant—which is why Gate 3’s declared-vs-observed reconciliation has to come first.
What fails this gate: a scenario built against the declared permission list alone rather than the reconciled reachable surface from Gate 3, a scenario generic enough that it could apply to any agent in the environment, or no scenario at all. The failure mode that bites hardest is the scenario that ignores dormant capability. If Gate 3 revealed 44 dormant permissions, Gate 4 has to treat those as potentially reachable under prompt injection, not as aspirations. A blast radius analysis that underestimates the reachable capability produces false confidence.
The question: Is AI-native detection active and tuned to this agent’s baseline—not generic container alerts, and not AI-for-security noise filtering dressed up as AI workload protection?
At approval time, the question is not “do we have detection”—almost every platform can answer yes. The question is what fires when the agent is attacked. When this agent’s prompt gets injected or its tool call gets abused, does the alert that surfaces include the agent identity, the prompt fragment, the tool invoked, and the data touched, or does it fire as an unauthorized network connection that a SOC analyst will have to manually correlate at 2 a.m.?
The standard: the detection stack has active rules tuned to this specific agent’s baseline. Rules have to distinguish the three subclasses of prompt injection that practitioners actually encounter: direct injection, where a user crafts a malicious prompt at the input layer; indirect injection, where the agent ingests a tool response or RAG document containing instructions the attacker planted upstream; and stored injection, where the tainted content lives in a persistent store and fires on subsequent retrieval. Indirect injection is the hardest of the three because the malicious instruction never passes through the user’s input at all—it comes back from a legitimate tool call to what was, at some earlier moment, a legitimate data source. Detection approaches that inspect agent runtime behavior rather than just user input are the only ones that catch it.
Detection categories the stack has to cover: prompt injection across all three subclasses, agent escape attempts, tool misuse and API abuse, and data exfiltration through legitimate output channels. That last one is worth naming precisely: the attack pattern is not the agent contacting an unauthorized destination. It is the agent using an allowed tool to send data to an allowed destination in a shape that looks like normal operation—an email-sending tool emitting a summary that contains the contents of a database query, a webhook invoked with a payload the baseline never saw. Detection that only watches destination IPs will miss this entirely.
Multi-turn attack coverage: a Gate 5 detection that only inspects single prompts will miss attacks where each individual prompt is benign but the sequence is malicious. Long-lived agent conversations accumulate context that an attacker can manipulate across turns, and the rules have to reason about the conversation state, not just the current request.
Agent identity versus user identity: most current IAM systems conflate the two, and most detection stacks inherit that conflation. An agent acting on behalf of a user should be identifiable as both—the agent for behavioral baselines and sandboxing, the user for authorization decisions and audit trails. Detection findings that can’t distinguish “this agent doing something unusual” from “this user triggering an unusual agent behavior” will produce incidents that nobody can triage quickly.
The operational test for this gate: pick a prompt in staging that causes the agent to call an unexpected tool. Run it. What fires? If the answer is “an alert about an unauthorized process” or “nothing,” Gate 5 fails. If the answer is “a correlated incident tagged as prompt injection with the tool call, the prompt fragment, and the agent identity in a single view,” Gate 5 passes. This test does not require months of setup—it requires one test prompt and a willingness to verify that detection works on the specific agent being approved, not in general.
What fails this gate: detection based on container rules or CSPM posture drift, where a successful prompt injection surfaces as a generic network alert indistinguishable from a misconfiguration. The failure mode a CISO will recognize: the SIEM receives three signals—a prompt processed at the application layer, an unauthorized tool invocation at the Kubernetes API, and an outbound connection at the CNI—with three different timestamps, three different severities, and no shared correlation key. The analyst reviewing them at 2 a.m. has no way to know they’re the same incident until the post-mortem. This is the specific reason generic container alerting misses AI-specific threats: the alerts are technically accurate and operationally useless.
The mechanism that produces the attack-story output is full-stack correlation across application, cloud, Kubernetes, and endpoint layers, joined on agent identity and prompt context rather than timestamp proximity. This is the architectural move behind AI-aware threat detection built for non-deterministic workloads, and it is what separates detection that fires in the shape of the attack from detection that fires in the shape of whatever generic rule happened to match a symptom.
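The difference between joining on identity and joining on timestamp proximity can be shown with a few synthetic signals. The sketch assumes every layer's telemetry already carries an `agent_id` and a `session_id`, which is the hard part in practice and is not something this example solves; the field names and events are invented.

```python
from collections import defaultdict


def correlate(signals):
    """Group signals into incidents keyed by (agent_id, session_id)."""
    incidents = defaultdict(list)
    for signal in signals:
        incidents[(signal["agent_id"], signal["session_id"])].append(signal)
    # Order each incident by time so it reads as an attack story.
    return {key: sorted(group, key=lambda s: s["ts"])
            for key, group in incidents.items()}


signals = [
    {"layer": "application", "ts": 100.0, "agent_id": "crm-agent",
     "session_id": "s-1", "event": "prompt processed"},
    {"layer": "network", "ts": 190.2, "agent_id": "crm-agent",
     "session_id": "s-1", "event": "outbound connection"},
    {"layer": "kubernetes", "ts": 131.5, "agent_id": "crm-agent",
     "session_id": "s-1", "event": "unexpected tool invocation"},
    # Closest in time to the tool invocation, but a different agent entirely.
    {"layer": "network", "ts": 131.6, "agent_id": "docs-agent",
     "session_id": "s-9", "event": "outbound connection"},
]

incidents = correlate(signals)
story = incidents[("crm-agent", "s-1")]
print([s["layer"] for s in story])
# ['application', 'kubernetes', 'network']
```

A timestamp-window join would have paired the tool invocation at 131.5 with the unrelated `docs-agent` connection at 131.6 and missed the real outbound connection almost a minute later.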
The question: If this agent goes wrong, what is our intervention mechanism, and has that mechanism been exercised against this specific agent in staging?
A response mechanism that has never been tested against this agent’s architecture is an assumption, not a control. Soft-quarantining a stateless LangChain agent gives you clean containment. Soft-quarantining a stateful multi-turn orchestrator mid-session may leave you with a partially completed cascade you don’t understand and can’t roll back. Response capability has to be validated on the specific agent, not inferred from the fact that the platform supports it in general.
The standard: the intervention ladder is documented for this agent class—kill for immediate process termination, pause for suspending the agent without terminating its session, soft quarantine for isolating it from external tools and networks while preserving state for investigation. At least one intervention has been executed against this specific agent in a staging scenario and the outcome is documented. The on-call runbook includes the intervention criteria, the decision tree for which intervention to run against which signal, and the escalation path—and the runbook is in the hands of the SRE who will be paged at 2 a.m., not in a shared drive nobody opens.
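The "exercised against this specific agent in staging" requirement can be checked against a log of intervention drills. The record shape below is hypothetical; the point of the sketch is that drills run against other agents, or without a documented outcome, do not count as evidence for this one.

```python
from dataclasses import dataclass
from enum import Enum


class Intervention(Enum):
    KILL = "kill"                        # immediate process termination
    PAUSE = "pause"                      # suspend without ending the session
    SOFT_QUARANTINE = "soft_quarantine"  # cut off tools and networks, keep state


@dataclass(frozen=True)
class Drill:
    agent: str
    intervention: Intervention
    environment: str
    outcome_documented: bool


def gate6_evidence(agent: str, drills) -> list:
    """Drills that count: this agent, in staging, with a documented outcome."""
    return [
        d for d in drills
        if d.agent == agent and d.environment == "staging" and d.outcome_documented
    ]


drills = [
    Drill("other-agent", Intervention.KILL, "staging", True),
    Drill("workflow-agent", Intervention.PAUSE, "staging", False),
    Drill("workflow-agent", Intervention.SOFT_QUARANTINE, "staging", True),
]

evidence = gate6_evidence("workflow-agent", drills)
print([d.intervention.value for d in evidence])  # ['soft_quarantine']
```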
What fails this gate: no intervention ladder, or the only available response is to tell engineering to redeploy. A response plan whose first step is a deployment pipeline is a response plan that will not fire in time. The interventions may exist at the platform level, but if they have never been run against this specific agent, the response capability is theoretical—and “often works” is not the standard for an approval gate. The reason eBPF-based enforcement can run interventions in production without breaking neighboring services is that the action happens at the kernel level with per-agent scope, which is what separates interventions you can actually execute at 2 a.m. from ones you can only demonstrate in isolation.
The question: Have we defined which changes to this agent require re-opening the approval, and does engineering know the list before any change ships?
Every other checklist in this category treats approval as a one-time event. It is not. An agent approved on Monday is a different agent on Thursday if a developer connected a new MCP tool on Wednesday—but nothing in the standard approval pattern catches that. Gate 7 closes this loop by defining the trigger events before they happen, in writing, with engineering co-signing.
The re-approval trigger list is the concrete artifact of the gate. Five events that invalidate a prior sign-off:
Why this matters operationally: the most common AI agent incident pattern in the wild is not “an agent we approved was vulnerable from day one.” It is “an agent we approved changed without re-review, and then something went wrong.” A developer adds an MCP connection for a demo on Thursday afternoon, forgets to flag it, and the next attack story fires against a capability that was never in the original approval packet. Gate 7 is the mechanism that either prevents this or catches it early.
The standard: the trigger list is co-signed by engineering and security before Gate 7 is cleared, and automated drift detection against the approved behavioral baseline runs on a defined cadence to catch triggers that were shipped without being flagged. Engineering will, occasionally, forget—the safety net has to exist independently of human commitment.
What fails this gate: no trigger list at all, which is the default state of most AI agent security programs today. A trigger list that exists on paper but has no automated drift detection backing it is Conditional at best, not a Pass: you are relying entirely on self-reporting, which works until it doesn’t.
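At its simplest, automated drift detection against an approved baseline compares the surface recorded at sign-off with the surface observed now and reports anything new. The sketch below assumes both are stored as category-to-set mappings; the category names and values are invented, and it only catches additions, not changes in frequency or sequence.

```python
def detect_drift(approved: dict, current: dict) -> dict:
    """Return items observed now that were absent from the approved baseline."""
    drift = {}
    for category, seen_now in current.items():
        new_items = set(seen_now) - set(approved.get(category, ()))
        if new_items:
            drift[category] = sorted(new_items)
    return drift


# Surface recorded at approval time.
approved = {
    "mcp_tools": {"crm", "search", "calendar"},
    "destinations": {"crm.internal", "api.example.com"},
}

# Surface observed after a developer connected one more MCP tool.
current = {
    "mcp_tools": {"crm", "search", "calendar", "payments"},
    "destinations": {"crm.internal", "api.example.com"},
}

print(detect_drift(approved, current))   # {'mcp_tools': ['payments']}
print(detect_drift(approved, approved))  # {}
```

Run on a defined cadence, a non-empty result is the signal that a change shipped without being flagged.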
The deliverable of this piece is a template a CISO brings into the go-live meeting. Seven rows, one per gate. A tier column so the evidence bar is explicit on the page. A status column. A column for the person who verified the evidence. A date. A signature line. Engineering knows which evidence to bring, security knows which evidence to require, and the meeting becomes a verification exercise rather than a negotiation.
| Gate | What’s verified | Tier bar | Status | Verified by |
|---|---|---|---|---|
| 1. Inventory | Runtime-derived AI-BOM entry with owner, tier, reachable tools & APIs | Same across tiers | Pass / Conditional / Fail | |
| 2. Behavioral baseline | Coverage of declared tools, APIs, data sources under production-representative load | Window scales with tier; coverage is the floor | Pass / Conditional / Fail | |
| 3. Access reality | Declared vs. observed permission comparison with every delta explained | Stricter delta tolerance at higher tiers | Pass / Conditional / Fail | |
| 4. Blast radius | Documented worst-case scenario specific to this agent’s reachable surface | Depth of scenario scales with tier | Pass / Conditional / Fail | |
| 5. AI-specific detection | Test prompt fires a correlated incident with agent, prompt, tool, and data context | Same across tiers | Pass / Conditional / Fail | |
| 6. Response readiness | Intervention ladder exercised against this specific agent in staging | More interventions tested at higher tiers | Pass / Conditional / Fail | |
| 7. Re-approval triggers | Co-signed trigger list with automated drift detection backing it | Same across tiers | Pass / Conditional / Fail | |
The artifact is a forcing function. If a gate cannot be marked Pass or Conditional for this agent today, the approval is not ready, and “not ready” is a legitimate outcome that the process is designed to produce. Conditional approvals are fine: a gate clears conditionally with a documented remediation, a named owner, and a follow-up review date. What the artifact refuses to support is the pattern most AI agent go-lives run on today, which is approval based on engineering’s confidence that nothing will go wrong rather than on evidence that the agent can be contained if it does.
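The pass, conditional, and fail rules can be encoded so the checklist itself reports what blocks approval. This is a sketch of one possible encoding, not a prescribed format: the `GateResult` fields mirror the table columns plus the three things a conditional approval needs, and the names and dates are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

GATES = [
    "1. Inventory", "2. Behavioral baseline", "3. Access reality",
    "4. Blast radius", "5. AI-specific detection",
    "6. Response readiness", "7. Re-approval triggers",
]


@dataclass
class GateResult:
    gate: str
    status: str                      # "Pass", "Conditional", or "Fail"
    verified_by: str = ""
    remediation: str = ""
    owner: str = ""
    follow_up: Optional[date] = None


def approval_blockers(results) -> list:
    """List everything that keeps the checklist from supporting an approval."""
    blockers = []
    for r in results:
        if r.status == "Fail":
            blockers.append(f"{r.gate}: failed")
        elif r.status == "Conditional" and not (r.remediation and r.owner and r.follow_up):
            blockers.append(
                f"{r.gate}: conditional without remediation, owner, and follow-up date")
        elif r.status not in ("Pass", "Conditional"):
            blockers.append(f"{r.gate}: unknown status {r.status!r}")
        if not r.verified_by:
            blockers.append(f"{r.gate}: no verifier recorded")
    return blockers


results = [GateResult(g, "Pass", verified_by="sec-reviewer") for g in GATES]
# A well-formed conditional approval.
results[2] = GateResult("3. Access reality", "Conditional", "sec-reviewer",
                        remediation="reduce dormant grants", owner="eng-lead",
                        follow_up=date(2026, 5, 1))
# A conditional with nothing behind it.
results[6] = GateResult("7. Re-approval triggers", "Conditional", "sec-reviewer")

print(approval_blockers(results))
# ['7. Re-approval triggers: conditional without remediation, owner, and follow-up date']
```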
A note on distribution: the checklist is here, in the piece, as-is. No email-for-PDF, no gated download. The gap between having a framework and having an operational artifact is the whole problem this piece is solving. Hoarding the artifact behind a lead form would reproduce the problem rather than solve it.
Honest limitations, because a checklist that claims to cover everything usually covers nothing well.
The gates assume runtime observability is in place. If it isn’t, Gate 2 is impossible to clear and Gates 3 through 6 collapse with it. That is a prerequisite decision, and it gets made inside the larger question of how to evaluate AI workload security tools in the first place.
The gates are per-agent. If there are 40 agents in production and no inventory, you do not run this checklist 40 times. You run it once, realize Gate 1 fails across the board, and the honest next move is to step back into the program-level sequencing of observe, posture, detect, and enforce across the whole environment before attempting per-agent approvals.
The gates do not substitute for ongoing drift monitoring, compliance mapping, or the posture analysis that happens continuously after approval. Approval is the door. Monitoring is the room. Both are required, and the checklist addresses only the door.
Autonomy tier classification is a judgment call, and two reasonable CISOs could classify the same agent differently. The bias this piece recommends is toward the higher tier: a read-only agent reviewed as Tier 2 costs you review time; a Tier 2 agent reviewed as Tier 0 costs you an incident. The asymmetry of the costs is the argument for erring upward.
The gates assume the goal is the safest possible deployment. In practice, the CISO’s real job is negotiating the frontier between capability and containment. Engineering wants broad tool access because that is what makes the agent useful. Security wants narrow tool access because that is what contains the blast radius. The gates do not resolve that tension—they surface it in a form both sides can argue about with specifics instead of abstractions. Gate 3’s 44 dormant permissions on a whiteboard is a better starting point for the negotiation than “too many permissions,” but the negotiation still has to happen, and it is usually the hardest conversation in the approval meeting.
The gates have a visibility gap at the managed LLM API boundary. If the agent calls a hosted model through an API—OpenAI, Anthropic, Bedrock, Vertex—the prompt and response pass outside your cluster. Gate 5 detection can either intercept at the egress proxy, which requires architectural work most teams haven’t done, or accept that part of the attack chain is invisible to runtime observability. For Tier 3 and Tier 4 agents, this gap should be named explicitly in the approval packet rather than quietly ignored, because it bounds what the other gates can actually certify.
The distance between “we have an AI agent security framework” and “we are confident approving this specific agent for production next Tuesday” is the artifact this piece hands you.
Several gates—behavioral baselines, declared-vs-observed reconciliation, AI-native detection with correlated attack stories, automated drift detection for re-approval triggers—are impossible to clear without runtime behavioral visibility at the Deployment level. If that capability is not in your stack yet, the gates will tell you before the incident does. Book a demo to see what the approval artifact looks like filled in against a real agent running in a real cluster, rather than as a template in an article.
How long should the Gate 2 baseline window actually be for a Tier 3 agent?
Two to three weeks is the floor, but the real answer depends on whether the window captured every tool in the declared tool set under realistic interaction volumes. If three tools in the declared set were never exercised in two weeks, either the window is too short or the tools don’t belong in the declaration. The window is a proxy for coverage, and when they diverge, coverage wins.
What if engineering pushes back on a conditional gate?
Conditional gates are not automatic blocks. They are approvals with a remediation, a named owner, and a follow-up review date. The negotiation is not “approve or reject”—it is “what’s the remediation, by when, and who signs again.” This keeps the checklist operational without turning it into an adversarial bottleneck that engineering will route around.
What should I report to the board about this process?
Quantified outputs: how many agents are approved, how many are in conditional status with active remediation, how many re-approval triggers fired this quarter, and what percentage of agents have Gate 5 detection coverage live. Those numbers are the board-level view of an AI agent security program that is running, not one that exists only in slide decks.
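If the per-agent checklist results are kept in a structured form, the four board numbers fall out of a simple aggregation. The record fields and status strings below are assumptions for the example; the trigger count is passed in because it would come from the drift-detection system rather than from the agent records.

```python
def board_metrics(agents, triggers_fired_this_quarter: int) -> dict:
    """Summarize approval status and Gate 5 coverage across all agents."""
    total = len(agents)
    gate5_live = sum(1 for a in agents if a["gate5_live"])
    return {
        "approved": sum(1 for a in agents if a["status"] == "approved"),
        "conditional": sum(1 for a in agents if a["status"] == "conditional"),
        "reapproval_triggers_fired": triggers_fired_this_quarter,
        "gate5_coverage_pct": round(100 * gate5_live / total, 1) if total else 0.0,
    }


# Hypothetical fleet of four agents.
agents = [
    {"name": "docs-rag", "status": "approved", "gate5_live": True},
    {"name": "support-bot", "status": "approved", "gate5_live": True},
    {"name": "crm-agent", "status": "conditional", "gate5_live": True},
    {"name": "orchestrator", "status": "conditional", "gate5_live": False},
]

print(board_metrics(agents, triggers_fired_this_quarter=2))
# {'approved': 2, 'conditional': 2, 'reapproval_triggers_fired': 2,
#  'gate5_coverage_pct': 75.0}
```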