The Library That Holds All Your AI Keys Was Just Backdoored: The LiteLLM Supply Chain Compromise
We just published a deep breakdown of the Trivy supply chain attacks yesterday. Twenty-four hours...
Mar 17, 2026
You’ve been securing Kubernetes workloads for years. Your CSPM is running, your CNAPP is configured, your team knows how to triage container alerts. Then an AI agent lands in your cluster — maybe from the data science team, maybe from a vendor integration, maybe from a tool you didn’t even know was running.
Within a week, it’s making API calls nobody planned, accessing data stores that aren’t in the architecture diagram, and executing code it generated itself.
Your existing tooling sees activity. It doesn’t see intent.
And that gap — between what your security stack was built to monitor and what AI workloads actually do — is the structural shift that “cloud-native security for AI workloads” exists to address. This piece explains what shifted, why it matters now, and what your security architecture needs to account for before the next AI agent deploys.
Cloud-native security has evolved through three distinct eras, each defined by a core assumption about how workloads behave. Each transition exposed the previous era’s blind spots — not by replacing what came before, but by revealing that its foundational assumption was insufficient for what came next.
Era 1 — Container security (2015–2019). The assumption: securing the packaging secures the workload. Image scanning, vulnerability management, registry hardening, admission controllers, CIS benchmarks — these tools treated the container image as the primary attack surface. This held as long as containers ran deterministic code. The tools from this era still work, but only for the packaging layer. They can tell you a container image has a vulnerable library. They can’t tell you what happens after the container starts running.
Era 2 — Runtime workload security (2019–2024). The assumption shifted: packaging alone isn’t enough; you also need to watch what containers do at runtime. This is where eBPF-based monitoring, behavioral baselines, and runtime anomaly detection emerged. The tools assumed that “unexpected behavior” — a new process, an unusual network connection — was a reliable signal of compromise. That assumption held as long as workloads followed the code paths a developer wrote.
Era 3 — Autonomous workload security (2024–present). AI workloads broke the Era 2 assumption. An AI agent’s normal behavior includes generating code, calling tools it wasn’t explicitly programmed to use, and making decisions that weren’t in the design spec. “Unexpected behavior” is the expected operating mode. Era 2’s detection logic — flag anything outside the baseline — produces noise instead of signal when applied to AI agents.
“You can sometimes say that application logic will never take a specific route. It will never use a specific privilege. In the case of AI agents, you might end up with something that you didn’t plan for and therefore you’ll fail. You need to lock it down much better than an application.” — Ben Hirschberg, CTO & Co-Founder, ARMO
That insight is why ARMO reoriented its entire platform architecture around the Era 3 shift — from static container security to behavioral-driven cloud application detection and response purpose-built for workloads that don’t follow predictable code paths. The shift from CNAPP to CADR as a category reflects the same realization: posture-only tools can’t protect workloads that behave autonomously.
The three-era transition isn’t abstract. Here’s how it plays out inside an actual security team.
A platform engineer at a 600-person fintech company notices an unfamiliar workload consuming GPU resources across three Kubernetes namespaces. It turns out the machine learning team deployed a LangChain-based agent two weeks ago — connected to an internal customer database, a third-party credit scoring API, and an MCP tool runtime — without filing a single change request. The agent has been running with production credentials the entire time.
The security team tries to assess the risk with their existing CNAPP. It flags what it always flags: overly permissive IAM roles, a vulnerable Python library in the container image, a missing network policy. But these findings are identical to what the tool surfaces for every other workload in the cluster. There’s no way to tell whether the agent is actually exercising those permissions, which APIs it’s calling, or whether its behavior has drifted since deployment. The CNAPP sees the container. It doesn’t see the agent.
Then an alert fires: “unexpected outbound connection.” The runtime tool detected the agent calling an external API it hadn’t called before. But the agent calls external APIs as part of its normal workflow — which ones depend on the queries it receives. Is this a prompt injection redirecting the agent to exfiltrate data through a different API endpoint? Or did a user simply ask a question the agent hadn’t encountered before? The runtime tool can’t tell the difference, because it has no concept of what a prompt is, what a tool invocation looks like, or how to distinguish a legitimate agentic workflow from an attack. This is the gap between runtime context that understands AI workloads and generic container monitoring that merely observes them.
The security team faces a choice nobody wants to make. Lock down the agent with restrictive policies based on assumptions, and risk breaking the ML team’s production workflow within 48 hours. Or leave it permissive and hope detection catches problems before they become incidents. Most teams choose the second option — not because it’s safe, but because they don’t have the behavioral data to write policies that constrain without breaking.
ARMO’s CTO experienced a version of this firsthand: his own open-source AI agent, OpenClaw, started sending unauthorized WhatsApp messages to his contacts — acting entirely outside its intended scope through a connected communication channel. Excessive agency is a critical risk for exactly this reason. When agents have unchecked autonomy, they take actions nobody predicted.
This scenario exposes the four architectural gaps that the Three Eras transition created: you can’t see what the agent is doing (behavioral visibility), you can’t assess its real risk profile (posture), you can’t distinguish attacks from legitimate behavior (detection), and you can’t enforce boundaries without breaking things (enforcement). Each of these maps to a pillar in the evaluation framework covered in the AI workload security buyer’s guide — and each requires AI-specific capability that generic cloud security tools don’t deliver.
Three forces are converging to close the window for “we’ll deal with AI security when it becomes a priority.”
AI adoption velocity is outpacing security. The fintech scenario above isn’t hypothetical — it’s the pattern playing out across Kubernetes-heavy organizations. One team connects a CrewAI agent to an internal database for a hackathon demo, forgets to tear it down, and three months later it’s still running with production credentials. These shadow AI deployments are the fastest-growing blind spot in Kubernetes security. By the time security discovers them, the agents have been operating with excessive permissions for weeks.
The threat landscape has been codified — but tools haven’t caught up. The OWASP Top 10 for Agentic Applications catalogs the threats AI agents face: agent hijacking, prompt injection, tool misuse, identity impersonation, AI-mediated data exfiltration. NIST’s CAISI request for information on AI agent security codifies the risk dimensions regulators are watching. These frameworks are built from observed attack patterns, not speculation. But ask yourself: how many of those OWASP threat categories does your current CNAPP have detection rules for? For most teams, the answer is zero.
Cloud providers are moving — but only partway. AWS GuardDuty for SageMaker, Azure Defender for AI, and Google Cloud Security Command Center for Vertex AI all now offer AI security features. But look at what they actually cover: configuration scanning, misconfiguration flagging, framework-level vulnerability detection. These are posture-layer capabilities. They can’t tell you that an AI agent accessed a customer database it had permissions to reach but had never touched before — which might be a prompt injection exploiting a legitimate permission. Teams relying solely on CSPM and cloud-native provider tooling have posture coverage with significant detection and enforcement gaps underneath.
AI workloads are already in production. The threats are already documented. And the tools most teams rely on have structural limitations that won’t be resolved by a patch or a feature update. The gap is architectural.
Cloud-native security is going through its third major inflection point. The first was containers. The second was runtime workload protection. The third is autonomous workloads — AI agents that make decisions, generate code, and interact with external systems in ways no prior security tool was designed to handle.
The architectural response requires the same progression that each prior era demanded: first see what exists, then understand the risk, then detect threats, then enforce boundaries. For AI workloads, that means runtime discovery derived from observation rather than static manifests, behavioral baselining that accounts for non-determinism, AI-specific detection mapped to the OWASP Agentic threat taxonomy, and progressive enforcement that observes before it constrains. ARMO’s architecture was designed around this exact progression. The buyer’s guide maps how each translates to specific evaluation criteria across the four pillars.
The question is whether your architecture catches up before the gap between deployment velocity and security capability produces the incident that forces the conversation. See how ARMO addresses each requirement, or book a demo to walk through the four pillars against your own environment.
Is this a real architectural shift or just a new product marketing category? It’s a real shift. The core issue isn’t that existing tools lack features — it’s that they were designed around the assumption of deterministic workload behavior. AI agents violate that assumption by design. This is comparable to the shift from perimeter security to container security: the previous generation’s tools didn’t become useless, but they became insufficient for the new workload type.
What’s the difference between AI-aware and AI-native security? AI-aware tools repurpose existing container detection rules for workloads that happen to run AI frameworks. They detect “unexpected process execution” but can’t identify a prompt injection that caused it. AI-native tools have purpose-built detection categories for prompt injection, agent escape, tool misuse, and behavioral drift — with context-rich incidents that include the agent, prompt, tool, and data involved.
How do I explain this shift to a CISO who thinks our CNAPP covers it? Ask them to check how many OWASP Agentic Top 10 threat categories their CNAPP has detection rules for. In most cases, the answer is zero. The CNAPP handles posture — misconfigurations, vulnerable frameworks, permission audits — but it can’t observe what AI agents actually do at runtime. For a CISO who needs the full evaluation framework, the buyer’s guide provides the structured walkthrough.
Do I need runtime security for AI workloads, or is posture management enough? Posture management tells you what an AI workload can access. Runtime security tells you what it actually does. For AI agents that generate code, call tools, and make autonomous decisions, the gap between “can access” and “actually does” is where the real risk lives.
We just published a deep breakdown of the Trivy supply chain attacks yesterday. Twenty-four hours...
We’ve been going back and forth on whether to publish this post. As the maintainers...
You’re forty-five minutes into a vendor demo for AI workload security. The dashboard looks polished—posture...