Apr 23, 2026
It’s Tuesday morning at a mid-size fintech. A customer-support workflow runs on CrewAI in production: a Triage agent reads tickets, a Records agent pulls customer history, a Remediation agent drafts and sends the reply. A user submits a ticket with a pasted error log containing an indirect prompt injection. Triage summarizes and delegates. Records, interpreting instructions embedded in the summary, pulls 2,400 customer records instead of one. Remediation drafts the reply and emails it to a partner address on the allowlist.
Forty minutes later, the data is gone. The CNAPP reports healthy. The egress monitor shows an allowed destination. The DLP saw a semantic summary, not structured rows. Each agent’s behavioral profile looks normal — each one did something on its authorized capability list. Triage triaged. Records read records. Remediation sent email. Three healthy agents, one breach.
The detection stack didn’t fail at any individual agent. It failed at the spaces between them — at the delegation edges where one agent handed work to the next, carrying an instruction it was never supposed to carry. This article is about those spaces. It builds on the four attack chains mapped in our AI-aware threat detection framework and extends detection to the inter-agent surfaces that LangChain, CrewAI, and AutoGPT deployments make operationally critical to secure.
Most security teams running multi-agent systems in production got there incrementally. One agent first, instrumented and baselined. A second agent later, wired into the first through a framework like LangChain or CrewAI. A third added when the workflow grew. Detection followed the same path: a runtime sensor on each agent, a behavioral profile per workload, alerts scoped to the container. The instinct — a reasonable one — is that adding another agent is adding another instance of the same detection pattern. If per-agent detection works on one, and N is just more of the same, then N detections should cover N agents.
This is a category error. Multi-agent orchestration introduces detection surfaces that do not exist in a single-agent system, and per-agent sensors, regardless of how many you deploy, cannot see them.
We have previously walked through what per-agent runtime detection catches for single-agent workloads, where each agent’s behavior is correlated across the kernel, container, Kubernetes, and application layers to reconstruct one attack story. That correlation is necessary, and it is insufficient for multi-agent systems. The attack in the opening scenario was not a single-agent attack repeated three times. It was one coordinated attack that moved between three agents, and no per-agent view ever saw the connecting edges.
Three surfaces are specific to multi-agent systems and absent from single-agent ones. The delegation edge is where one agent hands work to another — a LangGraph node transition, a CrewAI task delegation, an AutoGen message-passing turn, an AutoGPT subgoal spawn. The edge is where context crosses from one agent’s authority boundary into the next, and it is where a compromised instruction propagates from a low-privilege agent into a high-privilege one. The shared context and memory layer is where agents coordinate without explicit delegation — a scratchpad, a vector store, a conversation buffer. One agent’s write becomes another agent’s read, which means a compromised write becomes an input on the next agent’s turn without ever passing through an explicit handoff. The orchestrator or supervisor node is the control plane that routes between agents: the LangGraph compiler, the CrewAI Crew object, the AutoGPT planner, the AutoGen GroupChatManager. It sees every delegation in structured form, and it is itself a single trust boundary whose compromise cascades downstream.
The rest of this article walks each surface, shows what runtime detection needs to capture, and maps the framework-specific mechanisms — LangChain, CrewAI, AutoGPT, AutoGen, and the Model Context Protocol — onto the detection primitives that fire at each surface.
Naming the surfaces is abstract. Watching them fail in sequence is not. This is the fintech scenario from the opening, walked stage by stage, with explicit columns at each stage for what per-agent tools see and what inter-agent detection would see.
The customer submits a ticket containing an indirect prompt injection in a pasted error log. The Triage agent’s job is to summarize the ticket and classify it. The injection is a few lines of text inside a multi-paragraph log, structured to survive summarization and emerge as a directive in the Triage agent’s internal reasoning.
Per-agent detection at this stage sees: an HTTP POST to the ticketing intake endpoint, a normal LLM call, a normal tool invocation for classification, a normal completion. No baseline deviation — the Triage agent does this every minute. Inter-agent detection sees the same, because the contagion has not crossed an edge yet. Stage 1 is indistinguishable from normal work. This is structural; the infection starts invisible.
The Triage agent’s summary now carries the injected instruction, and when it delegates to Records, the task payload contains not just “look up this customer’s recent tickets” but “look up this customer’s recent tickets and related customers in the same geographic segment for context.” The CrewAI task payload serializes this as structured delegation.
Per-agent detection sees a Python method call from one pod to another through the CrewAI framework, or a gRPC call if the crew is distributed. Both are intra-cluster, on the service mesh’s allowed path, and below any network alerting threshold. Inter-agent detection at the delegation edge would see the Triage to Records delegation carrying a scope field that is outside the observed scope envelope for this edge. Every previous delegation on this edge carried a scope of “this customer’s recent tickets.” The current delegation carries a scope of “this customer and a segment.” The deviation lives in the edge, not in either endpoint.
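A scope-envelope check of this kind can be sketched in a few lines. This is a hypothetical illustration, not any framework's API; the class, agent names, and scope strings are invented for the example:

```python
# Hypothetical sketch: flag delegations whose "scope" field falls outside
# the envelope of values previously observed on a given delegation edge.
from collections import defaultdict

class EdgeScopeBaseline:
    def __init__(self):
        # (src_agent, dst_agent) -> set of scope values seen while baselining
        self.envelope = defaultdict(set)

    def observe(self, src, dst, scope):
        self.envelope[(src, dst)].add(scope)

    def deviates(self, src, dst, scope):
        # True when this edge has never carried this scope before
        return scope not in self.envelope[(src, dst)]

baseline = EdgeScopeBaseline()
baseline.observe("triage", "records", "customer:recent_tickets")

assert baseline.deviates("triage", "records", "customer:recent_tickets") is False
assert baseline.deviates("triage", "records", "customer+segment") is True
```

A production implementation would normalize scope values and tolerate legitimate drift, but the unit of comparison is the point: the baseline is keyed by the edge, not by either agent.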
The Records agent receives the task and executes against its database read permission. It is authorized to read customer records. It reads 2,400 customer records, which — taken in isolation — is within its permission envelope but outside its observed access pattern.
Per-agent detection at Records sees a database read, sized larger than typical. Depending on baseline granularity, this might or might not cross a threshold. Most per-agent baselines are tuned to avoid false positives on legitimate bulk operations, so a single large read below an operator-set ceiling does not fire. Inter-agent detection sees the bulk read arrive from a delegation edge that already deviated. A bulk read on its own is ambiguous. A bulk read triggered by a delegation with an out-of-baseline scope field is high-confidence. The combination of two signals — one at the edge, one at the endpoint — is what resolves the ambiguity.
The Records agent now delegates to Remediation, and the delegation payload includes not just the data to summarize but also an email destination to send the summary to. This destination was suggested by the injected instructions as “the authorized partner contact on record.” The address is on the allowlist; the partner is real, and the address has been used legitimately by this crew before.
Per-agent detection sees another intra-cluster delegation, normal shape, normal payload size. Inter-agent detection sees the Records to Remediation delegation carrying an outbound destination field that was not previously in the edge’s observed payload schema. Most Records to Remediation delegations do not carry a destination at all — Remediation resolves the destination from ticket metadata. When the delegation payload itself carries the destination, the edge schema has changed. That is the signal.
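The schema-change signal reduces to set arithmetic over the field names an edge has historically carried. Again a hypothetical sketch, with invented field and agent names:

```python
# Hypothetical sketch: detect a structural change in an edge's payload
# schema, i.e. a delegation carrying a field the edge has never carried.
class EdgeSchemaBaseline:
    def __init__(self):
        self.fields = {}  # edge -> set of field names observed in baselining

    def observe(self, edge, payload):
        self.fields.setdefault(edge, set()).update(payload.keys())

    def new_fields(self, edge, payload):
        # Fields present in this payload but absent from the edge's history
        return set(payload.keys()) - self.fields.get(edge, set())

b = EdgeSchemaBaseline()
b.observe(("records", "remediation"), {"data": "...", "ticket_id": "T-1"})

# A delegation that suddenly carries an outbound destination is the signal
assert b.new_fields(("records", "remediation"),
                    {"data": "...", "ticket_id": "T-2",
                     "destination": "partner@example.com"}) == {"destination"}
```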
Remediation drafts the reply, the reply contains a semantic summary of 2,400 customer records, and the email goes to the allowlisted partner address.
Per-agent detection at Remediation sees a normal email tool invocation to an allowlisted destination, with a payload below the DLP size threshold, containing a semantic summary that does not match any DLP signature. Nothing fires. The CNAPP confirms policy compliance. The egress monitor approves the destination. This is where the data leaves, and every destination-layer tool is structurally blind — the same pattern we have previously mapped in AI agent tool misuse and API abuse. Inter-agent detection sees the sending action as the completion of a chain whose two earlier edges already deviated from baseline. The alert that fires is not “suspicious email” — which would never fire in isolation — but “completed contagion chain across three delegations, each with a distinct baseline deviation.”
This is the structural difference. No per-agent sensor in the stack produced an actionable alert at any individual stage. Each deviation, taken alone, was below threshold or ambiguous. The chain is what resolves the ambiguity, and the chain lives at the edges, not at the nodes.
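The escalation logic itself is simple once deviations are attributed to a shared workflow trace. A minimal sketch, assuming each delegation event carries a trace identifier (the threshold and names are illustrative):

```python
# Hypothetical sketch: individually sub-threshold edge deviations escalate
# into one alert when several of them share a workflow trace id.
from collections import defaultdict

class ChainCorrelator:
    def __init__(self, alert_at=2):
        self.alert_at = alert_at
        self.deviations = defaultdict(list)  # trace_id -> recorded deviations

    def record(self, trace_id, edge, reason):
        self.deviations[trace_id].append((edge, reason))
        # Alert only when the chain accumulates enough distinct deviations
        return len(self.deviations[trace_id]) >= self.alert_at

c = ChainCorrelator(alert_at=2)
assert c.record("wf-42", ("triage", "records"),
                "scope outside envelope") is False
assert c.record("wf-42", ("records", "remediation"),
                "new payload field") is True
```

The alert that fires references the whole chain, which is what gives a responder the attack story rather than three unexplained events.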
Every framework calls it something different. LangChain has chains and LangGraph has state graphs. CrewAI has Tasks and Crews and Processes. AutoGPT has goals and subgoals. AutoGen has conversations and group chats. The Model Context Protocol has tool invocations that cross process or network boundaries. Each mechanism looks different at the kernel and network layers — which is exactly why a detection stack built for one generalizes poorly to another.
The table below maps each framework’s delegation mechanism to the signal you need, the layer the signal lives at, and the tool class that is structurally blind to it.
| Framework | Delegation mechanism | Detection signal | Structurally blind tool class |
|---|---|---|---|
| LangChain / LangGraph | Typed edges in a state graph; shared AgentState object mutated on each node transition | Edge traversal sequence plus state mutation pattern per edge | eBPF-only sensors (no framework parser); CNAPP (no concept of edges) |
| CrewAI | Role-to-Task assignment via sequential or hierarchical Process; manager-agent routing | Task-to-agent mapping plus delegation depth per chain | SIEM (sees HTTP calls, not role transitions); WAF (sees only the initial request) |
| AutoGPT | Self-prompting loop; goal-decomposition tree; recursive subgoal spawn | Iteration count anomaly plus subgoal semantic drift | WAF (does not see the internal loop); egress monitor (normal outbound shape) |
| AutoGen | Conversable agent message passing; GroupChatManager speaker selection | Speaker-selection anomaly plus message-round deviation | All kernel-layer tools (events are in-process objects) |
| Model Context Protocol | Protocol-level tool invocation delegation between agents and tool servers | Tool-call provenance chain; originator-to-destination trace | DLP (intra-cluster invisible); network egress (allowed service endpoints) |
The pattern across the matrix matters more than any single row. Every delegation mechanism lives at the application layer, in structured form, inside the framework’s own data model. Kernel and network tools see the byproducts — a subprocess, a socket, a memory allocation — without the semantic context of what the delegation meant. WAFs and egress monitors see the outside-to-inside traffic that triggered the chain and then go blind because the chain now runs intra-cluster or intra-process. CNAPPs and SIEMs see individual events across the stack, but without a framework parser they cannot reconstruct the delegation edge. The detection surface lives where the framework lives — at the application layer — and that is the layer most runtime security tools were built to complement rather than inhabit.
LangChain and LangGraph. LangChain’s classic Chain abstraction strings tool calls together inside a single agent’s reasoning loop; LangGraph generalizes this to a directed graph where each node is an agent or tool, edges are state transitions, and the shared AgentState object carries context forward. The delegation signal is the edge traversal plus state mutation pair: which node routed to which, and what fields of AgentState were mutated on that edge. A baseline built at the edge level captures typical traversal sequences and typical state mutations; deviations — a new edge fired for the first time, or an unexpected field mutated — are the detection primitive.
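The edge-plus-mutation pair can be baselined as a mapping from edges to the state fields they mutate. This sketch does not use the LangGraph API; it models the primitive with plain dictionaries standing in for AgentState snapshots:

```python
# Hypothetical sketch: baseline which graph edges fire and which state
# fields each edge mutates, then report deviations on a new transition.
class GraphEdgeBaseline:
    def __init__(self):
        self.mutations = {}  # (src, dst) -> set of state fields mutated

    def observe(self, src, dst, before, after):
        changed = {k for k in after if after[k] != before.get(k)}
        self.mutations.setdefault((src, dst), set()).update(changed)

    def deviations(self, src, dst, before, after):
        changed = {k for k in after if after[k] != before.get(k)}
        if (src, dst) not in self.mutations:
            return {"new_edge"}       # this edge has never fired before
        return changed - self.mutations[(src, dst)]

g = GraphEdgeBaseline()
g.observe("triage", "records", {"summary": ""}, {"summary": "text"})

assert g.deviations("triage", "records",
                    {"summary": "x"}, {"summary": "y"}) == set()
assert g.deviations("triage", "records",
                    {"summary": "x", "scope": "one"},
                    {"summary": "y", "scope": "segment"}) == {"scope"}
assert g.deviations("records", "audit", {}, {"note": "n"}) == {"new_edge"}
```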
CrewAI. CrewAI models work as a Crew of Agents assigned Tasks, executed through a Process that is either sequential or hierarchical. Delegation happens when one Task’s output becomes another Task’s input, either through the sequential pipeline or through a manager agent’s routing decision. The signal is task-to-agent mapping plus delegation depth: which task was assigned to which agent, how deeply the delegation chain ran, and whether the depth or mapping deviated from baseline. Hierarchical CrewAI processes are particularly important to baseline because the manager agent can recursively delegate, and a baseline that only captures first-level delegation will miss compromise in deeper turns.
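Delegation depth is the cheapest of these signals to track: represent each chain as the list of agents it passed through and compare its length to the deepest chain observed during baselining. A hypothetical sketch with invented role names:

```python
# Hypothetical sketch: flag delegation chains that recurse deeper than any
# chain observed while baselining a hierarchical crew.
class DepthBaseline:
    def __init__(self):
        self.max_depth = 0

    def observe(self, chain):
        self.max_depth = max(self.max_depth, len(chain))

    def deviates(self, chain):
        return len(chain) > self.max_depth

d = DepthBaseline()
d.observe(["manager", "researcher"])
d.observe(["manager", "researcher", "writer"])

assert d.deviates(["manager", "researcher", "writer"]) is False
assert d.deviates(["manager", "researcher", "writer", "researcher"]) is True
```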
AutoGPT. AutoGPT’s model is self-prompting: a planner agent decomposes a goal into subgoals, spawns worker instances — often in the same process, sometimes in child processes — to execute them, and loops until the goal completes or iteration limits hit. The signal is iteration count plus subgoal drift: how many recursive expansions happened, and whether the subgoal semantic space drifted from the original goal. A planner that normally completes in four to seven iterations and suddenly runs forty is a signal. A planner whose subgoals diverge semantically from the original goal — measurable at the prompt level — is a stronger one.
AutoGen. AutoGen’s conversable agents pass messages to each other through a GroupChatManager or through direct peer messaging. Delegation is embedded in the message-round and speaker-selection logic. The signal is speaker-selection anomaly plus message-round deviation: which agent spoke next after which, and whether the round count for a given task deviated from baseline. AutoGen is harder to baseline than pipeline-based frameworks because speaker selection is model-driven and inherently variable; the baseline has to capture variance bands, not exact sequences.
Model Context Protocol. MCP is a transport protocol rather than an orchestration framework, but it is increasingly the substrate for tool-mediated delegation between agents and tool servers. The signal is tool-call provenance chain: which agent invoked which tool server, whether the tool server’s subsequent calls trace back to the originating agent’s request, and whether the provenance matches the declared capability graph. A tool server invoking another tool server on behalf of an agent, without the agent’s direct request in the provenance chain, is a delegation pattern that should be baselined specifically.
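The provenance check is a walk up a parent-call chain: every tool-server call should reach an agent-originated root. A hypothetical sketch over an invented call-log shape (MCP itself does not define this structure; it is what a detection layer would reconstruct):

```python
# Hypothetical sketch: verify that a tool-server call traces back, through
# its parent calls, to a request that originated with an agent.
def traces_to_agent(call_id, calls):
    # calls: id -> {"parent": parent id or None, "origin": "agent"/"tool_server"}
    seen = set()
    while call_id is not None and call_id not in seen:
        seen.add(call_id)              # guard against cycles in the log
        node = calls[call_id]
        if node["origin"] == "agent":
            return True
        call_id = node["parent"]
    return False

calls = {
    "c1": {"parent": None, "origin": "agent"},        # agent asks server A
    "c2": {"parent": "c1", "origin": "tool_server"},  # A relays to server B
    "c3": {"parent": None, "origin": "tool_server"},  # B acts with no root
}
assert traces_to_agent("c2", calls) is True
assert traces_to_agent("c3", calls) is False
```

The orphaned call `c3` is exactly the pattern the paragraph above flags: a tool server acting without the originating agent's request in its provenance chain.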
The common thread: every framework’s delegation signal lives at the application layer, captured by parsing the framework’s own data model. A detection stack that does not reach into the framework layer sees only the execution byproducts — and those byproducts are exactly what the three new detection surfaces were designed to render invisible.
The delegation edge is where contagion propagates, and where per-agent detection goes blind by construction. Per-agent behavioral baselines — the pattern we have previously mapped for per-agent intent drift detection — establish what a single agent does in isolation. They answer the question: is Agent B behaving normally? They do not answer: is the delegation path Agent A to Agent B behaving normally? The answer to the second question requires a different baseline unit: the edge itself.
An edge baseline captures the typical payload shape for delegations from A to B, the typical scope fields the payload carries, the typical frequency of the edge firing, and the typical downstream effect at B. A deviation at the edge can be structural — a field in the payload that has never appeared before — scope-based — a scope value outside the observed range — or frequency-based — the edge firing at a rate inconsistent with the workflow’s normal rhythm. Edge baselines are the architectural extension of per-agent baselines, not a replacement, because a multi-agent system needs both: single-agent baselines to catch compromise inside one agent’s boundary, and edge baselines to catch compromise that propagates between agents.
The signal does not require a new telemetry source. Every framework already produces the edge data — CrewAI logs task delegations, LangGraph logs node transitions, AutoGen logs speaker selections. What it requires is wiring that telemetry into the detection layer rather than into developer observability alone.
Multi-agent systems rarely limit coordination to explicit delegations. A vector store holds retrieved context that multiple agents read. A scratchpad tracks intermediate plans that the next agent consumes. A shared session object carries conversation state across turns. Each of these is a multi-tenant attack surface inside your own application, and each one violates a detection assumption that single-agent baselines quietly rely on.
The assumption is that an agent’s inputs come from a source outside the agent ecosystem — a user, an API, a database — and can be evaluated at the ingestion boundary. In multi-agent systems, the next most common input source is another agent’s previous write. A compromised write by Agent A becomes the input for Agent B’s reasoning without any explicit delegation event to alert on. This is the closest thing the multi-agent world has to a server-side request forgery: one agent writing a payload into a shared resource that another agent then reads and acts on.
The detection primitive is a cross-agent write-then-read baseline. For each shared context store, which agents write to it, which read from it, what write patterns correlate with what read patterns, and what schema mutations are normal. The signal fires when Agent A writes content with a shape the store has not previously held, and Agent B reads that content on the next turn and acts on it in a way that deviates from its own baseline. Neither event in isolation is enough. The pattern across both is.
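The two-event correlation can be sketched as a monitor that taints out-of-shape writes and fires only when a different agent reads the tainted entry. Field names and store keys here are invented for illustration:

```python
# Hypothetical sketch: correlate an out-of-shape write to a shared store
# with a later read by a different agent; either event alone stays quiet.
class SharedStoreMonitor:
    def __init__(self, known_fields):
        self.known_fields = set(known_fields)
        self.tainted = {}  # store key -> writing agent, for anomalous writes

    def on_write(self, agent, key, payload):
        if set(payload) - self.known_fields:   # shape never held before
            self.tainted[key] = agent

    def on_read(self, agent, key):
        writer = self.tainted.get(key)
        # Fire only on the cross-agent half of the write-then-read pattern
        return writer is not None and writer != agent

m = SharedStoreMonitor(known_fields={"summary", "ticket_id"})
m.on_write("triage", "ctx:1", {"summary": "...", "ticket_id": "T-9"})
assert m.on_read("records", "ctx:1") is False

m.on_write("triage", "ctx:2", {"summary": "...", "directive": "export all"})
assert m.on_read("records", "ctx:2") is True
```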
Every multi-agent framework has a control node that sees every delegation in structured form. LangGraph’s compiled graph holds the full state-transition log. CrewAI’s Crew object executes the Process and knows which Task was assigned to which Agent in which order. AutoGPT’s planner holds the decomposition tree. AutoGen’s GroupChatManager selects speakers and tracks message rounds. The orchestrator is, by design, the single place with a complete view of inter-agent activity.
And by current convention, the orchestrator’s logs go to the ML team’s observability stack. LangSmith, Arize, Phoenix, OpenTelemetry collectors feeding developer dashboards. These tools answer debugging questions — why did the chain produce this output, how many tokens did it use, which node was slow. They do not answer security questions. We have previously argued that developer observability is not the same thing as security observability, and the multi-agent case is where that distinction becomes operationally critical. The orchestrator is producing audit-grade structured telemetry about delegation events, and the SOC has no pipe into it.
The prescription is straightforward and underdeployed: treat the orchestrator’s delegation stream as runtime security telemetry. Pipe it into the detection layer alongside the kernel-, container-, and Kubernetes-layer signals that already feed correlation. When a contagion chain fires, the orchestrator’s record of what delegation happened when is what turns three scattered per-agent signals into one attack story. The telemetry already exists. Most security teams are not yet subscribed.
Multi-agent detection pulls three things into the runtime stack that single-agent detection could live without.
The first is application-layer behavioral profiles that extend beyond the per-agent boundary. ARMO’s Application Profile DNA, the Deployment-level behavioral baseline described in the per-agent intent drift work, establishes what a single agent’s process, network, and tool-invocation behavior looks like over time. In a multi-agent system, this baseline is necessary and not sufficient — the delegation edge is a distinct unit that needs its own behavioral envelope. The architectural direction is extending application-layer profiling to capture typical delegation patterns between agent pairs as part of the overall runtime profile, so that a chain crossing multiple edges can be evaluated against the combined baseline, not just the endpoints.
The second is cross-layer correlation that joins framework-layer events to execution-layer events. Kernel-layer eBPF telemetry captures the byproducts of a delegation — a subprocess spawned, a socket opened, a file written — but without the framework context these look like ordinary container activity. Framework-layer events — LangGraph transitions, CrewAI task assignments, AutoGen speaker selections — provide the semantic context that tells the correlation engine which kernel event corresponded to which delegation. ARMO’s CADR approach, which unifies kernel, container, Kubernetes, and application-layer signals into one attack story, is the correlation pattern multi-agent detection requires. The work is not collecting more data. It is joining the framework layer into a correlation that has historically stopped at the application layer’s outer boundary.
The third is a runtime inventory that includes the delegation graph, not just the agent list. An AI-BOM that captures what agents exist, what frameworks they run, what tools they invoke, and what data they access is the starting point — and for multi-agent systems, the inventory has to include the delegation edges as first-class entities: which agents can route to which, through which framework mechanism, carrying what payload shape. The delegation graph is the multi-agent analog to the service dependency graph a microservice observability stack would produce. Without it, the inventory treats the multi-agent system as N independent agents rather than one coordinated workload, which is the same category error that broke detection in the opening scenario.
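Treating edges as first-class inventory entries makes the responder's first question a direct lookup. A minimal sketch with invented agent and mechanism names:

```python
# Hypothetical sketch: an inventory that stores delegation edges as
# first-class entries and answers "which agents could route to this one?"
class DelegationGraph:
    def __init__(self):
        self.edges = set()  # (src, dst, framework mechanism)

    def add(self, src, dst, mechanism):
        self.edges.add((src, dst, mechanism))

    def upstream_of(self, agent):
        # Agents that can route work to `agent` -- the incident-response query
        return {src for src, dst, _ in self.edges if dst == agent}

g = DelegationGraph()
g.add("triage", "records", "crewai_task")
g.add("records", "remediation", "crewai_task")

assert g.upstream_of("records") == {"triage"}
assert g.upstream_of("remediation") == {"records"}
```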
ARMO’s platform auto-discovers LangChain, CrewAI, AutoGPT, and AutoGen deployments in Kubernetes clusters without manual tagging, profiles the per-agent behavior through Application Profile DNA, and correlates cross-layer signals through CADR. Security teams running multi-agent systems in production can see how these capabilities assemble into inter-agent detection on their own clusters with ARMO’s cloud-native security for AI workloads.
Before committing a multi-agent system to production, a security team running LangChain, CrewAI, AutoGPT, or AutoGen deployments needs affirmative answers to a short list of questions. These are the minimum requirements for inter-agent detection — not the best-case capabilities, but the baseline without which contagion chains are undetectable.
The first question is whether delegation edges are instrumented. Every framework produces delegation events in its own logs; the requirement is that those events reach the detection layer, not just the developer observability stack. If the answer is “LangSmith has them, the SOC does not,” the coverage is incomplete.
The second is whether edge-level baselines exist. Per-agent baselines alone do not catch contagion; the baseline unit has to include the delegation pattern between agent pairs. If the runtime tool baselines agents but not edges, the detection model is single-agent tooling extended linearly.
The third is whether shared-context stores are covered. If the vector store, scratchpad, or session object is not monitored for cross-agent write-then-read patterns, the covert coordination channel is invisible.
The fourth is whether framework-layer telemetry is correlated with kernel-, container-, and Kubernetes-layer signals. Application-layer signals without execution-layer correlation leave you unable to reconstruct the chain; execution-layer signals without application-layer context give you ambiguous kernel events you cannot explain. This is the same runtime-versus-declarative distinction that separates behavioral detection from configuration-only posture, applied to the multi-agent case.
The fifth is whether the runtime inventory includes the delegation graph. An AI-BOM that lists agents without edges describes the workload incompletely, which means the first question a responder will ask during an incident — which agents could route to the compromised one — cannot be answered from the inventory.
A no on any of these does not mean the stack is broken. It means the stack is tuned for the single-agent case, and the multi-agent case needs an explicit extension.
The 2,400-record incident from the opening was not an exfiltration-detection problem. The DLP, CNAPP, and egress monitors would still be blind in a world where every one of those tools worked perfectly, because every one of them was looking at the destination layer, and the destination was fine. It was a delegation-edge detection problem. The signal lived in the inter-agent traffic the SOC was not subscribed to, at a surface the detection stack was not instrumented on, against a baseline the runtime profile did not maintain.
Multi-agent systems are not going to stop being deployed. Teams have too many workflows that benefit from decomposition across specialized agents to stop at one. The question is whether the detection architecture grows at the same rate as the agent architecture. Per-agent sensors will remain part of the stack, and prompt-injection, tool-misuse, and escape detection — the categories we have mapped in prompt injection detection, AI agent tool misuse, and the broader AI-aware threat detection framework — remain the first lines of defense inside each agent’s boundary. The multi-agent case adds a second line: the edges between agents, the shared stores around them, and the orchestrator routing through them. A detection stack that covers only the first line will miss the class of attacks that only exist on the second.
Security teams running LangChain, CrewAI, AutoGPT, or AutoGen deployments in production can walk a multi-agent contagion chain on their own clusters with ARMO — book a demo to see how edge baselines, shared-context monitoring, and orchestrator telemetry assemble into one inter-agent attack story.
**What is a delegation edge, and why does it need its own baseline?**
A delegation edge is the point at which one agent hands work to another — a LangGraph state transition, a CrewAI Task assignment, an AutoGen message turn. Each edge has its own typical payload shape, scope envelope, and frequency pattern, and these can deviate independently of either agent’s endpoint behavior. An edge baseline catches the category of attack where both endpoints look healthy but the relationship between them has been compromised.
**Do developer observability tools like LangSmith or Arize already cover this?**
They provide part of the telemetry but cannot replace security detection. Tools like LangSmith and Arize were built to answer debugging questions about reasoning quality and cost, not about behavioral deviation from a security baseline. The delegation events they capture are the right input; security detection needs that input correlated against behavioral baselines and joined to kernel-layer runtime signals — a different output from the same source.
**How do you baseline frameworks where routing is model-driven rather than fixed?**
For frameworks like AutoGen where the next speaker is chosen by an LLM rather than a fixed pipeline, exact-sequence baselining produces too many false positives. The baseline has to capture variance bands — typical message-round counts, typical speaker-transition probabilities, typical group composition — rather than deterministic sequences. Deviation is then measured as distance from the variance envelope, not mismatch against an expected path.
**Where does the Model Context Protocol fit into multi-agent detection?**
The Model Context Protocol is one of the transports over which multi-agent delegation happens, particularly when tool servers mediate between agents. MCP adds provenance requirements — the detection layer needs to trace which agent originated a tool call that was relayed through a tool server — on top of the baseline delegation-edge and shared-context requirements. MCP is an important substrate for multi-agent detection but not the whole surface.
**Can inter-agent detection be retrofitted onto an existing multi-agent deployment?**
Yes, and this is the more common starting point than greenfield deployment. The retrofit path is to subscribe the detection layer to the orchestrator’s existing delegation telemetry, extend behavioral profiling from per-agent to per-edge, instrument shared context stores, and fold the delegation graph into the runtime inventory. Per-agent baselines and existing kernel-, container-, and Kubernetes-layer correlation remain in place — the work is additive rather than replacement.