Get the latest, first
arrowBlog
Tool Call Analysis for AI Attack Detection: Reading What Rides Inside the Call

Tool Call Analysis for AI Attack Detection: Reading What Rides Inside the Call

May 30, 2026

Yossi Ben Naim
VP of Product Management

Key takeaways

  • Why isn’t logging tool calls the same as analyzing them? Logging captures the invocation: which tool fired, when, and how often. Analysis reads what rides inside the call — the parameter values, the payload, the operation being performed — and that is the layer where the attack actually lives. A complete tool-call log can sit next to a successful breach, because the malicious part was a field inside a call the log already recorded.
  • What does the argument layer of a tool call reveal that the invocation layer can’t? Four things: whether a parameter value falls outside the range this agent has ever passed, whether the payload exceeds the agent’s normal size and shape, whether the operation is a write where the agent only ever reads, and whether an argument’s value originated from untrusted retrieved content. None of these are visible when the call is treated as a single event with a name and a timestamp.
  • Why is the tool call the highest-value thing to analyze in an agent? A tool call is the agent’s only action primitive — it cannot read a database, write a file, reach a network endpoint, or hand work to another agent except by issuing one. Every attack outcome — however the agent was subverted — has to cross this surface to become real. That makes the contents of the call the one place every attack is observable.

A compromised agent doesn’t make a single call it isn’t allowed to make. It queries a table it’s authorized to read, calls a tool it’s authorized to use, sends to a domain that’s on the allowlist. Every call is legal. The attack is in the values it passes, and your tool-call log records all of it as a clean day’s work.

A tool call has two layers. Almost every tool you run reads the first one: the call itself: which tool, in what order, at what rate. Almost nothing reads the second: what is inside the call.

Teams treat a complete tool-call log as detection coverage. It is not. Logging the call is not the same as reading the call, and the gap between those two things is the entire attack surface this article is about. Tool-call analysis is one surface of the broader framework for detecting AI agent attacks; this piece is the operational deep-dive on it – the two layers of a tool call, the four things the argument layer reads that the call name never tells you, and where reading the contents catches attacks the call-as-event view cannot see.

A Tool Call Has Two Layers, and Your Stack Only Reads One

Every tool call an agent issues carries two analytically distinct layers, and they fail differently.

The first is the invocation layer: which tool the agent called, in what sequence relative to other calls, and at what frequency. This is the call treated as an event. It is well-instrumented and well-covered — framework SDKs emit a structured event every time an agent invokes a tool, audit logs record the call, and most detection tooling reads exactly this. When a vendor says it “analyzes tool calls,” this is almost always the layer it means.

The second is the argument layer: the parameter values passed into the call, the size and shape of the payload, the operation the call performs, and the provenance of each field. This is the contents of the call rather than the fact of it. Very little reads this layer in the context of what the specific agent normally does, which is the reason it remains the productive place for an attacker to work.

The distinction matters because of what a tool call is. An agent has exactly one way to affect anything outside its own reasoning: issue a tool call. It cannot touch a record, modify a file, reach an external service, or delegate to another agent except by issuing a tool call. The reasoning happens in the model; the consequences happen in tool calls. That makes the tool call the chokepoint — not one detection surface among several, but the surface that every attack outcome has to pass through to become real, whatever the agent’s prompts and context looked like beforehand.

The consequence sets up the rest of this article. If the attack rides in the arguments and you only read the invocation, the call looks authorized at precisely the layer you are watching. The tool is on the allowlist. The sequence is plausible. The rate is normal. Everything the invocation layer can see checks out, while the thing that makes the call an attack sits one layer down, unread.

Four Things the Arguments Tell You That the Call Name Doesn’t

Reading the argument layer means reading four specific properties of a call that the invocation layer cannot represent. Each one is a different question asked of the same tool call, and each is invisible the moment the call is treated as atomic.

The value sits outside the range this agent has ever passed

The tool is authorized and the target is authorized. The value in the parameter is one the agent’s observed behavior has never produced.

Consider a reporting agent whose query tool has, across weeks of operation, only ever requested record counts in the tens — it pulls a day’s tickets, summarizes them, moves on. The same tool now carries LIMIT 500000. The tool is unchanged. The table is one the agent legitimately reads. Only the value is wrong, and it is wrong in a way that says someone is trying to pull the whole table in one call.

The invocation layer sees an authorized query call to a permitted table and approves it. The argument layer reads a parameter value that sits far outside the distribution this agent has ever generated, and flags the call for exactly that reason.

The payload exceeds the agent’s normal size and shape envelope

An email-sending agent invokes its send tool dozens of times a day, every time with a short summary bound for an internal address. The call is one it makes constantly; the amount and structure of data inside it is not. This time the call carries a body two orders of magnitude larger than anything it has ever sent — not a summary but a serialized result set, thousands of rows of structured data, headed to an allowlisted domain.

The invocation layer sees a routine send to an approved destination, indistinguishable from the agent’s daily pattern. The argument layer reads a payload whose size and shape fall outside the envelope this agent has ever produced — the signature of data leaving inside an authorized channel rather than a message being sent.

The operation is a write where the agent only reads

Same tool, same target, same frequency — different operation. This property blurs into the other two most easily, so it is worth isolating: it is not a question of which resource the call touches, and not a question of how many calls the agent makes, but of the verb itself.

A database tool scoped to a table the agent uses legitimately has, in its entire observed history, issued nothing but SELECT. The call now in front of you is an UPDATE against that same table. Nothing about the tool, the target, or the call rate has changed. The agent is simply performing an operation it has never performed — a write where it has only ever read, the same permission-action break that detection of agent escape keys on — and the operation is the kind that changes state rather than reads it.

The invocation layer sees an authorized call to an authorized table and lets it through. The argument layer reads a mutation issued by an agent whose entire observed operation set is read-only, and treats that mismatch as the signal.

The argument’s value came from untrusted retrieved content

An agent processing incoming support tickets reads one whose text, taken as instruction rather than data, points it at a file path — the same indirect prompt injection path where most agent attacks begin — and the parameter that carries that value is well-formed and inside the agent’s normal range. Its origin is the problem. Moments after reading the ticket, the agent issues a read_file call whose path parameter holds that value. The path is syntactically valid. It might even resemble paths the agent has touched before. But the value did not come from the user’s task — it came from content the agent ingested, which means an upstream document is now steering the agent’s actions.

The invocation layer sees a valid path argument on a permitted tool. The argument layer reads a value whose provenance traces to untrusted retrieved context rather than to the agent’s actual instructions, and surfaces the call on those grounds.

These four reads share one property: each disappears the instant the call is treated as a single event. They are not four separate detectors bolted together. They are four ways of reading inside one tool call — value, payload, operation, and provenance — and the reason argument analysis exists as a distinct discipline is that the invocation layer can represent none of them.

Authorized Call, Unauthorized Contents: Where Reading the Arguments Earns Its Keep

The four reads are not attack chains, and the point of naming them is not to walk an incident from foothold to exfiltration. The point is narrower and more useful: there is a category of attack that consists of a single call the invocation layer is built to approve, where the only evidence of compromise is in the contents. Tool misuse is the familiar case of an authorized action turned malicious through its sequence and timing. This is the dual of that — authorized call, unauthorized contents.

Three shapes recur, and in each one the invocation layer approves the call on sight:

The single call the invocation layer approvesWhat the argument layer reads inside it
An authorized tool whose ordinary call now carries an extra populated field — a benign-looking parameter now holding data the field was never meant to carryA value, and a provenance, that match nothing this agent legitimately produces — the field is being used to smuggle, not to function
An allowlisted send or post to a destination already on the approved listA payload whose size and shape are an exfiltration envelope rather than the agent’s normal output — the destination passes, the contents do not
A pre-approved command-style tool the agent is permitted to callArguments shaped to extend the operation beyond anything that approving the tool was ever meant to authorize — the permission was for the tool, not for what the arguments now make it do

The common thread is that every defense the invocation layer offers passes. The tool name is on the allowlist. The destination is approved. The permission check succeeds, because the permission was granted to the tool and the tool is exactly what was called. The only signal that any of these is an attack lives in the value, the shape, the operation, or the origin of the arguments — which is the one layer an invocation-only view does not read.

Each row is a single call, frozen in place. There is no sequence here, no before and after, no chain of stages to reconstruct. The chained version of these attacks — how a foothold becomes lateral movement becomes exfiltration — is its own subject, and ARMO has broken down the scope, sequence, and rate patterns of rogue agent tool misuse in depth. What argument analysis contributes is the read on the individual call, before and beneath any chain it might belong to.

Reading Arguments Only Works Against a Per-Agent Behavioral Envelope

Every one of the four reads depends on a comparison, and the comparison is the hard part. LIMIT 500000 is an ordinary call for a bulk-export agent built to move large datasets, and an alarm for a reporting agent that summarizes a few dozen tickets at a time. An UPDATE is routine for a write-path agent and anomalous for a read-only one. A large payload is the entire purpose of one agent and the exfiltration signature of another. None of the four properties means anything in the abstract. Each means something only against the observed behavior of the specific agent that issued the call.

That comparison cannot be a static rule, and this is where most approaches break. A fixed threshold on payload size either fires on every agent whose legitimate output happens to be large, or is loosened until it stops catching anything. A hardcoded list of allowed parameter values cannot survive contact with a non-deterministic agent whose legitimate value range is genuinely wide and genuinely variable. The envelope an argument is read against has to be the observed range of values, payloads, operations, and provenances for that one agent — and it has to tolerate non-determinism by being a boundary around the agent’s normal operating range rather than an enumeration of permitted inputs. An attacker who stays entirely inside that envelope gives up the leverage that made the tool call worth hijacking: the out-of-range value, the oversized payload, the unfamiliar operation are the things the attack needs in order to accomplish anything.

Building that envelope is what defining normal behavior for an AI agent from runtime data is for, and it is the layer ARMO’s Application Profile DNA produces: a per-Deployment behavioral profile that captures the agent’s tool-call patterns, value ranges, payload envelopes, and operation set as a single envelope per agent rather than per pod. When an argument arrives outside that envelope, the platform’s read is stated as an outcome — this value sits outside the range this agent has produced, this payload exceeds its observed shape — not as a generic threshold breach. During the period the profile is still converging, the system runs in visibility-only mode, so the false positives that come with any learning window do not turn into production blocks.

Reading the argument is only half of the work. The other half is tying it to what the call then did. An out-of-range parameter on its own is a flag; the same parameter tied to the kernel-level syscall that followed it — the process that spawned, the file that opened, the connection that left the cluster — is a finding. That join sits on the same runtime telemetry foundation the rest of detection depends on: ARMO’s application-layer correlation links the argument anomaly at the tool-call layer to the runtime events around it, so a call that looks benign in isolation is read together with its consequences. Producing a single attack story across the cloud stack from application, container, Kubernetes, and cloud signals is the work CADR does, and it is the difference between an alert about a single odd value and an account of what that value actually did. The broader posture this sits inside is cloud-native security purpose-built for AI workloads, where the tool call is treated as the primitive it is rather than as one more log line.

The Call Was Always There. Reading It Is the Work.

The tool call is the chokepoint – the one primitive every agent action, and every attack, has to cross to touch anything real. Treating it as an event with a name and a timestamp reads only that it happened. Reading its contents — the value against the agent’s range, the payload against its envelope, the operation against its verb history, the provenance against the agent’s actual task — is what turns a record of the call into a detection of the attack inside it.

You were already capturing the call. The work that remains is reading what was inside it and the fastest way to see whether your stack reads the argument layer or only the invocation layer is to run it against your own agents: pull a tool call your baseline flags, and check whether your current tooling can tell you the value was wrong, or only that the call was made.

Frequently Asked Questions

How do I start analyzing tool-call arguments if I’m only logging tool names today?

Begin by capturing the structured arguments your framework already emits — most agent SDKs include the parameters, not just the tool name, in their callback events. The first useful step is recording value ranges, payload sizes, and operations per agent so you have something to compare against later. You do not need a model to start; you need the argument data retained and attributed to a specific agent identity, which is the input every later read depends on.

How do I tell a malicious argument from a legitimately unusual one?

You compare it to the observed behavior of that specific agent, not to a global rule. A large payload from an agent built for bulk export is normal; the same payload from a summarization agent is not. The comparison only works when the baseline is per-agent and built from runtime observation, because the same argument value can be routine for one workload and an attack signature for another — the agent’s own history is what makes the call readable.

How do I analyze arguments without storing raw prompts or sensitive parameter values?

Store the security-relevant metadata rather than the raw content: value distributions and ranges, payload sizes, operation types, parameter shapes, and provenance flags, rather than the literal strings passed. This lets you detect an out-of-range value or an oversized payload without retaining the sensitive data itself, which keeps the analysis compatible with environments that cannot log prompt or parameter content for privacy reasons. The signal lives in the shape and origin of the argument, which metadata preserves.

How does argument-level analysis sit alongside the WAF and SIEM I already run?

It fills the layer neither was built to read. A WAF inspects traffic at the perimeter and cannot interpret whether an internal tool call’s arguments match an agent’s intended behavior; a SIEM aggregates the logs but lacks the per-agent context to know that a given value is out of range for this workload. Argument analysis runs at the application layer where the call’s contents are visible in behavioral context, and its findings can feed the SIEM as enriched signal rather than competing with it.

How does this work for managed agent runtimes that don’t expose tool-call internals?

Coverage depends on what the platform exposes. Fully managed agent runtimes often do not surface framework-level tool-call telemetry, which is why the argument layer can be partly opaque inside them and why many teams keep managed agents on lower-blast-radius tasks until that visibility improves. Where you control the runtime — agents you deploy on your own Kubernetes — the framework SDK events and the kernel-level signal beneath them are both reachable, and the argument layer is fully readable.

Close

Your Cloud Security Advantage Starts Here

Webinars
Data Sheets
Surveys and more
Group 1410190284
Ben Hirschberg CTO & Co-Founder
Rotem_sec_exp_200
Rotem Refael VP R&D
Group 1410191140
Amit Schendel Security researcher
slack_logos Continue to Slack

Get the information you need directly from our experts!

new-messageContinue as a guest