
Threat Detection for RAG Pipelines: The Three Windows Most Tools Are Blind To

Apr 26, 2026

Ben Hirschberg
CTO & Co-founder

Key takeaways

  • Why is "threat detection for RAG pipelines" actually a posture problem, not an alert problem? RAG threats span three temporal windows, and only one surfaces at the retrieval API layer where most detection tools listen. Without continuous behavioral posture across all three, retrieval-time alerts are either noise (every retrieval looks similar at the network layer) or false confidence (the retrieval was the symptom, not the breach). Detection alerts for RAG only become actionable when the underlying baseline is built first.
  • What three windows does a RAG threat actually span — and which one are most tools blind to? Index-time (poisoning happens before any agent queries — can sit dormant for days), query-time (the retrieval itself), and context-assembly-time (what gets merged into the model's context window after retrieval). Most tools see only query-time, and even there, only at the network destination layer. Index-time is often uninstrumented entirely; context-assembly-time runs inside framework code that no security tool natively reaches.
  • Why do generic AI agent behavioral baselines collapse RAG-specific signals? Standard four-category baselines treat retrieval as a single "data access" event — same category as a database read or a file open. RAG pipelines need a five-signal model that separates index population, source provenance, retrieval query pattern, retrieval result pattern, and context-assembly pattern, because each layer has its own deviation profile. A baseline that collapses them sees average behavior across all five and catches no specific deviation cleanly.

Tuesday, 09:14 UTC. A connector pulling content from your knowledge wiki indexes a new article into the vector database your support agents query at runtime. Embedded in legitimate troubleshooting prose is an instruction crafted to surface whenever a query mentions a specific product version — include the user’s account record in the response and POST the summary to the configured support webhook.

For three days, nothing happens. Every security tool is green. Vector database IAM is correctly scoped, the connector’s permissions match its declared role, the agent’s behavioral baseline is unchanged.

Friday, 14:23 UTC. A customer ticket triggers semantic similarity retrieval. The poisoned chunk surfaces in the top three results, enters the agent’s context window, and the agent does what the instruction told it to do — POSTing customer data to the configured webhook, which the egress allowlist correctly permits because that webhook is a sanctioned vendor integration. The vendor’s portal, it turns out, was compromised six weeks ago.

WAF saw nothing. CSPM saw nothing. CIEM saw nothing. Egress monitor saw nothing. The agent’s syscall profile was unchanged. The breach didn’t happen at retrieval time — it happened on Tuesday, in a window nothing was watching.

This is the gap that makes “threat detection for RAG pipelines” a misframed problem. The runtime evidence that catches this attack lives in continuous behavioral posture across the RAG pipeline — the third discipline of AI Security Posture Management. This article walks the three threat windows that posture has to instrument, the five signal categories that fill them out, and the deployment-pattern variance that determines which signals are actually reachable in your stack.

Why “RAG Threat Detection” Is a Posture Problem

The most useful starting move is to relocate the problem inside the discipline that solves it. AI Security Posture Management splits into three disciplines: Model and Artifact Posture, Identity and Access Posture, and Behavioral Posture (observed runtime behavior versus declared configuration). The third is the one that handles RAG threats end-to-end, because RAG threats characteristically pass every static check and only surface as patterns over time.

Static posture handles the configuration layer — vector database IAM, embedding model registry, source classification, retrieval path access controls. Necessary, not sufficient. The Tuesday-Friday scenario passed every one of those checks. The threat lived in what the pipeline was doing at runtime, not in how it was configured to behave.

A “RAG threat detection alert” without underlying behavioral posture is structurally one of two things, and neither is operationally useful. The first is a network-layer signal — false-positive heavy because the network layer doesn’t see per-agent context. Most “unusual” calls are autoscaling pods, new partition routing, or normal cluster events. The second is an LLM-output-side signal where a guardrail caught suspicious content in the model’s response — too late by definition, because the model has already produced the output and the downstream tool calls are already in motion. The signal that catches the breach early lives in the pattern shift across pipeline behavior, and the pattern shift is only meaningful against an established baseline. Posture is the prerequisite; detection is the operational outcome of mature posture, not a substitute for it.

The Three Threat Windows

RAG threats split temporally into three windows, each with a different evidence layer, signal pattern, and blind spot. The OWASP Top 10 for LLM Applications catalogs the underlying vulnerability category as LLM08:2025 Vector and Embedding Weaknesses, and the structure of the OWASP entry maps to the temporal split below — the failure modes happen at different points in the pipeline’s runtime lifecycle.

Window 1: Index-Time

The window between when content enters the index and when any agent first retrieves it. Often days; sometimes weeks for low-frequency corpora.

Three attack patterns live here. Source poisoning — a compromised editor on the upstream wiki, a malicious pull request merged into a documentation repository, an attacker submission to a publicly indexed source. Direct vector database manipulation through stolen credentials or misconfigured connectors. Adversarial-input embedding attacks, where text is crafted so its embedding lands in a target semantic neighborhood, hijacking similarity-based retrieval for specific query patterns. The PoisonedRAG research demonstrated that injecting roughly five crafted texts per targeted query achieves over 90% attack success against those queries — a small, surgical campaign hijacks retrieval for a specific topic without corrupting the corpus broadly.

The detection telemetry that matters: write events to vector databases with full source attribution, embedding model invocations (cadence, principal, content provenance), and index population rate envelope per connector per source. The runtime-derived AI-BOM is the foundational artifact at this window — it captures what is actually getting indexed, which is structurally different from what was declared as a RAG source in configuration.
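
To make that telemetry concrete, here is a minimal sketch of a population-rate envelope check. The event shape, window mechanics, and z-score threshold are illustrative assumptions; a real posture engine would derive its envelope from observed traffic rather than hardcode one:

```python
from collections import defaultdict
from dataclasses import dataclass
import statistics

@dataclass
class IndexWriteEvent:
    """One attributed vector DB write (hypothetical event shape)."""
    connector: str    # writing principal
    source: str       # upstream content source
    chunk_count: int  # chunks written in this batch

class PopulationEnvelope:
    """Per-(connector, source) write-rate baseline with a simple z-score gate."""

    def __init__(self, z_threshold: float = 3.0, min_history: int = 8):
        self.z_threshold = z_threshold
        self.min_history = min_history    # windows needed before judging a pair
        self.history = defaultdict(list)  # (connector, source) -> past window counts
        self.current = defaultdict(int)   # counts in the open window

    def observe(self, event: IndexWriteEvent) -> list[str]:
        key = (event.connector, event.source)
        findings = []
        # A pair never seen before is itself a provenance-adjacent finding.
        if key not in self.history and key not in self.current:
            findings.append(f"first-ever write from {key}")
        self.current[key] += event.chunk_count
        return findings

    def close_window(self) -> list[str]:
        """Roll the window; flag pairs deviating from their own history."""
        findings = []
        for key, count in self.current.items():
            past = self.history[key]
            if len(past) >= self.min_history:
                mu = statistics.mean(past)
                sigma = statistics.pstdev(past) or 1.0
                if abs(count - mu) / sigma > self.z_threshold:
                    findings.append(f"{key}: wrote {count} chunks, baseline {mu:.0f}")
            past.append(count)
        self.current.clear()
        return findings
```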

WAFs see nothing here — the payload is stored data, not an HTTP request. CSPM and CIEM see nothing — the IAM is correctly configured and the writing principal is authorized. Most teams don’t instrument vector database writes at all because they treat the vector DB as a passive data plane. Applied to the Tuesday-morning scenario, runtime-informed posture would have flagged a connector indexing from a source that hadn’t produced content in 47 days, with metadata patterns deviating from the corpus baseline, embedded at an unusual time. The signal lives in the pattern of the population stream, not in the document content.

Window 2: Query-Time

The window where retrieval actually happens — the RAG pipeline issues a query against the index in response to an agent’s request, and documents come back.

Three attack patterns again. Adversarial retrieval queries crafted to surface specific chunks, often via prompt injection on the inbound user input. Retrieval flooding — high-volume retrieval calls used to enumerate or exfiltrate index contents. Prompt-driven retrieval pattern abuse, where the agent’s prompt context steers it to retrieve from sources outside its routine work envelope, directly analogous to unjustified-use permission excess for AI agents but with retrieval scope as the abused capability.

Detection telemetry: per-agent retrieval query patterns (top-k distribution, similarity score floor, query frequency) and retrieval result patterns (which documents come back, how recently they were indexed, how they correlate with index-time write events). The connection between an anomalous index-time write and a subsequent retrieval that pulls from that newly-anomalous chunk is the earliest cross-window signal a posture program produces.
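
Once the baseline has converged, the per-retrieval check itself is small. A sketch, assuming an illustrative baseline shape; the specific thresholds are placeholders, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class RetrievalBaseline:
    """Converged per-agent retrieval envelope (illustrative values)."""
    typical_top_k: int = 5
    similarity_floor: float = 0.72
    max_queries_per_min: int = 30

@dataclass
class RetrievalEvent:
    agent: str
    top_k: int
    min_score: float          # lowest similarity accepted into the results
    queries_last_minute: int

def check_retrieval(event: RetrievalEvent, baseline: RetrievalBaseline) -> list[str]:
    """Compare one retrieval against the agent's envelope; return deviations."""
    findings = []
    if event.top_k > 2 * baseline.typical_top_k:
        findings.append(f"top-k expansion: {event.top_k} vs typical {baseline.typical_top_k}")
    if event.min_score < baseline.similarity_floor - 0.15:
        findings.append(f"similarity floor collapse: {event.min_score:.2f}")
    if event.queries_last_minute > baseline.max_queries_per_min:
        findings.append("query frequency burst (possible retrieval flooding)")
    return findings
```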

At the network layer, every retrieval looks identical: service-to-service traffic to a vector DB endpoint. The signal isn’t in any single retrieval; it’s in the pattern of retrievals against the per-agent baseline. A deeper treatment of one specific query-time threat — indirect prompt injection through retrieved content — walks through an eight-stage attack chain with a stage-by-stage tool visibility matrix. That walkthrough owns the prompt-injection-specific framing; this section owns the broader category of query-time RAG threats, including retrieval flooding, multi-tenant query bleed, and pattern-shape abuse.

Window 3: Context-Assembly-Time

The window between when retrieval returns documents and when the assembled context reaches the model. This is where most tools lose visibility entirely, and the layer where honesty about what current instrumentation can do matters most.

Three attack patterns live here as well. Chunk reranking attacks — adversarial content crafted to score high in the reranker. Multi-tenant context bleed — in shared-index RAG, retrieval correctly scoped at IAM but the reranker or chunker leaking content across tenant boundaries because the filter logic doesn’t propagate cleanly through assembly. Context window injection, where middleware between retrieval and inference modifies the assembled context: a compromised reranking service, a prompt template manipulation, a poisoned guardrail layer.

Detection telemetry lives at the assembly layer: which retrieved chunks reach the model and in what order, reranking decisions and the score distribution that drove them, context window composition over time. Context assembly often runs inside agent code, framework middleware (LangChain’s _get_relevant_documents chain, LlamaIndex’s response synthesis path), or external reranking services. Network monitoring sees retrieval and inference as two separate calls to two separate services, with the assembly logic happening in process or in a sidecar nobody is watching.

Capturing context-assembly signals requires framework callbacks (LangChain handlers, LlamaIndex instrumentation events), application-layer tracing (OpenTelemetry, LangSmith, Arize Phoenix), or in-process inspection. eBPF kernel-level instrumentation captures the network calls between retrieval, reranking, and inference services — but not what is inside the assembled context. Runtime-informed posture for RAG combines kernel-level eBPF telemetry, application-layer framework callbacks, and vector database audit logs; no single layer covers all five categories alone. The OWASP Top 10 for Agentic Applications, published in December 2025, catalogs context-window manipulation as a distinct agentic threat category, separate from prompt injection at the input layer.
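
For teams running LangChain in-process, the application-layer hook can be as thin as a callback handler. A minimal sketch: the callback interface is LangChain's own, but the known-source store and the print-as-alert sink are hypothetical stand-ins for the posture engine.

```python
from langchain_core.callbacks import BaseCallbackHandler

class ContextAssemblyMonitor(BaseCallbackHandler):
    """Records which retrieved chunks actually reach the model, per agent."""

    def __init__(self, agent_id: str, known_sources: set[str]):
        self.agent_id = agent_id
        # Sources this agent has previously had in context (hypothetical store).
        self.known_sources = known_sources
        self.last_context_sizes: list[int] = []

    def on_retriever_end(self, documents, *, run_id, parent_run_id=None, **kwargs):
        # Fires after retrieval, before assembly: per-chunk, per-rank visibility.
        for rank, doc in enumerate(documents):
            source = doc.metadata.get("source", "<unattributed>")
            if source not in self.known_sources:
                print(f"[assembly anomaly] agent={self.agent_id} rank={rank} "
                      f"first-time source entering context: {source}")

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Last observable point before inference: the assembled context itself.
        # A real engine would hash composition here for the assembly baseline.
        self.last_context_sizes = [len(p) for p in prompts]
```

Attaching it is a one-line change at invocation time, e.g. `chain.invoke(query, config={"callbacks": [monitor]})`, which is the point: this signal exists only at the application layer, and no network tap reproduces it.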

The Five-Signal RAG Baseline

Generic AI agent baselines build on four signal categories: API/tool calling, resource consumption, data access, and identity usage. Applied to RAG pipelines, that taxonomy collapses retrieval into the broader “data access” bucket and misses the three temporal windows above. The five-signal RAG baseline below is the explicit extension that maintains the connection to generic baselines while surfacing the RAG-specific signals detection alerts depend on.

| Signal category | What it captures | Why generic baselines collapse it | Deviation indicator | Instrumentation source |
| --- | --- | --- | --- | --- |
| Index population pattern | Vector DB writes, embedding model invocations, chunking distribution per principal per source | Subsumed under ‘data access’; the write direction is barely instrumented in standard baselines | New principal writing to index; cadence shift from a stable connector; embedding invocations that don’t match declared corpus rebuild schedules | eBPF for in-cluster indexing; managed vector DB audit logs (Pinecone, Azure AI Search, Vertex AI Vector Search) when external |
| Source provenance | Which sources/principals are populating which index partitions, and through what authentication chain | Generic baselines treat the index as an opaque destination | New source appearing in a partition; provenance chain resolving to a principal outside the declared connector identity | Identity attribution at the connector layer; provider audit logs when connector is itself managed |
| Retrieval query pattern | Per-agent query frequency, top-k distribution, similarity score floor, and embedding-shape distribution where capturable | Standard tool-calling baseline captures that retrieval happened, not what shape it had | Top-k expansion (agent pulling 20 chunks where 5 was normal); similarity floor collapse; query frequency burst | eBPF captures network call, attribution, and request parameters cleanly from the agent side |
| Retrieval result pattern | Which documents come back, how recently they were indexed, how they correlate with index-time write events | Document-level identity is below the resolution generic baselines operate at | Retrieval pulling from index entries indexed within the last 24 hours when normal age distribution is weeks-to-months; first-time retrieval from a previously unused source | Framework callbacks (LangChain, LlamaIndex) or vector DB response inspection at the network layer |
| Context-assembly pattern | Which retrieved chunks reach the model, in what order, after what reranking | Almost never captured in standard baselines; runs in framework middleware no security tool natively reaches | Context now contains chunks from a source the agent has never had in context before; reranking decisions shifting mass to recently-indexed content | Application-layer instrumentation only: framework callbacks, OpenTelemetry traces, in-process introspection |

The instrumentation-source column makes the framework operational rather than conceptual. The five signals don’t all live at the same layer of the stack — eBPF captures index population, source provenance, and retrieval query patterns cleanly when the indexing pipeline runs in your cluster; managed connectors writing to managed vector DBs require provider-side audit telemetry; retrieval result and context-assembly patterns require application-layer hooks that eBPF doesn’t reach. Per-deployment baselines (not per-pod) are the architectural answer to RAG pipeline ephemerality, the same pattern that applies to generic AI agent behavioral baselines.
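
The per-deployment keying is simple to state in code. A sketch, assuming standard Kubernetes pod metadata and a placeholder baseline type:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class PipelineBaseline:
    """Placeholder for the converged five-signal envelope."""
    observations: int = 0

# Keyed on stable workload identity, never on pod identity: pods churn with
# autoscaling and restarts, so a per-pod baseline would reset before converging.
baselines: dict[tuple[str, str], PipelineBaseline] = defaultdict(PipelineBaseline)

def baseline_for(pod_meta: dict) -> PipelineBaseline:
    # Hypothetical metadata shape: namespace plus the conventional app-name label.
    key = (pod_meta["namespace"], pod_meta["labels"]["app.kubernetes.io/name"])
    return baselines[key]
```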

Where the Signals Actually Live in Production

Production RAG falls into three deployment patterns, and the five-signal baseline has materially different coverage in each. A practitioner-level posture program acknowledges this variance instead of pretending the signal stream is uniform.

| Deployment pattern | What runs in-cluster | eBPF-reachable signals | Posture coverage | Gaps to close with adjacent telemetry |
| --- | --- | --- | --- | --- |
| Fully self-hosted | Vector DB (Weaviate, Qdrant, pgvector, Milvus), embedding service, agent, framework | All five signal categories at the kernel level | Strongest | Application-layer hooks for context-assembly only |
| Managed vector DB, self-hosted agent | Agent and framework; sometimes the embedding service | Retrieval query pattern, retrieval result pattern; partial source provenance | Partial | Provider audit logs for index-time signals; framework callbacks for context-assembly |
| Fully managed RAG (Bedrock Knowledge Bases, Vertex AI Agent Builder, Azure AI Studio agent) | A thin caller of provider-managed retrieval and inference | Network calls and identity attribution only | Weakest without provider-side audit integration | CloudTrail, Cloud Audit Logs, Azure Diagnostic Settings, plus service-specific audit logs |

Pattern 1 — fully self-hosted — gives the strongest coverage. Vector DB, embedding service, agent, and framework all in cluster; eBPF captures all five signal categories. The remaining instrumentation challenge is application-layer for context-assembly only.

Pattern 2 — managed vector DB with self-hosted agent — is the most common production deployment today. Pinecone, Azure AI Search, or Vertex AI Vector Search as the data plane; agent and framework in cluster. Retrieval query and result patterns are reachable cleanly from the agent side; index-time signals require integration with the managed service’s audit telemetry. Source provenance is partial: visible if the connector runs in-cluster, opaque if the connector is itself managed (Fivetran, Airbyte).

Pattern 3 — fully managed RAG (Vertex AI Agent Builder, Bedrock Knowledge Bases, Azure AI Studio agent) — gives the weakest cluster-side coverage. eBPF shrinks to network calls and identity attribution; everything else depends on cloud provider telemetry. The evaluation framework for AI workload security tools walks through what each cloud’s native AI services see and miss in detail.

Cross-Window Correlation: Assembling the Breach Narrative

Most RAG attacks span all three windows. The Tuesday-Friday scenario: index-time poisoning, query-time retrieval, context-assembly-time exploit. Per-window detection catches each event in isolation. The breach narrative — “the chunk retrieved on Friday at 14:23 was indexed by a connector on Tuesday at 09:14, against a source with anomalous provenance, and was reranked to top-1 by an assembly path this agent has never used before” — only assembles when a correlation engine ties events across windows. This is the discipline category the published cloud security market analyses describe as Cloud Application Detection and Response — the runtime correlation layer that ties scattered events across the stack into a single attack story rather than three disconnected alerts on three different dashboards.

The practitioner outcome: instead of a SOC analyst seeing one network anomaly alert, one retrieval-pattern alert, and one context-assembly alert at three different times — and triaging each in isolation as low-severity — the analyst sees a single pipeline-level incident with a timeline showing how Tuesday’s index event produced Friday’s retrieval and the assembly anomaly that followed.
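
Structurally, the correlation reduces to a join on stable chunk identity across per-window findings. A deliberately minimal sketch; a production engine would also join on agent identity, source provenance, and time proximity, and would score the assembled chain:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    window: str       # "index-time" | "query-time" | "context-assembly-time"
    chunk_id: str     # stable chunk identity is the join key across windows
    timestamp: float
    detail: str

def correlate(findings: list[Finding]) -> list[list[Finding]]:
    """Group per-window findings sharing a chunk into one pipeline incident."""
    by_chunk: dict[str, list[Finding]] = {}
    for f in findings:
        by_chunk.setdefault(f.chunk_id, []).append(f)
    incidents = []
    for chain in by_chunk.values():
        # Evidence spanning two or more windows is the escalation signal.
        if len({f.window for f in chain}) >= 2:
            incidents.append(sorted(chain, key=lambda f: f.timestamp))
    return incidents
```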

What the Operational Shift Looks Like

The artifact a security team running mature RAG threat detection ends up with isn’t a flat list of vector database scan findings or a feed of retrieval-time alerts. It’s a continuous posture queue with each finding classified by threat window (index-time, query-time, context-assembly-time) and signal category (one of the five), with a known reduction path per classification. A finding tagged “index-time, source provenance anomaly” has a different reduction path than “query-time, retrieval pattern deviation” or “context-assembly-time, chunk reranking anomaly.” Generic severity-sorted queues collapse this distinction; the classified queue routes each finding to the right reduction path the first time. Classification first, fix second.

See Cross-Window Correlation in Your Own Cluster

If you want to see what runtime-derived AI-BOM and per-RAG-pipeline behavioral baselines look like in your own environment — including the cross-window correlation that turns three separate alerts into one pipeline-level attack story — the ARMO platform for cloud-native AI workload security combines kernel-level eBPF telemetry, framework-aware instrumentation, and vector database audit integration into the posture layer this framework depends on. The platform runs alongside existing CNAPP and AI-SPM tooling rather than replacing it. To walk through how the three-window classification would apply to your specific deployment pattern, book a demo.

Frequently Asked Questions

How do you instrument index-time threat detection without breaking the indexing pipeline?

Index-time detection runs in observation mode for the first weeks — baseline only, no enforcement. The runtime-derived AI-BOM captures the index population stream passively, attributing each write event to a principal and source without modifying the indexing pipeline itself. Once the baseline converges, anomalous writes surface as findings rather than blocks; enforcement is a separate decision the team can stage progressively after observation produces actionable signal.

What observation window do you need before a per-RAG-pipeline behavioral baseline is reliable?

Index population and retrieval pattern baselines typically need 14 to 21 days of representative traffic, longer if the corpus rebuilds on monthly cycles or the agent serves seasonal query patterns. Source provenance baselines are deterministic and stable from day one because they’re a static trace of the principal-to-index mapping. Context-assembly baselines need the same window as retrieval patterns because they’re downstream of retrieval — the assembly pattern depends on what gets retrieved.

How do you distinguish a legitimate corpus refresh from suspicious index-time activity?

Legitimate refreshes correlate with declared connector schedules, source principals, and expected metadata patterns. The runtime-derived AI-BOM tags each index write with its principal, source, and timing — so a scheduled monthly rebuild from the documentation connector at 02:00 UTC is structurally different from a write from the same connector at 14:00 UTC pulling from a source that hasn’t appeared in the connector’s history. Correlation with deployment events handles legitimate connector updates the same way deployment correlation handles legitimate model updates in agent baselines.
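
As a sketch, the schedule correlation is a small predicate. The declared-schedule shape below is hypothetical, standing in for whatever the connector configuration actually declares:

```python
from datetime import datetime, timezone

# Hypothetical declared schedule: monthly rebuild at 02:00 UTC on the 1st.
DECLARED_SCHEDULES = {"docs-connector": {"day_of_month": 1, "hour_utc": 2}}

def is_scheduled_write(connector: str, write_time: datetime,
                       slack_hours: int = 1) -> bool:
    """Does this index write fall inside the connector's declared window?"""
    sched = DECLARED_SCHEDULES.get(connector)
    if sched is None:
        return False  # no declared schedule: judge the write by baseline alone
    wt = write_time.astimezone(timezone.utc)
    return (wt.day == sched["day_of_month"]
            and abs(wt.hour - sched["hour_utc"]) <= slack_hours)
```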

Can you detect context-assembly threats if your RAG framework runs as a sidecar or external service?

Yes, but the instrumentation has to live where the assembly logic runs. eBPF at the kernel level shows network calls between retrieval, reranking, and inference services but not what’s inside the assembled context. Capturing the assembled prompt requires hooking into the framework’s pre-inference path: LangChain callback handlers, LlamaIndex instrumentation events, OpenTelemetry traces, or process memory introspection. Most production deployments run LangChain or LlamaIndex within the agent process where this is well-supported; truly external rerankers require instrumentation at the reranker service itself.

How does this work for multi-tenant RAG where different tenants share an index?

Multi-tenant RAG amplifies the importance of source provenance and context-assembly baselines because tenant boundaries are enforced at the retrieval-filter layer, and a misconfigured filter can leak chunks across tenants without any IAM violation. Per-tenant retrieval result baselines catch the case where a query suddenly returns chunks from a tenant partition the agent has never accessed; context-assembly baselines catch the case where the reranker pulls cross-tenant content into the assembled prompt. OWASP LLM08:2025 explicitly names cross-tenant context leakage as a primary failure mode for shared-index architectures.
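
As a sketch, the per-tenant result check is a set difference plus a scope test; the `tenant/partition` naming convention is a hypothetical stand-in for whatever scoping scheme the shared index actually uses:

```python
def check_result_partitions(tenant: str, returned: set[str],
                            seen_before: set[str]) -> list[str]:
    """Flag result chunks from index partitions outside this tenant's history."""
    findings = []
    for partition in returned - seen_before:
        if not partition.startswith(f"{tenant}/"):
            findings.append(f"cross-tenant bleed candidate: chunk from {partition}")
        else:
            findings.append(f"first-time partition for this agent: {partition}")
    return findings
```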
