Get the latest, first
arrowBlog
AI Workload Security on AWS: Evaluating Native Tools vs Third-Party Solutions

AI Workload Security on AWS: Evaluating Native Tools vs Third-Party Solutions

Mar 16, 2026

Yossi Ben Naim
VP of Product Management

Key takeaways

  • Why can’t CloudTrail detect AI agent threats? CloudTrail logs API calls at the control plane—it tells you that bedrock:InvokeModel was called, by which role, at what time. It cannot tell you what the agent did with the response: whether it spawned unauthorized processes, opened unexpected network connections, or misused permitted tools. An agent that exfiltrates data through a legitimate API call looks identical to one performing its intended task in every CloudTrail record.
  • What AI threats does the capability matrix reveal that AWS can’t detect? The matrix maps seven threat vectors against AWS native tools. Agent escape attempts, runtime dependency loading, and behavioral anomalies in tool usage have no native AWS detection coverage. These align with the top-ranked threats in the OWASP Top 10 for LLM Applications and represent the attack surface that matters most for agentic AI workloads running on EKS.
  • When is AWS-native coverage sufficient, and when do you need a runtime platform? If your AI workloads are batch SageMaker inference jobs with no tool-calling capabilities and no autonomous decision-making, AWS native tools cover your threat surface. If your agents call external APIs, run on EKS, or load models from external registries at runtime, you have gaps that only runtime behavioral monitoring can close. The complete buyer’s guide walks through the full vendor evaluation methodology.

Your Bedrock agent running on EKS receives a prompt through your RAG pipeline. CloudTrail logs it as a normal bedrock:InvokeModel event—status 200, authorized IAM role, expected endpoint. But inside the container, the agent’s response triggers a tool call that spawns curl to an external IP, exfiltrating the context window. GuardDuty doesn’t flag it because the connection routes through a permitted VPC endpoint. You open your AWS console and see a healthy API call. The compromxise is invisible at the control plane.

This is the structural gap in AWS-native AI security. AWS gives you strong governance, identity controls, and control-plane logging—the foundation you need. But those tools stop at the workload boundary, leaving a blind spot exactly where agentic AI threats happen: inside your containers, at runtime, where agents make autonomous decisions about which tools to call, which networks to reach, and which data to access. The shift from CNAPP to CADR reflects exactly this gap—posture-only tools can’t protect workloads that behave autonomously.

The 4-Pillar evaluation framework for AI workload security tools (Observability, Posture, Detection, Enforcement) applies regardless of cloud provider—but AWS adds specific considerations around SageMaker, Bedrock, and EKS that affect how each pillar plays out in practice. This guide maps those AWS-specific considerations with two assets you can use in your next security architecture review: a Modern AI Threat Taxonomy for AWS environments, and an AI Threat Detection Capability Matrix comparing AWS native coverage to third-party runtime platforms. For Azure-specific and GKE-specific evaluations, see the companion guides for Azure and GKE.

The Agent Layer: Where AWS’s Shared Responsibility Model Has a Gap

You already know the AWS shared responsibility model. AWS secures the infrastructure; you secure what runs in it. But with AI workloads, there’s a layer that doesn’t fit neatly into either side: the agent layer.

The agent layer is everything your AI agent can actually do at runtime—reading prompts, calling tools and APIs, executing generated code, opening network connections, and accessing data stores. AWS secures the control plane: IAM roles, encryption, service-level logging, and optional guardrails features like Bedrock Guardrails. You’re responsible for runtime workload behavior, which is where agents actually operate. AWS’s own Generative AI Security Scoping Matrix acknowledges this split—applications that integrate foundation models through APIs (Scope 3) place runtime security responsibility squarely on the customer.

Here’s what that split looks like in practice. Your SageMaker endpoint serves a model behind an IAM-authenticated API. AWS logs every invocation to CloudTrail, encrypts data in transit and at rest, and isolates compute at the infrastructure level. But AWS does not inspect the code paths your model executes, the prompt logic that drives agent decisions, or the runtime behavior of tools your agent invokes. If your Bedrock agent makes a legitimate-looking API call that triggers an unintended downstream action—say, writing to an S3 bucket it technically has access to but should never touch during normal operation—that’s in your scope to detect. And that’s where most teams discover they have a monitoring gap. This gap is the reason cloud-native security built specifically for AI workloads has emerged as a distinct category.

A Practical AI Threat Taxonomy for AWS: What Your Console Shows vs. What Actually Happens

To evaluate security gaps on AWS, you need a clear way to group AI-specific threats. The taxonomy below organizes them into four categories that map to how real AWS-based AI systems are built and attacked—drawing from the OWASP Top 10 for LLM Applications, the MITRE ATLAS framework, and the NIST AI Risk Management Framework. For each category, we show the detection gap: what your AWS dashboard displays versus what’s actually happening inside your workloads.

Category 1: Model and Data Integrity Threats

Model and data integrity threats change how your model behaves by tampering with training data, model artifacts, or model storage. These correspond to OWASP LLM06 (Sensitive Information Disclosure) and the training data poisoning vectors in MITRE ATLAS.

What this looks like on AWS: An attacker with compromised credentials modifies objects in the S3 bucket feeding your SageMaker training pipeline. They inject subtly biased training samples—not enough to trigger Macie’s anomaly detection, which focuses on PII patterns rather than semantic data quality. Your next training job completes successfully. SageMaker Model Monitor shows input distribution drift of 0.3%, well within normal bounds. But the model’s outputs have shifted: it now approves loan applications that match a specific pattern the attacker introduced. CloudTrail logged the S3 PutObject calls, but they came from an authorized role during a normal training window. Nothing in your AWS console flags this as a problem.

Detection gap: CloudTrail and S3 access logging capture who accessed training data and when. SageMaker Model Monitor tracks statistical drift. But neither validates the semantic integrity of model decisions at inference time. Control-plane signals show the system is healthy while the model is poisoned.

Category 2: Evasion and Deception Threats

Evasion and deception threats are inputs crafted to trick the model while leaving infrastructure untouched. The model behaves incorrectly, but AWS services look healthy. Prompt injection is ranked #1 in the OWASP Top 10 for LLM Applications (LLM01) because of its prevalence and the difficulty of pattern-based defenses.

What this looks like on AWS: A multi-turn prompt injection targets your Bedrock agent. The first message is benign; the second includes instructions embedded in a document the agent retrieves from your knowledge base. Bedrock Guardrails check each individual turn against your configured content filters—but the attack is distributed across turns and encoded in what looks like normal business context. The guardrail passes every check. Over the next three interactions, the agent progressively overrides its system instructions and begins responding to the attacker’s directives instead of yours. Multi-turn attacks like this have achieved success rates as high as 92% against open-weight models. Your Guardrails dashboard shows no violations. The attack never triggered a single configured filter.

Detection gap: Bedrock Guardrails are static pattern-based filters. They block known patterns but miss novel techniques, multi-turn attacks, and adversarial inputs that don’t match preconfigured rules. Detecting these requires behavioral analysis of model input-output patterns over time—not keyword matching on individual turns.

Category 3: Malicious Agent and Application Behavior

This is where most security stacks fail. Malicious agent behavior means what your AI agent actually does at runtime: which tools it calls, what processes it spawns, which networks it connects to, and how those patterns deviate from expected behavior. This maps to the emerging OWASP Top 10 for LLM Applications categories around excessive agency (LLM08) and insecure plugin design (LLM07).

What this looks like on AWS: Your EKS-based AI agent normally calls a specific set of internal APIs to process customer support tickets. After ingesting a crafted support ticket (indirect prompt injection), the agent begins making the same API calls—but with modified parameters that export entire customer tables instead of individual records. CloudTrail logs each API call as Invoke with status 200, from the same authorized service role. The IAM policy permits these calls. GuardDuty sees normal API patterns—no impossible travel, no credential anomaly, no known threat signature. Meanwhile, 50,000 customer records are being exfiltrated through a sequence of calls that are individually legitimate but collectively constitute data theft. The difference between normal and malicious is only visible through behavioral analysis of tool invocation patterns over time—something no AWS native tool provides.

Detection gap: AWS native tools have no visibility into in-container process execution, tool invocation patterns, or behavioral anomalies in agent activity. CloudTrail logs API calls but cannot distinguish legitimate tool use from malicious abuse without behavioral context. This is the fundamental architectural limitation—and the runtime blind spot that matters most for agentic AI workloads.

Runtime platforms close this gap with eBPF-based sensors that monitor process execution, network connections, and system calls inside containers. ARMO’s Cloud Application Detection and Response (CADR) correlates signals across cloud, Kubernetes, container, and application layers—so when that agent starts exporting full customer tables instead of individual records, the behavioral deviation triggers a detection with the full attack story, not just an isolated log entry.

Category 4: Supply Chain and Dependency Risks

Supply chain risks cover the models, libraries, and tools you pull in from outside. This is critical because AI workloads dynamically load dependencies that may not appear in deployment manifests. The NIST AI Risk Management Framework highlights supply chain provenance as a foundational requirement for trustworthy AI systems.

What this looks like on AWS: Your EKS deployment manifest specifies a model from a private ECR registry. But at runtime, your application code pulls an additional adapter model from Hugging Face—a common pattern for task-specific fine-tuning. That adapter was uploaded by an account that was recently compromised. Hugging Face has seen a 6.5x increase in malicious models, and this one includes a payload that executes during model loading. Your ECR scan came back clean because it only scanned what was in the registry. Your SBOM lists the declared dependencies from build time. Nothing in your AWS environment tracked what the workload actually loaded at runtime.

Detection gap: AWS has no native runtime AI-BOM capability. ECR scanning and static manifest analysis cover what you declared at build time but miss what actually runs. Without runtime-derived AI-BOM—an inventory of models and AI components based on actual runtime behavior—you don’t know what’s executing in your workloads. ARMO provides this through runtime dependency monitoring that tracks what models, frameworks, and libraries are actually loaded, not just what manifests declare.

AI Threat Detection Capability Matrix: AWS Native vs. Third-Party Runtime

The matrix below maps each threat vector to AWS native detection coverage and third-party runtime detection. The key criterion is not whether logs exist—it’s whether there’s a clear, actionable detection your SOC can act on. The threat vectors are drawn from the OWASP Top 10 for LLM Applications and the MITRE ATLAS framework.

Threat VectorAWS NativeMethodGapThird-Party Runtime (ARMO)
Model poisoningPartialCloudTrail, SageMaker Model MonitorNo behavioral detection of poisoned outputs at inferenceRuntime anomaly detection on model decisions
Training data contaminationPartialS3 access logging, MacieNo runtime data integrity validationAI-aware data access monitoring
Prompt injectionPartialBedrock GuardrailsLimited to configured static patternsInput/output behavioral analysis across turns
Agent escape attemptsNoneNo runtime process or network monitoringeBPF-based process and network detection, CADR correlation
Tool/API misusePartialCloudTrail API loggingNo behavioral anomaly detection on tool patternsBehavioral baselines with deviation alerting
AI-mediated lateral movementPartialGuardDuty findingsLimited to known threat signaturesCADR full attack story across cloud and cluster layers
Malicious AI dependenciesNoneNo runtime dependency monitoringRuntime-derived AI-BOM

Methodology note: Coverage means the ability to detect and alert with actionable context—not merely that logs are generated somewhere. “Partial” means logs exist but lack the behavioral context needed for confident SOC action.

Where the Gaps Are—And What They Look Like in Your Console

The matrix reveals three high-stakes gaps that affect CISOs, architects, and SOC analysts building security for AI workloads on AWS.

The Runtime Blind Spot: Why Control-Plane Logs Miss Agent Behavior

CloudTrail and service-level logs capture API calls. They cannot capture what happens inside a container after the API call returns a response.

When your Bedrock agent receives instructions through a crafted prompt and spawns an unauthorized process—say, a reverse shell that opens a connection to an attacker-controlled endpoint—CloudTrail shows a normal bedrock:InvokeModel event. GuardDuty doesn’t flag it because the process execution happens inside the container, below the visibility layer GuardDuty monitors. Your SOC sees green dashboards while the agent is compromised.

This is the fundamental architectural limitation of control-plane monitoring for AI workloads. The most dangerous agent behaviors—escape attempts, unauthorized process execution, anomalous network connections—all occur at the runtime layer. Closing this gap requires kernel-level telemetry: eBPF-based sensors that monitor system calls, process trees, and network connections inside your EKS pods. ARMO deploys these sensors to provide the runtime visibility layer that AWS native tools architecturally cannot. When that reverse shell opens, ARMO’s detection fires with the process tree, network destination, and the agent context that triggered it—giving your SOC an actionable attack story instead of a missing alert. For the full detection and response framework, see how CADR replaces siloed alerts with correlated attack chains.

The Tool Misuse Gap: When Legitimate API Calls Are the Attack

AI agents call tools and APIs as part of normal operation. Malicious tool misuse looks identical at the API level—the difference is only visible through behavioral context. The OWASP Top 10 for LLM Applications ranks excessive agency (LLM08) as a top risk specifically because agents with broad tool access can be manipulated into misusing those tools in ways that look authorized.

Consider the scenario from Category 3: an agent normally retrieves individual customer records, but after prompt injection, it starts exporting full tables through the same API endpoint. CloudTrail logs both patterns as the same API call from the same role. There is no native AWS mechanism to say “this agent usually queries 1–5 records per interaction, and it just queried 50,000.” That requires behavioral baselines built from observed runtime activity—learning what “normal” looks like for each agent, then detecting deviations.

ARMO’s behavioral detection builds these baselines through its Application Profile DNA capability. During an initial observation period, the platform maps each agent’s normal tool usage, network connections, and data access patterns. When the agent deviates—calling APIs with unusual parameters, accessing data volumes outside its baseline, connecting to unexpected endpoints—the detection correlates the deviation with the triggering context across cloud, Kubernetes, and application layers, producing the full chain from malicious prompt to anomalous behavior to data impact. This is the same observe-then-enforce workflow that powers progressive AI agent sandboxing—visibility first, then enforcement based on evidence.

The Supply Chain Gap: When Your Manifest Doesn’t Match Reality

AI workloads load models and dependencies dynamically. A deployment manifest may specify one set of dependencies, but the workload may download additional models, adapter layers, or packages at runtime.

ECR image scanning covers what’s in your container registry. Your SBOM generator covers what was declared at build time. Neither tracks what the application actually loads after startup. For AI workloads, this gap is especially dangerous because model loading from external sources (Hugging Face, model zoos, package registries) is standard practice—and those sources have become active attack surfaces. The NIST AI Risk Management Framework and the MITRE ATLAS both identify supply chain provenance as a foundational risk category for AI systems.

ARMO addresses this with runtime-derived AI-BOM that inventories models, frameworks, and dependencies based on what actually loads at runtime—not what manifests declare. If your workload pulls a model that wasn’t in your deployment spec, the platform flags it. If a Python package loads a dependency that wasn’t in your requirements file, it’s recorded. This gives security teams an accurate, continuously updated picture of what’s actually running in their AI workloads.

What AWS Native Tools Cover Well—And Where They Stop

A realistic security strategy starts with acknowledging what AWS does well. For identity, encryption, and governance, AWS native tools are the foundation:

  • IAM and KMS: Fine-grained access control and encryption for AI services. This is table stakes, and AWS does it well. Follow the EKS security best practices for Kubernetes-specific IAM configuration.
  • CloudTrail: API-level audit logging for compliance and forensics. Essential for governance—but it’s a record of what was requested, not what happened after the response.
  • GuardDuty: Detection of known threat patterns and anomalous API behavior. Strong for credential compromise and known attack signatures. Limited for novel AI-specific threats and in-container activity.
  • SageMaker Model Monitor: Statistical drift detection for model inputs and outputs. Useful for operational monitoring—but statistical drift isn’t the same as behavioral anomaly detection.
  • Bedrock Guardrails: Configurable content filtering for prompts and responses. A good first layer for known patterns—but static rules can’t catch novel injection techniques or multi-turn attacks.

These tools are valuable and necessary. They handle governance, compliance, and high-level visibility. The limitation isn’t a failure—it’s a scope boundary. AWS secures the platform. It doesn’t inspect your AI application internals at runtime. For runtime behavioral monitoring, process-level visibility, and AI-specific threat detection inside containers, you need a layer that operates below the control plane—which is exactly what cloud-native security for AI workloads is designed to provide.

Decision Framework: When to Augment AWS Native Tools with a Runtime Platform

The question isn’t “AWS or third-party?” It’s: where in your AI workload stack do AWS native tools cover you, and where do they leave gaps? The 4-Pillar evaluation framework provides the complete methodology—here’s how it applies specifically to AWS.

When AWS Native Tools Are Sufficient

AWS native coverage may be adequate when all of these are true: your AI workloads are batch inference jobs (SageMaker batch transform or Bedrock single-turn calls) with no tool-calling capabilities, your agents don’t make autonomous decisions or interact with external systems, your primary security concerns are IAM hygiene, encryption, and audit compliance, and you have no Kubernetes-based AI deployments on EKS.

When You Need a Runtime Platform

If any of these describe your environment, you have runtime gaps that AWS native tools cannot close:

  • Your agents call external tools or APIs: Any agent with tool-use capabilities introduces Category 3 threats that require behavioral baselines to detect. This is the excessive agency risk the OWASP Top 10 identifies as LLM08.
  • You run AI workloads on EKS: Kubernetes environments need runtime visibility into pod behavior, network connections, and process execution that CloudTrail and GuardDuty don’t provide.
  • Your agents load models or dependencies at runtime: If workloads pull from Hugging Face, PyPI, or other external sources after deployment, you need runtime AI-BOM to know what’s actually running.
  • Your SOC needs behavioral anomaly detection: If your detection requirements go beyond “what API was called” to “was this behavior expected for this agent,” you need behavioral baselines that API logging cannot provide.
  • You need the full attack story: If your SOC needs to trace an attack from initial prompt through lateral movement to data impact—across cloud, Kubernetes, and application layers—you need the cross-layer correlation that CADR provides.

The Practical Approach: Layered Coverage

For most teams running production AI workloads on AWS, the answer is layered: use AWS native tools for governance, identity, encryption, and audit compliance. Layer a runtime platform for behavioral detection, agent monitoring, and supply chain visibility. AWS handles the platform; your runtime platform handles the behavior of your code and agents.

ARMO is built for this layered model. The platform integrates with AWS-native telemetry (CloudTrail, VPC flow logs, EKS audit logs) and adds the runtime layer—eBPF-based process and network monitoring, behavioral baselines, AI-BOM, and CADR correlation—without replacing what AWS already does well. The result is quantified: 90%+ CVE noise reduction through runtime reachability analysis, 90%+ faster investigation through LLM-powered attack story generation, and 80%+ reduction in issue overload through runtime-based prioritization. All at 1–2.5% CPU and 1% memory overhead. Built on Kubescape, trusted by more than 100,000 organizations.

Security Practices for AI Workloads on AWS

Implement Runtime Behavioral Monitoring

Behavioral baselines for AI workloads enable detection of anomalous agent actions—directly mitigating Category 3 threats. This means monitoring process execution, network connections, file access, and API call patterns at the kernel level, not just at the CloudTrail API level. Start in observation mode to learn your agents’ normal behavior before defining enforcement policies. The NIST AI Risk Management Framework recommends continuous monitoring as a core governance practice for AI systems in production.

Maintain Runtime AI-BOM

A runtime-derived AI-BOM captures what models, frameworks, and dependencies actually load in your workloads—including models pulled from external sources at startup or during inference. Unlike static SBOMs, this inventory reflects operational reality and mitigates Category 4 supply chain threats. Audit your AI-BOM regularly against known vulnerability databases and the emerging AI-specific threat entries in MITRE ATLAS.

Apply Progressive Sandboxing for AI Agents

Progressive sandboxing means starting in observe mode to learn normal agent behavior, then enforcing restrictions based on evidence from observed baselines. For AI agents on EKS, this means restricting tool access, network connections, and file system access based on what the agent legitimately does—not on manually written policies that assume deterministic behavior. The complete guide to AI agent sandboxing and progressive enforcement walks through implementation across EKS, AKS, and GKE environments.

Watch a demo of the ARMO platform to see runtime AI workload detection in action on AWS.

Frequently Asked Questions

Can AWS Native Tools Detect AI Agent Escape Attempts?

No. CloudTrail logs API calls but has no visibility into process execution or network connections inside containers. Detecting agent escape requires runtime sensors—like eBPF probes—that monitor system calls and process trees at the kernel level.

How Do Runtime Platforms Monitor AI Agent Behavior on AWS?

Runtime platforms deploy eBPF sensors into EKS nodes that monitor system calls, process execution, network connections, and file access. They build behavioral baselines from observed activity and detect deviations—capabilities that API-level logging cannot provide. ARMO’s CADR platform correlates these signals across cloud and cluster layers into full attack stories.

What Is AI-BOM and Why Does It Matter?

AI-BOM is an inventory of models, frameworks, and dependencies based on actual runtime behavior. It matters because AI workloads dynamically load components that don’t appear in deployment manifests or static SBOMs—and those dynamically loaded components are active attack surfaces.

How Should Security Teams Use the Capability Matrix?

Map your AI workloads to the threat taxonomy categories, then check each matrix row against your current detection coverage. The rows where AWS shows “None” or “Partial” become your evaluation criteria for vendor selection. The complete buyer’s guide provides the full vendor evaluation methodology across the 4-Pillar framework.

Close

Your Cloud Security Advantage Starts Here

Webinars
Data Sheets
Surveys and more
Group 1410190284
Ben Hirschberg CTO & Co-Founder
Rotem_sec_exp_200
Rotem Refael VP R&D
Group 1410191140
Amit Schendel Security researcher
slack_logos Continue to Slack

Get the information you need directly from our experts!

new-messageContinue as a guest