Feb 2, 2021
As a product manager, I am always concerned about the value my customers get from the product; this is my main focus. To achieve it, I often meet with customers to talk about pain points and problems, offer a solution, and see how the product can help. In the past few years, one of the items that keeps coming up in these discussions is not related to any pain point or feature requirement: it is the attachment method when dealing with K8s security.
Customers ask: “Are you using eBPF?”, “Is your product agentless?” While these are technological, classic “how” questions, I am not surprised customers ask them. We, the vendors, publish on our websites: “Agentless”, “eBPF”, “nano-agent”, and so on. I always have the same answer for these types of questions, which I will try to share with you in this article.
TL;DR: there is no right and wrong! Each attachment option gives you different capabilities – there is no magic! For example, if you are agentless, you can’t really prevent anything (transparent proxy vendors will excuse me, as in this article I will focus on the attachment method in Kubernetes). Basically, for a K8s security/visibility product, there are the following attachment points:
- API (agentless): consuming the K8s (and cloud provider) APIs from outside the workloads
- Host agent: a privileged agent on every worker node, typically a daemonset (often eBPF-based)
- In-pod: something running inside the pod itself, either a sidecar container or a micro-agent library
That’s it! No other option exists! Putting a physical device outside of the cluster (firewall, WAF, IPS…) does not really work in K8s, as you lose the context, which is the essence of security/visibility.
In the next sections, I will describe each option and its pros and cons.
The API option is all about using the native capabilities of K8s (in managed K8s cases like AKS and EKS, it also includes the cloud vendors’ APIs). You use these capabilities to get the audit logs, pod logs, K8s configuration, etc., and later apply some logic that generates security value.
Is it seamless? Not sure. In most cases, the product must get the highest-privilege access to your environment, which means it can do ANYTHING and can even become an additional vulnerable point. Some vendors will argue that they “just” need read-only access. If this is the case, the product is a detection, recommendation-based product: it can alert if something is wrong, but you need to go the extra mile to remediate or analyze the issue. In addition, if someone gets hold of your configuration, they will be able to find a way into your environment, as this is perfect intelligence for planning an attack. Lastly, you are bound to what the API provides. For example, if you use the K8s API to generate a pod-to-pod access list, you can’t do basic things like blocking a connection or logging specific connections.
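To make the “read-only detection” point concrete, here is a minimal sketch of the kind of logic such a product runs: it takes a pod manifest (as you would fetch it from the K8s API) and flags risky settings. The checks and the `audit_pod_spec` helper are hypothetical illustrations, not any vendor’s actual product; note it can only report findings, never block anything.

```python
# Illustrative, read-only "detection" logic over K8s configuration.
# audit_pod_spec and its checks are hypothetical, for illustration only.

def audit_pod_spec(pod: dict) -> list[str]:
    """Return a list of findings for one pod manifest."""
    findings = []
    spec = pod.get("spec", {})
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sec = container.get("securityContext") or {}
        if sec.get("privileged"):
            findings.append(f"{name}: runs privileged")
        if sec.get("runAsNonRoot") is not True:
            findings.append(f"{name}: may run as root (runAsNonRoot not set)")
    if spec.get("hostNetwork"):
        findings.append("pod: uses hostNetwork")
    return findings

pod = {
    "metadata": {"name": "demo"},
    "spec": {
        "hostNetwork": True,
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
    },
}
for finding in audit_pod_spec(pod):
    print(finding)
```

This is exactly the bound described above: the product sees whatever the API exposes, and remediation is left to you.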
Having said the above, it is a very good option for trying products, and it will probably get the least objection from DevOps or whoever manages the environment. If it does not provide value, plug it out, and no harm is done.
A host agent is not a new idea; antivirus companies have been using it for years (user mode and/or kernel mode). It is an agent, usually privileged, that you install on each worker node in your cluster. In K8s, this is called a daemonset, and K8s did an amazing job enabling us to scale this agent.
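The scaling mechanics are visible in a minimal DaemonSet manifest, shown here as a Python dict. Field names follow the Kubernetes apps/v1 API; the image name `example.com/security-agent` is a made-up placeholder.

```python
# Minimal DaemonSet manifest expressed as a Python dict.
# Field names follow the Kubernetes apps/v1 API; the image is hypothetical.
daemonset = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "security-agent", "namespace": "kube-system"},
    "spec": {
        "selector": {"matchLabels": {"app": "security-agent"}},
        "template": {
            "metadata": {"labels": {"app": "security-agent"}},
            "spec": {
                # The scheduler places one copy of this pod on every worker
                # node; nodes added later get the agent automatically.
                "containers": [{
                    "name": "agent",
                    "image": "example.com/security-agent:latest",
                    # Host agents are usually privileged - exactly the
                    # trade-off discussed in this section.
                    "securityContext": {"privileged": True},
                }],
            },
        },
    },
}
print(daemonset["kind"])
```

The `kind: DaemonSet` is what turns “install this agent on every node” into a one-line deployment that follows the cluster as it grows.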
In the past few months, the K8s community has been amplifying the use of a new “magic” called eBPF. I personally got acquainted with this magic 2-3 years ago, when I was looking at how we could help customers solve visibility challenges in K8s – knowing which pod communicated with which pod.
I cannot call myself an eBPF expert; I am far from it. When I was working for Check Point, we developed a Linux kernel driver intended for fast packet processing. It was an “agent” on every host. The issue with this “agent” was that even the smallest problem would kill the machine. Back then, the only other option was to go to user mode. In user mode, if you have a bug, the “agent” dies, but the host is still functional. You can spin up a new agent and start from where you stopped – worst case, you lose a few connections. The issue was performance – the kernel was way better. Another important note is that kernel attachment is not really an option in the cloud, as in some clouds you cannot change the Linux kernel.
eBPF, which stands for extended Berkeley Packet Filter, solves this problem. You write a small program in user space, and it then runs in a sandbox in the kernel. You don’t need to install kernel modules; you just write code that runs in a sandbox and extends your kernel.
This sounds great – can I order two? Jokes aside, it is great. The only issue is that you now have an agent with root privileges on every host in the environment. If someone breaks into the product, they can go ANYWHERE in your environment. If there is a bug in your program, it might take down the entire host. Another issue is that you must run it on the host, so what happens when you use container-as-a-service (CaaS) offerings like ACI or Fargate, where you don’t control the host?
The idea behind this attachment option is that something runs in the pod itself. There are two main options I want to cover: a sidecar container and a micro-agent library.
In most cases, you don’t need to change your application, and the agent connects seamlessly to your environment (at least, that is the promise). The main benefit of this approach is that you get a lot of value without rewriting your application. The value can vary from canary deployment to network security and other use cases. It leaves your development team focused on what they do best – developing a product for your customers – while you get added value from a 3rd party who is an expert in security. Another important point is that it should not require any excessive privileges; it should inherit the same privilege level as your existing pod.
In today’s security and observability landscape, where the industry is pushing for end-to-end traffic encryption, most of the existing solutions (choke points, proxies, etc.) will become blind to the traffic. Opening the traffic for inspection or visibility with Man-in-the-Middle (MitM) techniques carries a huge performance penalty, which leaves only one option: being inside the application.
The issue with the in-pod attachment point is that removing it might require some extra adjustment: it is not part of your application, but your application requires it in order to work. This is no different from having a network firewall in your data center and taking it out – once you do, you need to adjust the routing.
Another concern I often hear from customers is the added CPU, latency, and memory overhead. They are afraid that adding this agent will massively degrade the performance of the workload.
As the name implies, a sidecar is usually a container that runs in your pod. This container has some features and capabilities; some might be network security-oriented, but this is not mandatory.
Since a sidecar is a full container added to your application, and since the best practice is to run a single container per pod, adding the sidecar container potentially increases the resources required for your workload. In cloud environments, where you pay for compute resources, this means more money!
Another problem is performance and latency. While some claim it is small, the sidecar acts as a transparent proxy, and even if it is super efficient, it is still a proxy – a checkpoint for every communication to and from the workload. If you add it to every workload, you pay this latency twice for pod-to-pod communications. A sidecar handles network security problems, but it does not solve runtime malware detection and prevention.
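The “pay twice” point can be made concrete with a toy latency model. All the numbers below are illustrative assumptions, not measurements of any real proxy.

```python
# Toy model of pod-to-pod request latency with sidecar proxies on both ends.
# Both numbers are illustrative assumptions, not measurements.
base_network_ms = 0.50    # pod-to-pod latency without any sidecar
proxy_overhead_ms = 0.25  # overhead added by one sidecar proxy hop

# Egress passes through the caller's sidecar and ingress through the
# callee's sidecar, so the proxy overhead is paid twice per request.
with_sidecars_ms = base_network_ms + 2 * proxy_overhead_ms

print(f"without sidecars: {base_network_ms:.2f} ms")
print(f"with sidecars:    {with_sidecars_ms:.2f} ms")
```

Even with a small per-hop overhead, doubling it on every pod-to-pod call is what makes the latency discussion worth having.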
Lastly, most of the people I have talked to add it to the entire application (namespace, cluster, etc.). They did not find value in adding it to only a few microservices out of the entire application.
The benefit of a sidecar is that if it has a bug that makes it crash, it does not take your workload down with it. To be honest, your workload will not die, but it will not be able to operate either.
This is an agent delivered as a library (a .so on Linux, a DLL on Windows). Adding it to your workload should be as easy as adding any other library. These micro-agents usually have a relatively small footprint (CPU and memory), and since they instrument the operating system’s native capabilities, if done right they should not add major CPU usage, memory usage, or latency.
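A rough analogue of how such a library works can be sketched in Python: wrap an existing function so every call is observed before delegating to the original. Real micro-agents do this at the .so/DLL level (hooking native OS calls); the `instrument` helper and `send_payment` function here are purely illustrative.

```python
# Python analogue of a micro-agent: instrument an existing function in place,
# record each call, then delegate to the original. Purely illustrative.
import functools

observed_calls = []  # what the "agent" has seen

def instrument(func):
    """Wrap func so every call is recorded before running the original."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        observed_calls.append((func.__name__, args))
        return func(*args, **kwargs)
    return wrapper

# Pretend this is an application function the agent wants visibility into.
def send_payment(account: str, amount: int) -> str:
    return f"sent {amount} to {account}"

send_payment = instrument(send_payment)

print(send_payment("acct-42", 100))  # application behaves exactly as before
print(observed_calls)                # the "agent" saw the call
```

The key property the sketch shows is inheritance: the wrapper runs with exactly the privileges of the code it instruments, which is why a micro-agent should not need anything beyond the pod’s own privilege level.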
The main issue is that when the library has a bug, it might break your workload. As seen before, this is not a special case – if the agent has issues, in all cases it will break your application one way or another. This is a major concern for most customers: they are adding a 3rd-party piece of software that might break their application. I would argue that in the microservices, DevOps-oriented landscape we live in, our development teams take so many libraries and so much code from unknown 3rd parties that the agent should be the least of our concerns. At the end of the day, there is a vendor whose business depends on the quality of the agent, and they will support customers with every issue they might have.
There is no right and wrong when it comes to choosing an attachment point. Leave it to the vendor to choose where to attach in order to solve your issues and give you added VALUE. At the end of the day, you are not looking for an attachment point; you are looking for someone to help you solve a headache.
Below is a summary table describing the pros and cons of each attachment point:

| Attachment point | Pros | Cons |
| --- | --- | --- |
| API (agentless) | Easy to try, no changes to workloads, easy to remove | Needs broad cluster privileges, detection-only, bound to what the API exposes |
| Host agent (daemonset/eBPF) | Deep visibility, scales with the cluster | Root privileges on every host, a bug can take down the host, unavailable on CaaS (ACI, Fargate) |
| Sidecar container | No application changes, a crash does not kill the workload | Extra resources and cost, proxy latency paid twice pod-to-pod, no runtime malware detection |
| Micro-agent (library) | Small footprint, inherits pod privileges, works despite traffic encryption | A bug might break the workload |