The Kubernetes network (security) effect
Apr 20, 2021
Around 20 years ago I had the privilege of joining Check Point, a young company that invented the firewall. I learned most of my networking knowledge and skills there and, at that time, I was involved in the high-end, rapidly evolving internet. This may be why I truly believe that network security must be one layer in the overall security strategy. A few years ago, I came back to Check Point as a cloud security product manager and witnessed the evolution of the cloud and the network security challenges in this domain. In this article, I would like to share my journey and insights about Kubernetes (K8s) network security.
TL;DR: K8s has a built-in object (sort of) for managing network security: NetworkPolicy. It lets the user define the relationships between pods with ingress and egress policies, but it is still quite basic and requires very precise IP mapping in an environment that is constantly changing, so most users I’ve talked to are not using it. Users need a simplified solution that protects their K8s networks to the max, one that makes network security in K8s a doable task with the highest security level possible: mutual TLS between microservices and a real zero-trust deployment. And no, this is not a sidecar solution.
Still stuck with Firewall?
Back in the day, a network security policy was defined with IP addresses and subnets. You would define the source and destination, then the destination port, action, and track options. Over the years, the firewall evolved and became application-aware, with added capabilities for advanced malware prevention and more. It is no longer a firewall, but a full network security solution! However, most network security solutions, even today, use IP addresses and ranges as the source and destination. This was the first challenge when these devices moved to the cloud. How can you define a source or destination IP in such a rapidly changing environment, where IP addresses change all the time? An IP is assigned to a database workload, and the next minute it is assigned to a web workload. In addition, if you want to understand the cloud and see connections before network address translation (NAT), you must be inside the application. In K8s, in most cases, when a pod connects to an external resource, the traffic goes through NAT, meaning that the destination sees the worker node address as the source IP, not the pod address.
For Infrastructure-as-a-Service (IaaS) cloud deployments, most companies can solve this challenge by installing their network security solution with a proxy on a virtual machine (VM). But when it comes to Kubernetes, this just does not work. Why?
● A normal pod in K8s is just a few MBs. In other words, you cannot deploy a full-fledged network security solution in a pod. Placing it outside of K8s solves North-South hygiene (traffic in and out of the network) to some extent, but not East-West (traffic within the network and in-cluster connectivity).
● K8s is the cloud on steroids: pods scale up and down rapidly. IP assignments change, so rules cannot be bound to IP addresses and subnets.
● A full-fledged network security solution is not required. For example, there is no requirement to do deep packet inspection inside K8s.
Most companies are looking for East-West micro-segmentation, which is basically firewalling. Luckily for us, K8s comes with the NetworkPolicy object. This object treats each pod as a perimeter of its own, and you can define an ingress policy and an egress policy for it. Both policies can use IP addresses, subnets (CIDR) and labels. Unfortunately, K8s does not support FQDNs (fully qualified domain names) in the native security policy. This means that it is impossible to create a native policy that limits access to S3 or Twitter by name, for example. The policy is enforced by the network layer, that is, the network plugin (this topic is not in the scope of this blog); the most common ones are Calico, Flannel and Cilium, although not every plugin enforces NetworkPolicy on its own (Flannel, for instance, is typically paired with Calico for that). By design, the K8s network is flat: a microservice in one namespace can connect to a microservice in another namespace.
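To make the NetworkPolicy object a little more concrete, here is a minimal sketch in Python that builds one manifest and prints it as JSON (kubectl accepts JSON as well as YAML). The namespace, the labels, the ports and the CIDR are all made up for illustration; only the field names come from the networking.k8s.io/v1 API.

```python
import json


def network_policy(name, namespace, pod_labels, allow_from_labels, port, egress_cidr):
    """Build a NetworkPolicy dict for the pods selected by pod_labels.

    Ingress is allowed only from pods carrying allow_from_labels on `port`.
    Egress is allowed only to `egress_cidr`, which shows the IP dependence:
    there is no field for a domain name here.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": allow_from_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
            # A real policy would usually also need an egress rule for DNS.
            "egress": [{
                "to": [{"ipBlock": {"cidr": egress_cidr}}],
            }],
        },
    }


# Hypothetical example: only "frontend" pods may reach "cart" pods on 8080.
policy = network_policy(
    name="cart-allow-frontend",
    namespace="shop",
    pod_labels={"app": "cart"},
    allow_from_labels={"app": "frontend"},
    port=8080,
    egress_cidr="10.0.0.0/24",
)
print(json.dumps(policy, indent=2))
```

Notice that the egress side can only be expressed as labels or an IP block, which is exactly where the S3 or Twitter case falls apart.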
Struggling with building a K8s network policy that works
Logically, you would expect users to use network policies, but most are not using them. Creating a network policy is an iterative task:
- Map the communication between different elements, the resources that access the application, the resources the application connects to, the ports and the protocols.
- Create a policy.
- Run the application and watch to see if everything works.
- Find & fix the things you missed.
- Repeat every time your network or an application changes.
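The loop above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the flows are hypothetical (source app, destination app, TCP port) tuples, and the "policy" is a plain allow-list per destination rather than a real NetworkPolicy.

```python
from collections import defaultdict

# Hypothetical observed flows: (source app, destination app, TCP port).
FLOWS = [
    ("frontend", "cart", 8080),
    ("frontend", "catalog", 8080),
    ("cart", "redis", 6379),
    ("frontend", "cart", 8080),  # duplicates are normal in traffic logs
]


def ingress_rules(flows):
    """Steps 1 and 2: group the mapped flows into one allow-list per destination."""
    allowed = defaultdict(set)
    for src, dst, port in flows:
        allowed[dst].add((src, port))
    return {dst: sorted(pairs) for dst, pairs in allowed.items()}


def missing_flows(rules, new_flows):
    """Steps 3 and 4: return the flows that the current rules would drop."""
    return sorted(
        {(s, d, p) for s, d, p in new_flows if (s, p) not in rules.get(d, [])}
    )


rules = ingress_rules(FLOWS)
# Step 5: the application changed and "cart" now also calls "catalog".
dropped = missing_flows(rules, FLOWS + [("cart", "catalog", 8080)])
print(rules)
print(dropped)
```

Every entry in `dropped` is a production incident waiting to happen, which is a big part of why people give up on the exercise.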
Istio to the rescue
A service mesh is a way to control how different parts of an application share data with one another. Unlike other systems for managing this communication, a service mesh is a dedicated infrastructure layer built right into an application. This visible infrastructure layer can document how well (or not) different parts of an application interact, so it becomes easier to optimize communication and avoid downtime as an app grows. Istio is the most popular service mesh solution available today. To overcome the design issues we’ve discussed so far, Istio adds a sidecar container to identify individual workloads and moves the East-West traffic to mTLS. Now, if pod A connects to pod B, the two pods first authenticate each other's certificates before communicating. An attacker on the network has no way to intercept and decode the traffic. This is great!
But the fact is that most organizations are still not using Istio either! The latest CNCF report, from late 2020, indicates that only 30% of K8s users are using a service mesh (Istio or otherwise). This is probably because Istio is VERY complex and carries a performance and latency penalty. Not only that, but it suffers from the same identity problem described above: if a malicious actor enters pod A and creates a connection to pod B, the connection is still allowed as long as the Istio policy allows traffic from pod A to pod B.
Figure 2: Istio 1.9
In the above diagram, each pod has an Envoy, a proxy that secures the communication from the original container by using a mutual TLS tunnel. Note that the proxy (Envoy) does not care about the identity of the container. A malicious container can communicate with other services and is awarded the identity that Istio/Envoy provides.
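For readers who have not seen it, this is roughly what turning on mTLS and allowing one caller looks like in Istio. The sketch below builds two Istio resources as Python dicts and prints them as JSON; the namespace, the app label and the service account name are hypothetical, and this is an illustration of the identity model, not a recommended configuration.

```python
import json

NAMESPACE = "shop"  # hypothetical namespace

# Require mutual TLS for every workload in the namespace.
peer_authentication = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": NAMESPACE},
    "spec": {"mtls": {"mode": "STRICT"}},
}

# Allow only callers that present the "frontend" service account identity.
# The identity belongs to the service account, not to a process: anything
# running in a pod with that service account is treated as "frontend".
authorization_policy = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "AuthorizationPolicy",
    "metadata": {"name": "cart-allow-frontend", "namespace": NAMESPACE},
    "spec": {
        "selector": {"matchLabels": {"app": "cart"}},
        "action": "ALLOW",
        "rules": [{
            "from": [{"source": {
                "principals": [f"cluster.local/ns/{NAMESPACE}/sa/frontend"],
            }}],
        }],
    },
}

for manifest in (peer_authentication, authorization_policy):
    print(json.dumps(manifest, indent=2))
```

The principal in the rule is a workload-level identity, so the policy cannot tell a legitimate container from a malicious one sharing the same pod.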
K8s network security best practices
While the challenges described above are quite limiting, there is still much that can be done. A K8s network security solution should follow these guidelines:
● Enforce “Zero trust”: each microservice acts as its own perimeter, so follow the zero-trust model. Do not trust, always verify! Each request is authenticated and authorized before access is approved.
● Upgrade to mutual TLS: use mutual TLS to encrypt the communication between the different microservices. This ensures that even if an attacker is present on the host, they cannot intercept and decode the traffic.
● Provide network visibility: you cannot really protect something that you cannot see. Visibility is the key to understanding the communication patterns, not only what is working, but also what is not working, what gets dropped, and so on.
● Apply a robust policy to meet rapid changes: when it comes to the policy language, use one that handles the constant changes in microservices. In most cases the changes will be inside the cluster. This means that once you set the ingress/egress traffic from/to the cluster, most of the changes will happen in the communication between the microservices.
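To show what "mutual" means in mutual TLS, here is a minimal sketch using Python's standard ssl module. It builds a server-side context that refuses any client that does not present a certificate signed by a trusted CA. The function name and the file parameters are my own placeholders; in a real deployment the certificate paths would come from whatever issues your workload certificates.

```python
import ssl


def mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Return a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Plain TLS authenticates only the server. CERT_REQUIRED makes the
    # handshake fail unless the client also proves its identity.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA that signed the client (workload) certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This server's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


# No files are loaded here, so this only demonstrates the settings.
context = mtls_server_context()
print(context.verify_mode, context.minimum_version)
```

The hard part in K8s is not this configuration but distributing and rotating the certificates for pods that come and go all the time.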
Introducing ARMO
ARMO is a K8s security fabric that seamlessly infuses security, visibility and control into any workload, from build to runtime. ARMO developed an innovative approach to network security that eliminates the need to chase after the constant policy adjustments created by the deployment of new pods. It solves the identity problem described above by creating mutual TLS between pods and providing a network visibility graph. How do we do this?
- ArmoGuard™ is assigned to the allowed processes in a pod
- Each process gets a unique code-DNA
- Based on its code-DNA, ArmoGuard™ distributes and manages TLS certificates to each pod
How is the policy created?
ARMO provides an out-of-the-box Zero Trust policy, which we call the “baseline” policy. A Zero Trust policy is the perfect solution for configuration-free environments. ARMO enables all the non-compromised workloads to communicate freely as long as they have the same label or attribute defined in the policy. To achieve that, the baseline policy starts in a permissive mode, meaning that ArmoGuard™ accepts all clear, unidentified connections. Once ArmoGuard™ is assigned to all the workloads and the communication patterns are known, you can move to an Explicit policy and remove the permissive option, thus allowing only identified workloads to communicate. This means that if an attacker tries to run an unapproved process in a pod and communicate with other pods, the connection gets blocked. By default, all the communication channels are converted to mutual TLS without the need to define a Certificate Authority (CA) and assign keys to each pod. With ARMO’s patented technology, private keys are protected in memory. In addition, ARMO can create a communication channel between hybrid cloud environments. It enables you to place ArmoGuard™ on pods or workloads in different cloud environments and allow them to communicate using their identity over mutual TLS.
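The difference between the permissive baseline and the Explicit policy can be illustrated with a toy decision function. To be clear, this is not ARMO's implementation or API; the class, the function and the labels are invented here purely to show the logic described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    # None models a clear, unidentified connection (no verified identity).
    source_label: Optional[str]
    dest_label: str


def is_allowed(conn: Connection, permissive: bool) -> bool:
    """Toy model: identified workloads talk freely when their labels match."""
    if conn.source_label is None:
        # Unidentified traffic passes only while the policy is permissive.
        return permissive
    return conn.source_label == conn.dest_label


unidentified = Connection(source_label=None, dest_label="shop")
identified = Connection(source_label="shop", dest_label="shop")

print(is_allowed(unidentified, permissive=True))   # baseline (permissive) mode
print(is_allowed(unidentified, permissive=False))  # explicit mode
print(is_allowed(identified, permissive=False))    # identified, same label
```

Starting permissive lets you roll out without breaking anything, and flipping the flag is the moment the deployment becomes zero trust.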