The Kubernetes network (security) effect

Apr 20, 2021

Amir Kaushansky
VP Product

Around 20 years ago I had the privilege of joining a young company that invented the firewall: Check Point. I learned most of my networking knowledge and skills at Check Point and, at the time, was working at the high end of a rapidly evolving internet.

This might be the reason why I truly believe that network security must be a layer in the overall security strategy.

A few years ago, I came back to Check Point as a cloud security product manager. I witnessed the evolution of cloud and the network security challenges in this domain. In this article, I would like to share with you my journey and insights about Kubernetes (K8s) network security.

TL;DR: K8s has a built-in object (of sorts) for managing network security: NetworkPolicy. While it allows the user to define the relationship between pods with ingress and egress policies, it is still quite basic and requires very precise IP mapping in an environment that changes constantly, so most users I've talked to are not using it.

Users need a simplified solution that protects their K8s networks to the max, one that makes network security in K8s a doable task with the highest security level possible: mutual TLS between microservices and a real zero-trust deployment. And no, this is not a sidecar solution.

Still stuck with Firewall?

Back in the day, a network security policy was defined with IP addresses and subnets. You would define the source and destination, then the destination port, action, and track options. Over the years, the firewall evolved and became application-aware, with added capabilities for advanced malware prevention and more. It is no longer a firewall, but a full network security solution!

However, most network security solutions, even today, use IP addresses and ranges as the source and destination. This was the first challenge when these devices moved to the cloud: how can you define a source or destination IP in a rapidly changing environment where addresses change all the time? One minute an IP is assigned to a database workload, and the next it is assigned to the web workload. In addition, if you want to understand the cloud and see connections prior to network address translation, you must be inside the application. In K8s, in most cases, when a pod connects to an external resource it goes through Network Address Translation (NAT), meaning that the destination sees the worker node's address as the source IP, not the pod's.

For Infrastructure-as-a-Service (IaaS) cloud deployments, most companies can solve this challenge by installing their network security solution with a proxy on a virtual machine (VM).

But when it comes to Kubernetes, this approach simply does not work.

Why?

● A normal pod in K8s is just a few MBs; in other words, you cannot deploy a full-fledged network security solution in a pod. Placing it outside of K8s solves North-South hygiene (traffic in and out of the network) to some extent, but not East-West (traffic within the network and in-cluster connectivity).

● K8s is the cloud on steroids: pods scale up and down rapidly, IP assignments change constantly, and rules cannot be bound to IP addresses and subnets.

● A fully fledged network security solution is not required. For example, there is no requirement to do deep packet inspection inside K8s. Most companies are looking for East-West micro-segmentation: basically, firewalling.

Lucky for us, K8s was created with the NetworkPolicy object. This object treats each pod as a perimeter on its own, and you can define an ingress policy and an egress policy. Both can leverage IP addresses, subnets (CIDR), and labels. Unfortunately, K8s does not support FQDNs (fully qualified domain names) in the native security policy. This means it is impossible to create a policy that limits access to S3 or Twitter (for example).
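
To make this concrete, here is a minimal sketch of a NetworkPolicy; the names, labels, namespace, port, and CIDR are all illustrative, not taken from this article:

```yaml
# Pods labeled app=api accept ingress only from pods labeled app=web
# on TCP 8080, and may send egress only to the 10.0.0.0/24 subnet.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/24
```

Note that the selectors only speak in pods, namespaces, and IP blocks; there is no field that accepts an FQDN.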

Network security is enforced by the network layer (a topic beyond the scope of this blog); the most common layers are Calico, Flannel, and Cilium. By design, the K8s network is flat: a microservice in one namespace can connect to a microservice in another namespace.
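
Because the network is flat by default, a common first step (shown here as a hedged sketch; the namespace name is hypothetical) is a default-deny policy, after which traffic must be re-allowed explicitly:

```yaml
# The empty podSelector matches every pod in the namespace, so this
# denies all ingress and egress until other policies allow traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```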

Struggling to build a K8s network policy that works

Logically, you would expect users to use network policies, but most are not using them.

Creating a network policy is an iterative task:

  1. Map the communication between the different elements: the resources that access the application, the resources the application connects to, and the ports and protocols.
  2. Create a policy.
  3. Run the application and watch to see if everything works.
  4. Find & fix the things you missed.
  5. Repeat every time your network or an application changes.

The catch is that in K8s your application, which is composed of pods (microservices), can change on a daily basis! And there’s no way you can really keep the same pace as your development team – updating the network policy every single time they push changes to an application.

Imagine you map all the communication patterns, create a network policy accordingly, and everything works. A few hours later, a developer pushes a new version of a microservice that uses an API from a different pod and stops communicating with the existing pod and with an external website. Since you forgot to update the network policy, your new microservice stops working. You cannot debug what is wrong because there are no network logs in K8s. Not only that, but even if you do succeed in fixing the issue, you might still keep the old rule that allows the new pod to communicate with the pod it no longer talks to, making the network policy incorrect and missing the micro-segmentation goal.

Finally, K8s does not have a built-in capability for visualizing network traffic, so if you break a connection between two microservices, good luck debugging it!

K8s network policy is configured by allowing rather than blocking. This means that if you want to block individual objects from reaching a specific destination, you need to choose a different solution.

Lastly, the most annoying part is that K8s network policy is set up so that if pods A and B need to communicate, you must define egress traffic for pod A and ingress traffic for pod B. This is prone to errors and incredibly challenging to debug, as the sketch below illustrates.
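
Here is a hedged sketch of that duplication (the pod labels and port are illustrative). Two separate objects must agree, and a mistake in either one silently breaks the connection:

```yaml
# Egress side: pods labeled app=a may initiate traffic to app=b on 443.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: a-egress
spec:
  podSelector:
    matchLabels:
      app: a
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: b
      ports:
        - protocol: TCP
          port: 443
---
# Ingress side: pods labeled app=b accept traffic from app=a on 443.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: b-ingress
spec:
  podSelector:
    matchLabels:
      app: b
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: a
      ports:
        - protocol: TCP
          port: 443
```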

Figure 1: Network Policy

In the above example we show a native K8s policy for a pod labeled "C". The policy configures the objects so that pod C (a YAML reconstruction follows the list):

● Connects to pod "A" on ports 443 and 80

● Initializes traffic to pod "B" on ports 443 and 80

● Initializes traffic to 10.128.0.1/24
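
The following is a hedged reconstruction of that policy: the label names are assumed, all three bullets are read as egress rules, and the figure's 10.128.0.1/24 is normalized to a valid CIDR:

```yaml
# Egress rules for pod C: pods A and B on ports 443/80, plus a subnet.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pod-c-policy
spec:
  podSelector:
    matchLabels:
      app: c           # assumed label for pod C
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: a   # pod A
        - podSelector:
            matchLabels:
              app: b   # pod B
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80
    - to:
        - ipBlock:
            cidr: 10.128.0.0/24   # figure shows 10.128.0.1/24
```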

Most organizations just do North-South network security (outside of the cluster) and pray that nothing will break this security control.

By design, K8s security also suffers from the following issues:

–       The identity problem – if pod A, which has two containers, connects to pod B, pod B sees the incoming connection as coming from pod A; it does not know which container created the connection. This means there is no way to implement security guardrails with granularity finer than the pod level. As a result, if malicious software is running in my pod, it will be able to communicate with other pods.

–       Clear-text connections – whether connections in K8s are encrypted depends entirely on how the application was programmed. If the app uses protocols that aren't encrypted, an attacker can intercept and decode the communication (as of today, most in-cluster communication is not encrypted).

This figure demonstrates intention versus actual flow. The network administrator set the policy to allow web-to-database connections. The intent was to allow NGINX, running in the web pod, to communicate with the SQL server. However, this also means that malware running in one of the web pods can communicate with the SQL server.

Istio to the rescue

A service mesh is a way to control how different parts of an application share data with one another. Unlike other systems for managing this communication, a service mesh is a dedicated infrastructure layer built right into an application. This visible infrastructure layer can document how well (or not) different parts of an application interact, so it becomes easier to optimize communication and avoid downtime as an app grows.

Istio is the most popular service mesh solution available today.

To overcome the design issues we've discussed so far, Istio adds a sidecar container to identify individual workloads and moves East-West traffic to mutual TLS (mTLS). Now if pod A connects to pod B, the two pods communicate by first authenticating each other's certificates. Malicious attackers have no way to intercept and decode the traffic.
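
As a sketch of what turning this on looks like (the namespace name is hypothetical), Istio can enforce mTLS for a whole namespace with a PeerAuthentication resource:

```yaml
# Require mTLS for every workload in the namespace; plain-text
# connections are rejected.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-namespace
spec:
  mtls:
    mode: STRICT
```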

This is great! But the fact is that most organizations are still not using Istio either! In fact, the last CNCF report from late 2020 indicates that only 30% of K8s users are using a service mesh (Istio or otherwise). This is probably because Istio is VERY complex, and it carries a performance and latency penalty.

Not only that, but it suffers from the same identity problem described above: if a malicious actor enters pod A and creates a connection to pod B, the connection will still be allowed as long as the Istio policy permits it.

Figure 2: Istio 1.9

In the above diagram, each pod has an Envoy sidecar, a proxy that secures the original container's communication using a mutual TLS tunnel. As you can see, the proxy (Envoy) does not care about the identity of the container: a malicious container can communicate with other services and be awarded the identity that Istio/Envoy provides.

K8s network security best practices

While the challenges described above are quite limiting, there is still much that can be done.

A K8s network security solution should follow these guidelines:

●      Enforce zero trust – each microservice acts as its own perimeter, so it is recommended to follow the zero-trust model: never trust, always verify! Each request should be authenticated and authorized before access is approved.

●      Upgrade to mutual TLS – it is recommended to use mutual TLS to encrypt the communication between the different microservices. This assures that even if attackers are present on the host, they cannot intercept and decode the traffic.

●      Provide network visibility – you cannot protect what you cannot see. Visibility is the key to understanding communication patterns: not only what is working, but also what is failing, what gets dropped, and so on.

●      Apply a robust policy that absorbs rapid changes – when it comes to policy language, use one that handles the constant changes in microservices. In most cases the change will be inside the cluster: once you set the ingress/egress traffic from/to the cluster, most changes will happen in the communication between microservices. A label-based sketch follows this list.
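
For example, here is a hedged sketch of a label-based rule (the labels and port are illustrative). Because it matches labels rather than IPs, it keeps working as pods scale, restart, and receive new addresses:

```yaml
# Pods labeled role=backend accept TCP 8080 from any pod in a
# namespace labeled team=frontend; new replicas are covered
# automatically, with no policy update per deployment.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-to-backend
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              team: frontend
      ports:
        - protocol: TCP
          port: 8080
```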

Introducing ARMO

ARMO is a K8s security fabric that seamlessly infuses security, visibility, and control into any workload, from build to runtime.

ARMO developed an innovative approach to network security that eliminates the need to chase constant policy adjustments every time new pods are deployed. It solves the identity problem described above by creating mutual TLS between pods and providing a network visibility graph.

How do we do this?

–       ArmoGuard is assigned to the allowed processes in a pod

–       Each process gets a unique code-DNA

–       Based on its code-DNA, ArmoGuard™ distributes and manages TLS certificates to each pod

How is the policy created?

ARMO provides an out-of-the-box zero-trust policy, which we call the "baseline" policy.

A zero-trust policy is the perfect solution for configuration-free environments. ARMO enables all non-compromised workloads to communicate freely as long as they share the label or attribute defined in the policy. To achieve this, the baseline policy is set to a permissive mode, meaning that ArmoGuard™ accepts all clear, unidentified connections. Once all workloads are assigned ArmoGuard™ and the communication patterns are known, you can move to an explicit policy and remove the permissive option, allowing only identified workloads to communicate.

This means that if an attacker tries to run an unapproved process in a pod and communicate with other pods, the attempt gets blocked.

By default, all communication channels are converted to mutual TLS without needing to define a Certificate Authority (CA) and assign keys to each pod. Using ARMO's patented technology, private keys are protected in memory.

In addition, ARMO can create a communication channel between hybrid cloud environments. It enables you to place ArmoGuard on pods or workloads in different cloud environments and allow them to communicate based on identity and over mutual TLS.