Kubernetes posture management – no time to waste

Dec 27, 2021

Amir Kaushansky
VP Product

Five months ago, we decided to release a posture management solution for K8s and make it open source for everyone to enjoy. Today, I can proudly say that it has been more successful than I ever expected when we launched it: we have hundreds of registered users, thousands of runs each day, and almost 5K GitHub stars, and the numbers keep growing every day.

Kubescape started as a tool that scans K8s clusters, YAML files, and Helm charts for misconfigurations, and it now offers more capabilities: it scans worker nodes and control plane nodes (API server, managed and unmanaged), and it offers image scanning and RBAC visualization.

The feedback I get from users is AMAZING, and we keep developing and improving Kubescape based on end users’ requirements!

Still, there is one piece of feedback that made me write this blog post: many users say they don’t have time to work through Kubescape’s output. In most organizations, DevOps teams have such a big backlog that when they run Kubescape to find security issues and misconfigurations, they are overwhelmed by the number of issues they were not aware of. That is why I want to share what we do today, and what we will do in the future, to give your DevOps team another member: Kubescape.

The typical Kubescape flow

Users typically work with Kubescape as follows:

1. Scan the K8s cluster or YAML files
2. Review the results (set exceptions, fix issues) to set a baseline
3. Rescan
4. Look at drifts
5. Go back to step 3 every day, every week, or whenever something changes
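The rescan-and-compare part of this flow boils down to comparing two sets of failures. The sketch below assumes each scan has already been reduced to a set of (control, resource) pairs; the `Finding` type, the "namespace/kind/name" resource format, and `detect_drift` are hypothetical illustrations, not Kubescape’s output format or API.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """One failed check (hypothetical shape): a control and the resource that failed it."""
    control: str
    resource: str  # assumed format: "namespace/kind/name"


def detect_drift(baseline: set[Finding], current: set[Finding]) -> dict[str, set[Finding]]:
    """Compare a rescan against the accepted baseline."""
    return {
        # Failures that did not exist when the baseline was set.
        "new": current - baseline,
        # Baseline failures that no longer appear (fixed or removed).
        "resolved": baseline - current,
    }


baseline = {
    Finding("Privileged container", "kube-system/DaemonSet/kube-proxy"),
    Finding("HostPath mount", "ingress-nginx/Deployment/ingress-nginx-controller"),
}
current = {
    Finding("Privileged container", "kube-system/DaemonSet/kube-proxy"),
    Finding("Allow privilege escalation", "default/Deployment/web"),
}
drift = detect_drift(baseline, current)
print(sorted(f.resource for f in drift["new"]))       # ['default/Deployment/web']
print(sorted(f.resource for f in drift["resolved"]))  # ['ingress-nginx/Deployment/ingress-nginx-controller']
```

Once the baseline is accepted, only the "new" set needs attention on each rescan, which is what keeps the daily or weekly review short.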

The real challenge is step 2 above: how to set the baseline and make sure these issues do not recur. In the next section, I share the capabilities we developed to help you with this task.

Failed resources

When you scan the cluster for risk (the chosen framework does not matter), you get a detailed report with the control name, the risk number, the control status (failed or passed), remediation instructions, and the number of failed resources. The failed resource count also shows whether the number of failed resources increased, decreased, or stayed the same compared with your previous scan. This makes it easy to see at a glance what changed since the last scan.

Figure 1: Kubescape’s Kubernetes risk scan report
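The per-control comparison with the previous scan can be sketched as follows. This assumes each scan is summarized as a mapping from control name to failed resource count; `failed_count_trend` and that input shape are hypothetical and not part of Kubescape.

```python
def failed_count_trend(previous: dict[str, int], current: dict[str, int]) -> dict[str, str]:
    """Label each control's failed-resource count relative to the previous scan."""
    # Controls in the current scan first, then controls that only appeared previously.
    controls = list(current) + [c for c in previous if c not in current]
    trend = {}
    for control in controls:
        before = previous.get(control, 0)  # control absent from the last scan counts as 0
        after = current.get(control, 0)    # control absent from this scan counts as 0
        if after > before:
            trend[control] = f"increased ({before} -> {after})"
        elif after < before:
            trend[control] = f"decreased ({before} -> {after})"
        else:
            trend[control] = f"unchanged ({after})"
    return trend


previous = {"Privileged container": 3, "HostPath mount": 2}
current = {"Privileged container": 2, "HostPath mount": 2, "Allow privilege escalation": 1}
for control, label in failed_count_trend(previous, current).items():
    print(f"{control}: {label}")
# Privileged container: decreased (3 -> 2)
# HostPath mount: unchanged (2)
# Allow privilege escalation: increased (0 -> 1)
```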

When you click the failed resources number, a pop-up appears with the detailed list of the resources that failed.

In this view we provide you with the following capabilities:

Failed resources and previously failed resources – lets you easily see which of the failed resources were newly deployed or drifted since the last scan.
Exceptions – you can exclude a resource, or even an entire namespace (with all the resources in it), from a control. Kubescape will NOT count these resources as failed in the next scan; it will still list them among the failed resources, but marked as excluded. For example, if kube-proxy should be privileged by design and you mark it as excluded, the failed resource count will decrease by 1 in the next scan.
Once you fix the misconfigurations and set exceptions, Kubescape marks the control as excluded if all the remaining failed resources are in the exception list, or as passed if you fixed all the issues and no exceptions apply.
Recommendations – Kubescape uses the power of crowdsourcing to recommend which resources should be excluded. It looks at the results of thousands of scans and estimates how likely a given resource is to fail a given control. If the probability is high, it recommends adding the resource as an exception. For example, if you have an NGINX ingress controller that fails the “host path mount” control, Kubescape has seen that this happens for 90% of users and will recommend marking it as an exception.
Assisted remediation – click the icon next to the resource name to open a new tab with the resource definition and the reason Kubescape failed this resource.

Figure 2: Kubescape’s Kubernetes failed resource view

The new tab shows the failed resource, the control it failed, and the line number in the resource definition that caused the failure.

You can copy the object or download it as a file, fix it, and apply it to your cluster.

Figure 3: Kubescape’s Kubernetes assisted remediation

Control view and resource view –
- In the control view, Kubescape focuses on the controls that ran as part of the framework and shows, for each control, the resources that failed it.
- In the resource view, Kubescape shows the resources in your cluster. You can look at each resource and see the list of controls it failed.

Figure 4: Kubescape’s different report view options

Figure 5: Kubescape’s Kubernetes resource view scan results
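The exception rules and the two report views described above can be modeled in a few lines. This is an illustrative sketch only: the `ControlResult` type, matching exceptions by resource or namespace name, the "namespace/kind/name" resource format, and the status strings are assumptions based on the behavior described in this post, not Kubescape’s implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ControlResult:
    control: str
    failed: list[str]  # failed resources, assumed format "namespace/kind/name"
    excepted: set[str] = field(default_factory=set)  # excepted resources or whole namespaces

    def is_excepted(self, resource: str) -> bool:
        namespace = resource.split("/", 1)[0]
        # An exception can target a single resource or an entire namespace.
        return resource in self.excepted or namespace in self.excepted

    def failed_count(self) -> int:
        # Excepted resources stay in the list but are not counted as failed.
        return sum(1 for r in self.failed if not self.is_excepted(r))

    def status(self) -> str:
        if self.failed_count() > 0:
            return "failed"
        # All remaining failures are excepted -> "excluded"; nothing failed -> "passed".
        return "excluded" if self.failed else "passed"


def resource_view(results: list[ControlResult]) -> dict[str, list[str]]:
    """Pivot the control view into a resource view: resource -> controls it failed."""
    view: dict[str, list[str]] = {}
    for result in results:
        for resource in result.failed:
            if not result.is_excepted(resource):  # this sketch hides excepted failures
                view.setdefault(resource, []).append(result.control)
    return view


privileged = ControlResult("Privileged container", ["kube-system/DaemonSet/kube-proxy"])
print(privileged.failed_count(), privileged.status())  # 1 failed
privileged.excepted.add("kube-system/DaemonSet/kube-proxy")  # kube-proxy is privileged by design
print(privileged.failed_count(), privileged.status())  # 0 excluded

hostpath = ControlResult(
    "HostPath mount",
    ["ingress-nginx/Deployment/ingress-nginx-controller", "default/Deployment/web"],
    excepted={"ingress-nginx"},  # whole-namespace exception
)
print(resource_view([privileged, hostpath]))  # {'default/Deployment/web': ['HostPath mount']}
```

Keeping exceptions on the result rather than deleting failures mirrors the behavior above: excepted resources remain visible in the list while no longer counting toward the failed total.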

Using these capabilities, you should be able to set your cluster’s baseline easily and, from that point on, focus only on changes.

You can monitor these changes through the risk score (the higher the score, the riskier the cluster) and the risk graph.

Figure 6: Kubescape’s Kubernetes risk graph and score

Still, we knew we could help our users a lot more.

So what’s coming next?

Kubescape is going to support additional capabilities to help you accelerate your work.

Policy
Contextual priority

In upcoming versions, Kubescape will let you define, per control, whether it should be monitored, enforced, or remediated. You will be able to apply this setting per cluster and/or per namespace.

Monitor – works like the risk scanning you have today: it issues a report with the failed resources and supports the flows described above.
Enforce – turns the control into a guardrail: no one can deploy a resource that fails it unless you add the resource to the exception list.
Remediate – automatically fixes the failed resource for you, so it is deployed with the right settings.
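Because this feature is still planned, the following is only a sketch of how per-control modes with a cluster default and namespace overrides might be resolved for a resource that is about to be deployed. The `Mode` enum, the `policy` dictionary shape, the lookup order, and `decide` are all hypothetical and do not describe a Kubescape API.

```python
from enum import Enum


class Mode(Enum):
    MONITOR = "monitor"
    ENFORCE = "enforce"
    REMEDIATE = "remediate"


def resolve_mode(policy: dict, control: str, namespace: str) -> Mode:
    """Namespace-level settings override cluster-level ones; monitor is the fallback."""
    ns_modes = policy.get("namespaces", {}).get(namespace, {})
    if control in ns_modes:
        return ns_modes[control]
    return policy.get("cluster", {}).get(control, Mode.MONITOR)


def decide(policy: dict, control: str, namespace: str, resource: str,
           failed: bool, exceptions: set[str]) -> str:
    """Return a (hypothetical) action for a resource evaluated against one control."""
    if not failed or resource in exceptions:
        return "admit"
    mode = resolve_mode(policy, control, namespace)
    if mode is Mode.ENFORCE:
        return "reject"            # guardrail: block the deployment
    if mode is Mode.REMEDIATE:
        return "patch"             # fix the settings, then deploy
    return "admit-and-report"      # monitor: deploy, but report the failure


policy = {
    "cluster": {"Privileged container": Mode.ENFORCE},
    "namespaces": {"kube-system": {"Privileged container": Mode.MONITOR}},
}
print(decide(policy, "Privileged container", "default", "default/Deployment/web", True, set()))
# reject
print(decide(policy, "Privileged container", "kube-system",
             "kube-system/DaemonSet/kube-proxy", True, set()))
# admit-and-report
```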

The last feature we are going to develop is Contextual Priority. We know that not all alerts are created equal. An issue may have critical severity, yet exploiting it may require network connectivity; if your application is not connected to any network resource, you can postpone or deprioritize it. Understanding this context helps you focus your efforts where your organization benefits the most.
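One way to picture contextual priority is to lower a finding’s priority when a condition needed to exploit it is absent. The sketch below assumes hypothetical severity levels, a hypothetical `requires_network` flag on each finding, and a per-resource flag saying whether the workload is network-exposed; the real feature is not released yet and may work differently.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    control: str            # hypothetical control names in the example below
    resource: str
    severity: str
    requires_network: bool  # assumed: exploitation needs network access


def contextual_priority(finding: Finding, network_exposed: dict[str, bool]) -> int:
    rank = SEVERITY_RANK[finding.severity]
    # Unknown exposure is treated as exposed, so findings are never deprioritized by accident.
    if finding.requires_network and not network_exposed.get(finding.resource, True):
        rank = max(rank - 2, 1)  # exploit path is unavailable: postpone
    return rank


findings = [
    Finding("control-a", "default/Deployment/batch-job", "critical", True),
    Finding("control-b", "default/Deployment/web", "high", False),
]
exposure = {"default/Deployment/batch-job": False, "default/Deployment/web": True}
ranked = sorted(findings, key=lambda f: contextual_priority(f, exposure), reverse=True)
print([f.resource for f in ranked])
# ['default/Deployment/web', 'default/Deployment/batch-job']
```

Here the critical finding drops below the high one because its workload has no network exposure, which is exactly the kind of reordering described above.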

To Summarize

We understand that running a posture management product can be scary: you can barely keep up with the tasks you have today and don’t want to be buried under new ones. Not knowing may feel better, but it comes with a risk (one that Kubescape scores).

We designed Kubescape to help you by:

1) Setting a healthy baseline: