What is the difference between a root process and a containerized root process?

May 29, 2024

Ben Hirschberg
CTO & Co-founder

To answer this question, let’s first look at some history.

Processes are software instances running in their own memory spaces. They enable a user to execute multiple software instances in parallel on the same computer. The concepts are derived from operating systems of the 1960s, with UNIX first being released in 1971.

In today’s operating systems, every process is associated with an identity to which authorizations are bound. This enables the definition of access controls around processes.

Linux, which runs the majority of containerized workloads, associates a user ID (UID) with every process. The kernel authorizes a process to access files and resources based on this ID. UID 0 (“root”) has full administrative privileges and can perform any system operation, while other users have varying levels of privileges depending on their assigned ID.
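To make this concrete, here is a minimal Python sketch showing how a process can inspect its own identity. The helper names `describe_identity` and `is_root` are made up for illustration; the `os` calls are standard library functions available on Unix-like systems.

```python
import os


def describe_identity() -> dict:
    """Return the real and effective user/group IDs of the current process."""
    return {
        "uid": os.getuid(),    # real user ID
        "euid": os.geteuid(),  # effective user ID
        "gid": os.getgid(),    # real group ID
        "egid": os.getegid(),  # effective group ID
    }


def is_root(identity: dict) -> bool:
    # Most permission checks are based on the effective UID; 0 means root.
    return identity["euid"] == 0


if __name__ == "__main__":
    ident = describe_identity()
    print(ident, "-> root" if is_root(ident) else "-> non-root")
```

Run inside a container started with default settings, this would typically report an effective UID of 0, exactly as it would for a root shell on the host.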

Docker’s initial architecture had the “runc” process (which is the origin of all containerized processes) run as root, and so all containers were created with root access by default. This default has been inherited by other container runtimes (including containerd and CRI-O). The idea behind this is that “runc” can switch to an unprivileged user if it wants to.
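The general pattern of a root process switching to an unprivileged user can be sketched as follows. This is not runc's code (runc is written in Go); it is only an illustration of the underlying system calls, and the function name `drop_privileges` and the UID/GID 65534 in the usage note are assumptions.

```python
import os


def drop_privileges(uid: int, gid: int) -> None:
    """Switch the current process from root to an unprivileged user."""
    if os.geteuid() != 0:
        raise PermissionError("only a root process can switch to another user")
    os.setgroups([])  # clear supplementary groups inherited from root
    os.setgid(gid)    # change the group first, while we are still root
    os.setuid(uid)    # for a root process this also sets the saved UID,
                      # so the process cannot switch back afterwards
```

A launcher would call something like `drop_privileges(65534, 65534)` (the "nobody" user on many distributions) right before executing the application. The order matters: once the UID is changed, the process no longer has permission to change its groups.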

Even if someone overrides this setting, there is a good chance that the container won’t work since most container images assume that the containerized processes have root privileges due to file system privilege settings. The container image filesystem is built for root because “docker build” was designed to run as root, so the user has to work hard to build a container image with a filesystem that can be accessed by an unprivileged process.

Returning to our question: what is the difference between a process running as root (UID 0) and a containerized process running as root?

The answer is not simple. Processes and UIDs are constructs of the kernel, so they have a very clear definition. Containers, by contrast, are solely a user-space convenience, not a kernel construct. Containers, as defined by OCI, are the combination of an image format and a runtime that makes use of kernel features. The two most famous features are cgroups, which are used for resource accounting and limiting, and Linux namespaces, which provide most of the isolation features.

A process, even if it is a root process, can be isolated from other resources with Linux namespaces when it runs in a container. For example:

Process (PID) namespace isolation: a containerized root process will not see and will not be able to address other processes on the same machine
Network namespace isolation: a containerized root process will be assigned its own network stack regardless of the actual network stack of the machine
Filesystem or mount namespace isolation: a containerized root process will see a subset of the host filesystem (unless configured otherwise). It is a well known fact that in Linux/Unix everything is a “file”, meaning that user-space accesses devices through the file system. This means that filesystem isolation enables device isolation as well.
IPC namespace isolation: a containerized root process will have its own set of inter process communication facilities which are isolated from the other processes outside the namespace
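One way to observe these namespaces is through the `/proc/<pid>/ns/` links that Linux exposes for every process. The sketch below reads them for the four namespace types listed above; the helper names `namespace_ids` and `shared_namespaces` are hypothetical, and on a non-Linux system the lookups simply return `None`.

```python
import os

# The namespace types discussed above, as named under /proc/<pid>/ns/.
NAMESPACES = ("pid", "net", "mnt", "ipc")


def namespace_ids(pid: str = "self") -> dict:
    """Map each namespace name to an identifier such as 'pid:[4026531836]'."""
    ids = {}
    for ns in NAMESPACES:
        try:
            ids[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except OSError:
            ids[ns] = None  # not Linux, or not allowed to inspect this process
    return ids


def shared_namespaces(a: dict, b: dict) -> list:
    """Return the names of the namespaces that two processes have in common."""
    return [ns for ns in NAMESPACES
            if a.get(ns) is not None and a.get(ns) == b.get(ns)]


if __name__ == "__main__":
    print(namespace_ids())
```

Two processes are in the same namespace exactly when these identifiers match. Comparing the output from a shell on the host with the output from inside a default container would normally show different identifiers for all four.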

This all means that a containerized root process can see only a limited part of the host machine. How much depends on the actual container configuration, but with the default configuration it is almost like running in its own virtual machine: a containerized root process is, in the usual case, very limited.

This is not the end, however.

Linux offers a process-level setting called “capabilities”. Capabilities are flags that enable a process to have access to different kernel resources and features. This enables more fine-grained access control than only using the UID. By default, container runtimes drop most of these capabilities, keeping only a small default set that excludes the most powerful administrative ones (such as CAP_SYS_ADMIN), even when the container processes run as root (UID 0). This is yet another way the root user is limited when running in a container.
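The capabilities of a process can be read from the `CapEff` line of `/proc/<pid>/status`, which holds a hexadecimal bitmask. The following sketch decodes such a mask into capability names. The function names are made up for illustration, and the mask `0xa80425fb` in the usage note is the value commonly seen for a root process in a default Docker container; treat it as an example, since the exact set depends on the runtime and its version.

```python
# Capability names indexed by bit position, following <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def decode_caps(mask: int) -> list:
    """Return the names of all capabilities whose bit is set in the mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    with open("/proc/self/status") as f:  # Linux only
        print(decode_caps(parse_cap_eff(f.read())))
```

Calling `decode_caps(0xa80425fb)` yields 14 names, including `CAP_CHOWN`, `CAP_SETUID` and `CAP_NET_BIND_SERVICE`, but not `CAP_SYS_ADMIN`, `CAP_SYS_MODULE` or `CAP_NET_ADMIN`. A root process on the host, by comparison, normally has every bit set.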

You may have heard that it is a bad security practice to run containers with root processes. If containerized root is indeed as depleted as explained above, why do we care?

There are multiple reasons.

Except for a very few containerized applications that are essentially made to access the container host (host managers, anti-virus, etc.), applications usually have no reason to access root-owned resources on the host system. If there is some level of overlap between the host and container resources, be it file system, IPC, network, or processes, it is not a good idea to let containerized applications access root resources.

In the case of classical containers, which are completely separated from the host, running a container as root is still a bad practice. The Linux kernel has had, and is assumed to still have, vulnerabilities that enable an attacker who runs code in a containerized application to escape the confinement that the kernel imposes on the container. Based on previous examples, many of these vulnerabilities could be exploited by a containerized process running as root, but not by one running as non-root!

Running containerized processes as root therefore raises the security risk.