Vulnerability in cgroups v1 allows escape from an isolated container

Details have been disclosed of a Linux kernel vulnerability (CVE-2022-0492) that can be used to escape from isolated containers. The flaw lies in the implementation of the cgroups v1 resource-limiting mechanism. It has been present since Linux kernel 2.6.24 and was fixed in kernel releases 5.16.12, 5.15.26, 5.10.97, 5.4.177, 4.19.229, 4.14.266, and 4.9.301. Package updates for distributions can be tracked on these pages: Debian, SUSE, Ubuntu, RHEL, Fedora, Gentoo, Arch Linux.
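As a rough first check, the fixed releases listed above can be compared with the running kernel's version string. The Python sketch below is only a heuristic under stated assumptions: distribution kernels often backport fixes without changing the upstream version number (Ubuntu's 5.15.0-NN builds, for example), so the distribution advisories remain the authoritative source. The names `FIXED_IN`, `parse_release` and `fix_status` are hypothetical, and treating every branch from 5.17 onward as fixed is an assumption.

```python
import platform
import re

# First fixed release in each stable branch, taken from the list in the article.
FIXED_IN = {
    (5, 16): 12, (5, 15): 26, (5, 10): 97, (5, 4): 177,
    (4, 19): 229, (4, 14): 266, (4, 9): 301,
}


def parse_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '5.10.97-amd64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def fix_status(release: str) -> str:
    """Classify an upstream version string; ignores distribution backports."""
    major, minor, patch = parse_release(release)
    if (major, minor) >= (5, 17):
        return "patched"       # assumption: later branches contain the fix
    if (major, minor, patch) < (2, 6, 24):
        return "not affected"  # the flaw was introduced in 2.6.24
    fixed = FIXED_IN.get((major, minor))
    if fixed is None:
        return "unknown"       # no fixed release listed for this branch
    return "patched" if patch >= fixed else "vulnerable"


if __name__ == "__main__":
    print(platform.release(), "->", fix_status(platform.release()))
```

A result of "vulnerable" or "unknown" is a prompt to consult the distribution's advisory rather than a verdict.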

The vulnerability is caused by a logic error in the handler for the release_agent file: proper permission checks were not performed, even though the program configured through this file is run with full privileges. The release_agent file defines the program that the kernel executes when the last process in a cgroup terminates (for cgroups with notify_on_release enabled). This program is run as root with all capabilities in the root namespace. It was assumed that only the administrator could change the release_agent setting, but in reality the checks only required the root user, which did not prevent the setting from being changed from inside a container or by a root user lacking administrator privileges (CAP_SYS_ADMIN).

Previously, this behavior would not have been regarded as a vulnerability, but the situation changed with the arrival of user namespaces, which make it possible to create separate root users in containers that do not overlap with the root user of the host environment. Accordingly, an attacker in a container that has its own root user in a separate user ID namespace only needs to install their own release_agent handler, which will then be executed with the full privileges of the host environment once the process terminates.

By default, cgroupfs is mounted read-only inside a container, but this pseudo-filesystem can easily be remounted in write mode by a process with CAP_SYS_ADMIN, or by using the unshare system call to create a nested container with a separate user namespace, in which CAP_SYS_ADMIN is available to the new container.
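The cgroup v1 hierarchies visible to a process, and whether each is mounted read-only, can be inspected in /proc/self/mountinfo, where cgroup v1 mounts have the filesystem type `cgroup` (cgroup v2 uses `cgroup2`). The Python sketch below uses a hypothetical helper, `cgroup_v1_mounts`, to list them. As described above, a read-only flag is no guarantee when CAP_SYS_ADMIN is available, since the mount can simply be remounted.

```python
from pathlib import Path


def cgroup_v1_mounts(mountinfo: str) -> list[tuple[str, bool]]:
    """Return (mount point, is_writable) for each cgroup v1 mount.

    mountinfo line layout: ID, parent ID, major:minor, root, mount point,
    per-mount options, optional fields..., "-", fs type, source, super options.
    """
    result = []
    for line in mountinfo.splitlines():
        pre, sep, post = line.partition(" - ")
        if not sep:
            continue  # blank or malformed line
        fields = pre.split()
        if post.split()[0] != "cgroup":
            continue  # skip cgroup2 and all other filesystems
        mount_point = fields[4]
        writable = "rw" in fields[5].split(",")  # per-mount ro/rw flag
        result.append((mount_point, writable))
    return result


if __name__ == "__main__":
    try:
        text = Path("/proc/self/mountinfo").read_text()
    except OSError:
        text = ""  # not running on Linux, or /proc unavailable
    for point, writable in cgroup_v1_mounts(text):
        print(point, "rw" if writable else "ro")
```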


The attack can be carried out with root privileges in an isolated container, or when the container is run without the no_new_privs flag, which prohibits obtaining additional privileges. The system must have user namespace support enabled (enabled by default on Ubuntu and Fedora, but not on Debian and RHEL), and the container must have access to the root cgroup of a v1 hierarchy (Docker, for example, runs containers in the root RDMA cgroup). The attack is also possible with CAP_SYS_ADMIN privileges, in which case neither user namespace support nor access to the cgroup v1 root hierarchy is required.
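Whether unprivileged user namespaces are available can usually be read from sysctl values. The Python sketch below combines two of them: /proc/sys/user/max_user_namespaces (a limit of 0 disables creation of new user namespaces) and /proc/sys/kernel/unprivileged_userns_clone, a switch that is assumed to exist only on Debian-derived kernels carrying an out-of-tree patch. The helper names are hypothetical, and other restrictions (for example LSM policies) are not taken into account.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(path: str) -> Optional[str]:
    """Return the stripped file contents, or None if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def userns_likely_allowed(max_userns: Optional[str],
                          unpriv_clone: Optional[str]) -> Optional[bool]:
    """Heuristic: True/False when the sysctls decide it, None when unknown."""
    if max_userns is None:
        return None   # sysctl missing: cannot tell from here
    if int(max_userns) == 0:
        return False  # user namespace creation disabled entirely
    if unpriv_clone is not None and int(unpriv_clone) == 0:
        return False  # distribution switch restricts it to privileged users
    return True


if __name__ == "__main__":
    print(userns_likely_allowed(
        read_sysctl("/proc/sys/user/max_user_namespaces"),
        read_sysctl("/proc/sys/kernel/unprivileged_userns_clone"),
    ))
```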

Besides escaping from an isolated container, the vulnerability also allows processes started by the root user without capabilities, or by any user with CAP_DAC_OVERRIDE (the attack requires write access to the root-owned /sys/fs/cgroup/*/release_agent file), to obtain the full set of system capabilities.

The vulnerability reportedly cannot be exploited when Seccomp, AppArmor, or SELinux is used for additional container isolation, since Seccomp blocks the unshare() system call, and AppArmor and SELinux do not allow cgroupfs to be mounted in write mode.
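Several of the conditions described above (the no_new_privs flag, Seccomp filtering, CAP_SYS_ADMIN and CAP_DAC_OVERRIDE) are visible to a process in /proc/self/status. The Python sketch below decodes those fields: CapEff is a hexadecimal bitmask in which CAP_DAC_OVERRIDE is bit 1 and CAP_SYS_ADMIN is bit 21, and Seccomp is 0, 1 or 2 for disabled, strict or filter mode. The helper names are hypothetical, and AppArmor or SELinux confinement is not reflected in these fields.

```python
from pathlib import Path

CAP_DAC_OVERRIDE = 1   # capability bit numbers from linux/capability.h
CAP_SYS_ADMIN = 21
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text: str) -> dict[str, str]:
    """Split 'Key:\tvalue' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hardening_report(status_text: str) -> dict[str, object]:
    """Summarise the isolation-related fields; missing fields count as off."""
    fields = parse_status(status_text)
    cap_eff = int(fields.get("CapEff", "0"), 16)
    seccomp = int(fields.get("Seccomp", "0"))
    return {
        "no_new_privs": fields.get("NoNewPrivs") == "1",
        "seccomp": SECCOMP_MODES.get(seccomp, "unknown"),
        "cap_sys_admin": bool((cap_eff >> CAP_SYS_ADMIN) & 1),
        "cap_dac_override": bool((cap_eff >> CAP_DAC_OVERRIDE) & 1),
    }


if __name__ == "__main__":
    try:
        print(hardening_report(Path("/proc/self/status").read_text()))
    except OSError:
        print("/proc/self/status is not available")
```

For example, a process with `CapEff` equal to `00000000a80425fb` (a typical default capability set for a Docker container) has CAP_DAC_OVERRIDE but not CAP_SYS_ADMIN.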

Source: opennet.ru
