Linux kernel root vulnerability and systemd denial of service

Security researchers at Qualys have revealed the details of two vulnerabilities affecting the Linux kernel and the systemd system manager. The kernel vulnerability (CVE-2021-33909) allows a local user to execute code as root by manipulating deeply nested directories.

The vulnerability is all the more dangerous because the researchers were able to prepare working exploits for Ubuntu 20.04/20.10/21.04, Debian 11 and Fedora 34 in their default configurations. Other distributions have not been tested but are, in theory, also affected and open to attack. The researchers promise to publish the full exploit code once the problem has been widely fixed; for now only a limited prototype that crashes the system is available. The problem has been present since July 2014 and affects kernel releases starting with 3.16. The fix was coordinated with the community and accepted into the kernel on July 19. Major distributions (Debian, Ubuntu, Fedora, RHEL, SUSE, Arch) have already released updated kernel packages.

The vulnerability is caused by a missing check on the result of a size_t to int conversion before operations in the seq_file code, which builds files from a sequence of records. Without that check, creating, mounting and deleting a very deeply nested directory structure (path size greater than 1 GB) can lead to an out-of-bounds write. As a result, an attacker can get the 10-byte string "//deleted" written at an offset of "-2 GB - 10 bytes", which points to memory located before the start of the allocated buffer.
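The arithmetic behind that negative offset can be sketched in a few lines of Python. This is only an illustration, not the kernel code: it assumes that the size being converted is exactly 2 GiB (an assumption suggested by the "-2 GB" in the offset quoted above) and that the 10 bytes are "//deleted" plus a terminating NUL. The helper name as_c_int is made up for the example.

```python
import ctypes

GIB = 1 << 30


def as_c_int(value: int) -> int:
    """Mimic storing an unsigned size in a signed 32-bit C int (hypothetical helper)."""
    # ctypes performs no overflow checking, so the value wraps as in a C conversion.
    return ctypes.c_int32(value).value


# Assumed size for illustration: 2 GiB fits in size_t but not in a 32-bit int.
buffer_size = 2 * GIB
length_as_int = as_c_int(buffer_size)

# "//deleted" counted with a terminating NUL byte gives the 10 bytes from the text.
suffix = b"//deleted\0"

# Code that places the suffix at the "end" of the buffer, computed from the
# wrapped length, lands far before the start of the buffer instead.
write_offset = length_as_int - len(suffix)

print("size as int:", length_as_int)
print("write offset relative to buffer start:", write_offset)
```

Under these assumptions the wrapped length equals minus 2 GiB, so subtracting the 10-byte string gives the "-2 GB - 10 bytes" offset described above.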

The prepared exploit requires 5 GB of memory and 1 million free inodes. It works by calling mkdir() to create a hierarchy of about a million nested directories so that the file path size exceeds 1 GB. This directory is bind-mounted in a separate user namespace, after which rmdir() is called to remove it. In parallel, a thread is started that loads a small eBPF program; the load is blocked at the stage after the eBPF bytecode has been verified but before it is JIT-compiled.

In the unprivileged user namespace, the file /proc/self/mountinfo is opened and the long path of the bind-mounted directory is read, which causes the string "//deleted" to be written to the area before the start of the buffer. The write position is chosen so that it overwrites an instruction in the already verified but not yet compiled eBPF program.

Next, at the level of the eBPF program, the uncontrolled out-of-bounds write is turned into a controlled ability to read and write other kernel structures by manipulating the btf and map_push_elem structures. The exploit then locates the modprobe_path[] buffer in kernel memory and overwrites the "/sbin/modprobe" path in it, which makes it possible to launch any executable with root privileges when request_module() is called, for example when a netlink socket is created.
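For orientation, the value held in modprobe_path[] is visible from user space through /proc/sys/kernel/modprobe. The following minimal sketch reads it and compares it with the path named above. The function names are hypothetical, and the expected value "/sbin/modprobe" is taken from the text; a system may legitimately be configured with a different path, so a mismatch is only a hint, not proof of an attack.

```python
from pathlib import Path


def read_modprobe_path(proc_file="/proc/sys/kernel/modprobe"):
    """Return the kernel's current module loader path, or None if unreadable."""
    try:
        return Path(proc_file).read_text().strip()
    except OSError:
        # The file may be missing (non-Linux system, restricted /proc).
        return None


def differs_from_expected(current, expected="/sbin/modprobe"):
    """True if a path was read and it is not the expected one (assumed default)."""
    return current is not None and current != expected


if __name__ == "__main__":
    current = read_modprobe_path()
    if current is None:
        print("modprobe path could not be read")
    elif differs_from_expected(current):
        print(f"modprobe path is {current!r}, not the expected default")
    else:
        print(f"modprobe path is {current!r}")
```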

The researchers describe several workarounds that are effective only against this specific exploit and do not eliminate the underlying problem. They recommend setting "/proc/sys/kernel/unprivileged_userns_clone" to 0 to prevent unprivileged users from mounting directories in a separate user namespace, and "/proc/sys/kernel/unprivileged_bpf_disabled" to 1 to prevent unprivileged users from loading eBPF programs into the kernel.
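A small script can report whether these two settings currently hold the recommended values. This is a sketch under assumptions: the function name and the sysctl_root parameter are invented for the example, and a missing file is simply reported as such, since not every kernel provides both settings.

```python
from pathlib import Path

# Settings named in the text, each with the value recommended for it.
RECOMMENDED = {
    "kernel/unprivileged_userns_clone": "0",
    "kernel/unprivileged_bpf_disabled": "1",
}


def check_workarounds(sysctl_root="/proc/sys"):
    """Return {setting: (current_value_or_None, matches_recommendation)}."""
    report = {}
    for name, wanted in RECOMMENDED.items():
        try:
            current = (Path(sysctl_root) / name).read_text().strip()
        except OSError:
            # The file is absent on kernels that do not provide this setting.
            current = None
        report[name] = (current, current == wanted)
    return report


if __name__ == "__main__":
    for name, (current, ok) in check_workarounds().items():
        shown = "missing" if current is None else current
        verdict = "as recommended" if ok else "not as recommended"
        print(f"{name}: {shown} ({verdict})")
```

The script only reads the values. Changing them requires root privileges and, as noted above, blocks this particular exploit without fixing the vulnerability itself.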

Notably, while analyzing an alternative attack that uses FUSE instead of a bind mount to mount a large directory, the researchers came across another vulnerability (CVE-2021-33910), this one affecting the systemd system manager. It turned out that when a directory with a path size exceeding 8 MB is mounted via FUSE, the init process (PID 1) runs out of stack memory and crashes, which puts the system into a "panic" state.

The problem is that systemd monitors and parses the contents of /proc/self/mountinfo and processes each mount point in the unit_name_path_escape() function, which calls strdupa(), an operation that places the data on the stack rather than in dynamically allocated memory. Since the maximum stack size is limited by RLIMIT_STACK, processing an overly long mount point path causes the PID 1 process to crash and the system to halt. The attack can be carried out with the simplest FUSE module combined with a deeply nested directory used as the mount point, whose path size exceeds 8 MB.
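The relationship between path length and stack limit can be illustrated with a short Python sketch. It is deliberately simplified: the function name is hypothetical, the check ignores whatever else is already on the stack, and the limit it reads belongs to the current process, which is not necessarily the limit that applies to PID 1.

```python
import resource

MIB = 1024 * 1024


def exceeds_stack_limit(path_len, stack_limit):
    """True if a stack copy of path_len bytes cannot fit in stack_limit bytes."""
    if stack_limit == resource.RLIM_INFINITY:
        # No limit configured, so this simple check never triggers.
        return False
    # Simplification: treat the whole limit as available for the copy.
    return path_len >= stack_limit


if __name__ == "__main__":
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_STACK)
    path_len = 8 * MIB + 1  # a path just over the 8 MB mentioned in the text
    print("soft RLIMIT_STACK of this process:", soft_limit)
    print("would a stack copy overflow it?", exceeds_stack_limit(path_len, soft_limit))
```

A heap allocation would not be bound by RLIMIT_STACK in this way, which is why copying the path onto the stack is the weak point described above.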

The problem has been present since systemd 220 (April 2015); it has already been fixed in the main systemd repository and in distributions (Debian, Ubuntu, Fedora, RHEL, SUSE, Arch). Notably, in systemd release 248 the exploit does not work because of a bug in the systemd code that causes the processing of /proc/self/mountinfo to fail. It is also interesting that a similar situation arose in 2018: while trying to write an exploit for the CVE-2018-14634 vulnerability in the Linux kernel, Qualys researchers came across three critical vulnerabilities in systemd.

Source: opennet.ru
