Race condition in the Linux kernel garbage collector that can lead to privilege escalation

Jann Horn of Google Project Zero, who was among the discoverers of the Spectre and Meltdown vulnerabilities, has published a technique for exploiting a vulnerability (CVE-2021-4083) in the Linux kernel garbage collector. The vulnerability is caused by a race condition during the cleanup of Unix socket file descriptors and potentially allows a local unprivileged user to execute code at the kernel level.

The problem is notable because the time window in which the race condition occurs was considered too small for a working exploit. The researcher nevertheless showed that even vulnerabilities initially dismissed with skepticism can become the basis of real attacks if the exploit author has the necessary skills and time. Jann Horn demonstrated how meticulous manipulation can turn the race condition that arises when close() and fget() are called simultaneously into a fully exploitable use-after-free vulnerability, gaining access to an already freed data structure inside the kernel.

The race condition occurs while a file descriptor is being closed, when close() and fget() are called at the same time. The close() call may complete before fget() has finished, which confuses the garbage collector: according to the reference count, the file structure has no external references, yet it remains attached to the file descriptor. In other words, the garbage collector assumes it has exclusive access to the structure, while in fact, for a short period of time, the entry remaining in the file descriptor table still points to the structure being freed.

To raise the odds of hitting the race, several tricks were used which, combined with system-specific tuning, brought the exploit's success rate up to 30%. For example, to lengthen access to the structure holding the file descriptors by several hundred nanoseconds, its data was evicted from the processor cache by generating cache-polluting activity on another CPU core, so that the structure had to be fetched from main memory rather than from the fast CPU cache.

The second key trick was using interrupts from a hardware timer to widen the race window. The timing was chosen so that the interrupt handler would fire while the race was in progress and suspend the racing code for a while. To delay the return of control even further, epoll was used to create about 50,000 entries in the wait queue, all of which the interrupt handler had to iterate over.

The exploitation technique was disclosed after the 90-day non-disclosure period. The problem has been present since kernel 2.6.32 and was fixed in early December. The fix was included in kernel 5.16 and was also backported to the LTS kernel branches and to the kernel packages shipped by distributions. Notably, the vulnerability was found while analyzing a similar issue, CVE-2021-0920, which affects the garbage collector's handling of the MSG_PEEK flag.

Source: opennet.ru
