Linux 6.2 kernel release

After two months of development, Linus Torvalds has released the Linux 6.2 kernel. Notable changes include: code under the Copyleft-Next license is now accepted, the RAID 5/6 implementation in Btrfs has been improved, integration of Rust language support has continued, the overhead of Retbleed protection has been reduced, memory consumption during writeback can now be throttled, the PLB (Protective Load Balancing) mechanism has been added for TCP, a hybrid control-flow protection mechanism (FineIBT) has been added, BPF has gained the ability to define its own objects and data structures, the rv (Runtime Verification) utility has been included, and the power consumption of the RCU implementation has been reduced.

The new version includes 16,843 changes from 2,178 developers; the patch size is 62 MB (the changes affected 14,108 files, 730,195 lines of code were added, and 409,485 lines were deleted). About 42% of all changes introduced in 6.2 relate to device drivers, approximately 16% to code specific to hardware architectures, 12% to the networking stack, 4% to file systems, and 3% to internal kernel subsystems.
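
A few derived figures follow directly from the numbers above. This is only a minimal sketch: the variable names are illustrative, and the "unlisted" share is simply whatever the listed percentages leave over.

```python
# Figures quoted in the summary above.
lines_added = 730_195
lines_deleted = 409_485
changes = 16_843
developers = 2_178

# Percentage of changes per area, as listed above.
shares = {
    "device drivers": 42,
    "architecture-specific code": 16,
    "networking stack": 12,
    "file systems": 4,
    "internal kernel subsystems": 3,
}

net_growth = lines_added - lines_deleted      # net change in lines of code
unlisted_share = 100 - sum(shares.values())   # share not attributed to a listed area
changes_per_developer = changes / developers  # simple average

print(f"net growth: {net_growth} lines")
print(f"share outside the listed areas: {unlisted_share}%")
print(f"average changes per developer: {changes_per_developer:.1f}")
```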

Key innovations in kernel 6.2:

  • Memory and system services
    • Code and changes supplied under the Copyleft-Next 0.3.1 license may now be included in the kernel. The Copyleft-Next license was created by one of the authors of GPLv3 and is fully compatible with GPLv2, as confirmed by lawyers from SUSE and Red Hat. Compared to GPLv2, Copyleft-Next is much more compact and easier to understand (the introductory part and the mentions of obsolete compromises have been removed), defines the time frame and procedure for remedying violations, and automatically lifts copyleft requirements for obsolete software that is more than 15 years old.

      Copyleft-Next also contains a patent grant clause, which, unlike GPLv2, makes the license compatible with the Apache 2.0 license. To ensure full GPLv2 compatibility, Copyleft-Next explicitly states that a derivative work may be distributed under the GPL in addition to the original Copyleft-Next license.

    • The "rv" utility has been included. It provides a user-space interface to the handlers of the RV (Runtime Verification) subsystem, which is designed to verify correct operation on high-reliability systems that must guarantee the absence of failures. Verification is performed at run time by attaching handlers to tracepoints; the handlers check the actual course of execution against a predefined reference model, a deterministic automaton that defines the expected behavior of the system.
    • For the zRAM device, which stores the swap partition in memory in compressed form (a block device is created in memory, and swapping to it is performed with compression), the ability to recompress pages with an alternative algorithm to achieve a higher compression level has been implemented. The main idea is to offer a choice between several algorithms (lzo, lzo-rle, lz4, lz4hc, zstd) that provide different trade-offs between compression/decompression speed and compression level, or that are optimal in special situations (for example, for compressing large memory pages).
    • The "iommufd" API has been added to manage the IOMMU (I/O Memory-Management Unit) from user space. The new API makes it possible to manage I/O memory page tables using file descriptors.
    • BPF now makes it possible to create types, define custom objects, build custom object hierarchies, and flexibly form custom data structures, such as linked lists. Support for the bpf_rcu_read_{,un}lock() locks has been added for sleepable BPF programs (BPF_F_SLEEPABLE). Support for storing task_struct objects has been implemented. The BPF_MAP_TYPE_CGRP_STORAGE map type has been added to provide local storage for cgroups.
    • For the RCU (Read-copy-update) synchronization mechanism, an optional mechanism of "lazy" callback invocation has been implemented, in which several callbacks are processed together in batch mode on a timer. This optimization reduces power consumption on Android and ChromeOS devices by 5-10% by deferring RCU requests while the system is idle or lightly loaded.
    • The split_lock_mitigate sysctl has been added to control how the system reacts when it detects "split locks", which occur when an atomic instruction accesses unaligned data that crosses two CPU cache lines. Such locks cause a significant drop in performance. Setting split_lock_mitigate to 0 only issues a warning about the problem, while setting it to 1 issues the warning and also slows down the process that caused the lock in order to preserve the performance of the rest of the system.
    • For the PowerPC architecture, a new implementation of qspinlock locks has been added; it demonstrates higher performance and solves some locking problems that occur in exceptional cases.
    • The MSI (Message-Signaled Interrupts) interrupt handling code has been redesigned, which eliminates the accumulated architectural problems and adds support for binding individual handlers to different devices.
    • For systems based on the LoongArch instruction set architecture, which is used in Loongson 3 5000 processors and implements a new RISC ISA similar to MIPS and RISC-V, support for ftrace, stack protection, and sleep and standby modes has been implemented.
    • The ability to assign names to areas of shared anonymous memory has been provided (previously, names could only be assigned to private anonymous memory assigned to a specific process).
    • A new kernel command-line option, "trace_trigger", has been added to activate a trace trigger, which binds conditional commands that run when a trace event fires (for example, trace_trigger="sched_switch.stacktrace if prev_state == 2").
    • Increased version requirements for the binutils package. Kernel builds now require at least binutils 2.25.
    • Added the ability to place a process, at the time of the exec() call, in a time namespace in which the time differs from the system time.
    • Porting of additional functionality from the Rust-for-Linux branch, related to using Rust as a second language for developing drivers and kernel modules, has begun. Rust support is disabled by default and does not make Rust a required kernel build dependency. The basic functionality offered in the previous release has been extended with support for low-level code, such as the Vec type and the pr_debug!(), pr_cont!() and pr_alert!() macros, as well as the "#[vtable]" procedural macro, which simplifies working with tables of function pointers. High-level Rust wrappers over kernel subsystems, which will make it possible to write full-fledged drivers in Rust, are expected in future releases.
    • The "char" type used in the kernel is now declared unsigned by default for all architectures.
    • The SLOB memory allocator, which was designed for systems with a small amount of memory, has been deprecated. Under normal circumstances, SLUB or SLAB is recommended instead of SLOB. For systems with a small amount of memory, SLUB in SLUB_TINY mode is recommended.
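
The runtime verification approach described above (checking observed events against a reference deterministic automaton) can be illustrated in user space. This is a minimal sketch under stated assumptions: the states, event names, and the first_violation() helper are hypothetical and are not the kernel's RV interface or one of its actual monitors.

```python
from typing import Iterable, Optional

# Hypothetical reference model: a wakeup is only allowed while preemption is
# enabled. State and event names are illustrative, not taken from the kernel.
TRANSITIONS = {
    ("preemptive", "preempt_disable"): "non_preemptive",
    ("preemptive", "sched_waking"): "preemptive",
    ("non_preemptive", "preempt_enable"): "preemptive",
}


def first_violation(events: Iterable[str], state: str = "preemptive") -> Optional[int]:
    """Return the index of the first event the model does not allow, or None."""
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return index  # no transition defined: the trace has left the model
        state = next_state
    return None


# A trace that follows the model, and one that wakes a task with preemption off.
print(first_violation(["preempt_disable", "preempt_enable", "sched_waking"]))  # None
print(first_violation(["preempt_disable", "sched_waking"]))                    # 1
```

In the kernel the events would come from tracepoints rather than from a list, but the check itself has the same shape: each event either has a transition from the current state or is reported as a violation.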
  • Disk Subsystem, I/O and File Systems
    • Improvements have been made to Btrfs aimed at fixing the "write hole" problem in the RAID 5/6 implementation (an attempt to rebuild the array after a crash during a write, when it is impossible to tell which block on which RAID device was written correctly, can corrupt the blocks that were not fully written). In addition, asynchronous execution of the "discard" operation is now enabled by default for SSDs when possible, which improves performance by efficiently grouping "discard" operations in a queue that is processed by a background handler. The performance of the send and lseek operations, as well as the FIEMAP ioctl, has been improved.
    • The options for managing delayed writes (writeback, the background saving of changed data) for block devices have been extended. In some situations, for example when using network block devices or USB drives, delayed writes can consume a large amount of RAM. New parameters strict_limit, min_bytes, max_bytes, min_ratio_fine and max_ratio_fine have been introduced in sysfs (/sys/class/bdi/) to control the behavior of delayed writes and keep the page cache size within set limits.
    • The F2FS file system implements the atomic replace ioctl operation, which allows you to write data to a file as part of a single atomic operation. F2FS also adds a block extent cache to help identify data that is actively used or has not been accessed in a while.
    • For the ext4 file system, only bug fixes are noted.
    • The ntfs3 file system offers several new mount options: "nocase" to control case sensitivity in file and directory names; "windows_name" to prevent the creation of file names containing characters that are invalid in Windows; "hide_dot_files" to control whether the hidden-file attribute is assigned to files whose names start with a dot.
    • The Squashfs filesystem implements the "threads=" mount option, with which you can specify the number of threads to parallelize decompression operations. Squashfs also introduced the ability to map user IDs of mounted filesystems, used to map files from a specific user on a mounted foreign partition to another user on the current system.
    • Redesigned implementation of POSIX Access Control Lists (POSIX ACLs). The new implementation fixes architectural issues, makes the codebase easier to maintain, and uses safer data types.
    • Support for the SM4 encryption algorithm (Chinese standard GB/T 32907-2016) has been added to the fscrypt subsystem, which is used to transparently encrypt files and directories.
    • Provided the ability to build a kernel without NFSv2 support (in the future, NFSv2 support is planned to be completely removed).
    • The way access rights to NVMe devices are checked has been changed. A process can now read from and write to an NVMe device if it has access to the device's special file (previously the process had to have the CAP_SYS_ADMIN capability).
    • Removed the packet-writing CD/DVD driver (pktcdvd), which was deprecated in 2016.
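
The writeback parameters mentioned above can be inspected from user space. The following is a minimal sketch, not a definitive tool: the parameter names and the /sys/class/bdi/ location come from the text above, while the assumption that each backing device has its own subdirectory containing one file per parameter, and the read_writeback_knobs() helper itself, are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional

# Parameter names as listed above; the per-device file layout is an assumption.
WRITEBACK_KNOBS = ("strict_limit", "min_bytes", "max_bytes",
                   "min_ratio_fine", "max_ratio_fine")


def read_writeback_knobs(bdi_dir: Path) -> Dict[str, Optional[str]]:
    """Read each knob from one backing-device directory; None if unavailable."""
    values: Dict[str, Optional[str]] = {}
    for name in WRITEBACK_KNOBS:
        try:
            values[name] = (bdi_dir / name).read_text().strip()
        except OSError:
            values[name] = None  # file missing (older kernel) or not readable
    return values


if __name__ == "__main__":
    root = Path("/sys/class/bdi")
    if root.is_dir():
        for device in sorted(root.iterdir()):
            print(device.name, read_writeback_knobs(device))
```

On kernels that lack these files the helper simply reports None for each parameter, so the script is safe to run on older systems.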
  • Virtualization and Security
    • A new method of protection against the Retbleed vulnerability in Intel and AMD CPUs has been implemented; it uses call depth tracking and is not as slow as the previously available Retbleed protection. The new mode is enabled with the kernel command-line parameter "retbleed=stuff".
    • A hybrid control-flow protection mechanism, FineIBT, has been added. It combines the Intel IBT (Indirect Branch Tracking) hardware instructions with the kCFI (kernel Control Flow Integrity) software protection to block violations of the normal order of execution (control flow) caused by exploits that modify function pointers stored in memory. FineIBT allows an indirect branch to proceed only if it lands on the ENDBR instruction placed at the very beginning of a function. In addition, as in the kCFI mechanism, hashes are then checked to guarantee that the pointers have not been altered.
    • Restrictions have been added to block attacks that manipulate the generation of "oops" states, after which the offending tasks are terminated and the state is restored without stopping the system. With a very large number of "oops" events, a reference counter (refcount) can overflow, which makes it possible to exploit vulnerabilities caused by NULL pointer dereferences. To protect against such attacks, a limit on the maximum number of "oops" events has been added to the kernel; once it is exceeded, the kernel switches to the "panic" state and then reboots, which prevents an attacker from reaching the number of iterations required to overflow the refcount. By default, the limit is set to 10 thousand "oops" events, but it can be changed through the oops_limit parameter.
    • Added configuration parameter LEGACY_TIOCSTI and sysctl legacy_tiocsti to disable the ability to put data into the terminal using ioctl TIOCSTI, since this functionality can be used to substitute arbitrary characters in the terminal's input buffer and simulate user input.
    • A new internal structure type, encoded_page, has been introduced, in which the lower bits of the pointer store additional information; this protects against accidental dereferencing of the pointer (if a dereference is really needed, these extra bits must first be cleared).
    • On the ARM64 platform, the software implementation of the Shadow Stack mechanism can now be enabled and disabled at boot time. The mechanism protects against overwriting a function's return address when a buffer on the stack overflows (the essence of the protection is to save the return address in a separate "shadow" stack after control is transferred to the function and to retrieve that address before the function exits). Supporting both the hardware and the software implementation of the Shadow Stack in a single kernel build makes it possible to use the same kernel on different ARM systems, regardless of whether they support pointer authentication instructions. The software implementation is enabled by patching the necessary instructions into the code at boot time.
    • Added support for using the asynchronous exit notification mechanism on Intel processors, which allows detecting single-step attacks on code running in SGX enclaves.
    • A set of operations has been added that allows the hypervisor to support requests from Intel TDX (Trust Domain Extensions) guest systems.
    • The RANDOM_TRUST_BOOTLOADER and RANDOM_TRUST_CPU kernel build settings have been removed; the corresponding random.trust_bootloader and random.trust_cpu command-line options should be used instead.
    • Support for the LANDLOCK_ACCESS_FS_TRUNCATE flag, which makes it possible to control file truncation operations, has been added to the Landlock mechanism, which restricts the interaction of a group of processes with the external environment.
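
The idea behind encoded_page, described above, is ordinary pointer tagging: an aligned address has unused low bits that can carry flags, and the tagged value must be masked before it is used as an address. This is a minimal sketch under stated assumptions: addresses are modelled as plain integers, the number of tag bits is chosen arbitrarily, and the function names are illustrative rather than the kernel's own helpers.

```python
# Illustrative only: the tag width assumes addresses are at least 4-byte aligned.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1


def encode_pointer(address: int, flags: int) -> int:
    """Pack small flags into the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not sufficiently aligned")
    if flags & ~TAG_MASK:
        raise ValueError("flags do not fit into the tag bits")
    return address | flags


def decode_flags(encoded: int) -> int:
    """Extract the flags stored in the low bits."""
    return encoded & TAG_MASK


def decode_pointer(encoded: int) -> int:
    """Clear the tag bits to recover an address that can be dereferenced."""
    return encoded & ~TAG_MASK


encoded = encode_pointer(0x7F00_1000, flags=0b01)
print(hex(encoded))                  # tagged value, not a usable address
print(decode_flags(encoded))         # flags recovered from the low bits
print(hex(decode_pointer(encoded)))  # original address
```

Giving the tagged value a distinct type, as the kernel does, means that code which forgets to strip the bits fails to compile instead of dereferencing a wrong address.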
  • Network subsystem
    • For IPv6, support has been added for PLB (Protective Load Balancing), a load balancing mechanism between network links, aimed at reducing congestion points on data center switches. By changing the IPv6 Flow Label, the PLB randomly changes packet paths to balance the load on the switch ports. To reduce packet reordering, this operation is performed after idle periods whenever possible. The use of PLB in Google data centers has reduced the load imbalance on switch ports by an average of 60%, reduced packet loss by 33%, and reduced latency by 20%.
    • Added driver for MediaTek Wi-Fi 7 (802.11be) devices.
    • Added support for 800-gigabit links.
    • Added the ability to rename network interfaces on the fly, without stopping work.
    • SYN flood messages written to the log now mention the IP address at which the packet arrived.
    • For UDP, the ability to use separate hash tables for different network namespaces is implemented.
    • Network bridges support the MAB (MAC Authentication Bypass) authentication method.
    • For the CAN (CAN_RAW) protocol, SO_MARK socket mode is supported for attaching traffic filters based on fwmark.
    • ipset has a new bitmask parameter that allows you to set a mask based on arbitrary bits in an IP address (for example, "ipset create set1 hash:ip bitmask 255.128.255.0").
    • Added support to nf_tables for processing inner headers inside tunneled packets.
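
The effect of the ipset bitmask parameter mentioned above can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming the mask is applied to an address as a bitwise AND before the address is stored or matched; the mask is the one from the example above, while the addresses and the apply_bitmask() helper are made up for illustration.

```python
import ipaddress


def apply_bitmask(address: str, bitmask: str) -> str:
    """AND an IPv4 address with an arbitrary (possibly non-contiguous) mask."""
    addr = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(bitmask))
    return str(ipaddress.IPv4Address(addr & mask))


# Two different addresses that collapse to the same masked value.
print(apply_bitmask("10.200.3.77", "255.128.255.0"))   # 10.128.3.0
print(apply_bitmask("10.129.3.200", "255.128.255.0"))  # 10.128.3.0
```

Unlike a prefix length, such a mask does not have to consist of contiguous leading ones, so it can ignore bits in the middle of an address.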
  • Hardware
    • The "accel" subsystem has been added, implementing a framework for compute accelerators, which can be supplied either as separate ASICs or as IP blocks inside SoCs and GPUs. These accelerators are aimed mainly at speeding up machine learning workloads.
    • The amdgpu driver includes support for GC, PSP, SMU, and NBIO IP components. DCN (Display Core Next) support is implemented for ARM64 systems. The implementation of secure screen output has been moved from using the DCN10 to the DCN21 and can now be used when multiple screens are connected.
    • Support for Intel Arc (DG2/Alchemist) discrete graphics cards has been stabilized in the i915 (Intel) driver.
    • The Nouveau driver supports the NVIDIA GA102 (RTX 30) GPU based on the Ampere architecture. For nva3 (GT215) cards, the ability to control the backlight has been added.
    • Added support for wireless adapters based on Realtek 8852BE, Realtek 8821CU, 8822BU, 8822CU, 8723DU (USB) and MediaTek MT7996 chips, for Broadcom BCM4377/4378/4387 Bluetooth interfaces, and for Motorcomm yt8521 and NVIDIA Tegra GE Ethernet controllers.
    • Added ASoC (ALSA System on Chip) support for embedded sound chips HP Stream 8, Advantech MICA-071, Dell SKU 0C11, Intel ALC5682I-VD, Xiaomi Redmi Book Pro 14 2022, i.MX93, Armada 38x, RK3588. Added support for Focusrite Saffire Pro 40 audio interface. Added Realtek RT1318 audio codec.
    • Added support for smartphones and tablets: Sony (Xperia 10 IV, 5 IV, X and X compact), OnePlus (One, 3, 3T and Nord N100), Xiaomi (Poco F1 and Mi6), Huawei Watch, Google Pixel 3a, Samsung Galaxy Tab 4 10.1.
    • Added support for ARM SoCs and boards: Apple T6000 (M1 Pro), T6001 (M1 Max), T6002 (M1 Ultra), Qualcomm MSM8996 Pro (Snapdragon 821), SM6115 (Snapdragon 662), SM4250 (Snapdragon 460), SM6375 (Snapdragon 695), SDM670 (Snapdragon 670), MSM8976 (Snapdragon 652), MSM8956 (Snapdragon 650), RK3326 Odroid-Go/rg351, Zyxel NSA310S, InnoComm i.MX8MM, Odroid Go Ultra.

At the same time, the Latin American Free Software Foundation released a variant of the completely free kernel 6.2, Linux-libre 6.2-gnu, cleared of firmware and driver elements that contain non-free components or code sections whose use is restricted by the manufacturer. In the new release, new blobs in the nouveau driver have been cleaned up. Blob loading has been disabled in the mt7622 and mt7996 Wi-Fi drivers and the bcm4377 Bluetooth driver. Blob names have been cleaned up in dts files for the Aarch64 architecture. The blob cleanup code has been updated in various drivers and subsystems. Cleaning of the s5k4ecgx driver has been stopped because the driver was removed from the kernel.

Source: opennet.ru
