Linux 5.18 kernel release

After two months of development, Linus Torvalds has released the Linux 5.18 kernel. Among the most noticeable changes: a large cleanup of obsolete functionality, deprecation of the ReiserFS file system, user-space trace events, support for the Intel IBT exploit-blocking mechanism, buffer overflow detection for the memcpy() function, the fprobe mechanism for tracing function calls, improved task scheduler performance on AMD Zen CPUs, a driver for managing Intel CPU features (SDSi), the first patches restructuring header files, and approval of the C11 standard for kernel code.

The new version accepted 16206 fixes from 2127 developers (the last release had 14203 fixes from 1995 developers), the patch size is 108 MB (the changes affected 14235 files, 1340982 lines of code were added, 593836 lines were deleted). About 44% of all changes introduced in 5.18 are related to device drivers, about 16% of changes are related to updating code specific to hardware architectures, 11% are related to the networking stack, 3% to file systems, and 3% to internal kernel subsystems.

Key innovations in kernel 5.18:

  • Disk Subsystem, I/O and File Systems
    • The Btrfs file system has added support for forwarding compressed data when performing send and receive operations. Previously, when using send/receive, the sending side decompressed the data stored in compressed form, and the receiving side recompressed it before writing. In the 5.18 kernel, user-space applications using send/receive calls can transfer compressed data without recompression. The functionality is implemented through the new ioctl operations BTRFS_IOC_ENCODED_READ and BTRFS_IOC_ENCODED_WRITE, which allow data to be read from and written to extents directly.

      In addition, fsync performance has been improved in Btrfs. The ability to perform dedupe and reflink (cloning file metadata by creating a link to existing data without actually copying it) now covers the entire storage and is not limited to mount points.

    • In Direct I/O mode, it is now possible to access encrypted files when fscrypt uses inline encryption, in which encryption and decryption are performed by the drive's controller rather than the kernel. With conventional kernel encryption, access to encrypted files using Direct I/O is still impossible, since files are accessed bypassing the buffering mechanism in the kernel.
    • In the NFS server, NFSv3 support is now enabled by default: it no longer requires a separate option and is available whenever NFS is enabled. NFSv3 is considered the primary and always supported version of NFS, and support for NFSv2 may be dropped in the future. The efficiency of reading directory contents has been significantly improved.
    • The ReiserFS file system has been deprecated and is expected to be removed in 2025. Deprecating ReiserFS will reduce the effort required to maintain filesystem-wide changes to support the new mount, iomap, and folio APIs.
    • For the F2FS file system, support for mapping user IDs of mounted file systems (idmapped mounts) has been implemented, which is used to map the files of a particular user on a mounted foreign partition to another user on the current system.
    • The code for calculating statistics in Device-mapper handlers has been redesigned, which has significantly improved the accuracy of accounting in handlers such as dm-crypt.
    • For NVMe devices, support for 64-bit checksums for integrity checks has been implemented.
    • A new mount option "keep_last_dots" has been proposed for the exfat filesystem, which prevents the dots at the end of the filename from being cleared (on Windows, the dots at the end of the filename are removed by default).
    • EXT4 improves the performance of the fast_commit mode and increases scalability. The mb_optimize_scan mount option, which improves performance in conditions of large file system fragmentation, has been adapted to work with files with extents.
    • Support for write streams has been discontinued in the block device subsystem. This feature was proposed for SSDs but did not become widespread; no devices in current use support this mode, and it is unlikely that any will appear in the future.
  • Memory and system services
    • The integration of a set of patches has begun, which can significantly reduce the time to rebuild the kernel by restructuring the hierarchy of header files and reducing the number of cross dependencies. The 5.18 kernel includes patches that optimize the structure of the task scheduler header files (kernel/sched). Compared to the last release, the CPU time consumption for building kernel/sched/ code has decreased by 61%, and the actual time has decreased by 3.9% (from 2.95 to 2.84 sec).
    • Kernel code is allowed to use the C11 standard, published in 2011. Previously, code added to the kernel had to comply with the ANSI C (C89) specification, which dates back to 1989. In the 5.18 kernel build scripts, the '--std=gnu89' option has been replaced with '--std=gnu11 -Wno-shift-negative-value'. The possibility of using the C17 standard was considered, but in this case it would be necessary to increase the minimum supported version of GCC, while the inclusion of C11 support fits into the current requirements for the GCC version (5.1).
    • Improved task scheduling performance on AMD Zen microarchitecture processors, which provide multiple Last Level Caches (LLCs) per node with local memory channels. The new version eliminates the LLC imbalance between NUMA nodes, which has led to a significant performance increase under some types of load.
    • Extended tools for tracing applications in user space. The new kernel version adds the ability for user processes to create user events and write data to the trace buffer, which can be viewed through common kernel tracing utilities such as ftrace and perf. User-space trace events are isolated from kernel trace events. Event status can be viewed through the /sys/kernel/debug/tracing/user_events_status file, and events are registered and data is written through the /sys/kernel/debug/tracing/user_events_data file.
    • Added a mechanism for tracing (probing) function calls: fprobe. The fprobe API is based on ftrace, but is limited to attaching callback handlers to function entry and exit points. Unlike kprobes and kretprobes, the new mechanism allows one handler to be used for several functions at once.
    • Removed support for older ARM processors (ARMv4 and ARMv5) that do not have a memory management unit (MMU). Support for ARMv7-M systems without an MMU has been retained.
    • Support for the RISC-like NDS32 architecture used in Andes Technologies processors has been discontinued. The code was removed due to lack of maintenance and lack of demand for NDS32 support in the main Linux kernel (the remaining users use specialized kernel builds from hardware manufacturers).
    • By default, kernel builds with support for the a.out executable file format are disabled for the alpha and m68k architectures, which continue to use this format. It is likely that support for the deprecated a.out format will soon be completely removed from the kernel. Plans to remove the a.out format have been discussed since 2019.
    • For the PA-RISC architecture, minimal support for the vDSO mechanism (virtual dynamic shared objects) is implemented, which provides a limited set of system calls available in user space without context switching. Support for vDSO made it possible to launch with a non-executable stack.
    • Added support for the Intel HFI (Hardware Feedback Interface) mechanism, which allows hardware to send information about the current performance and energy efficiency of each CPU to the kernel.
    • A driver has been added for the Intel SDSi (Software-Defined Silicon) mechanism, which allows you to control the inclusion of additional features in the processor (for example, specialized instructions and additional cache memory). The idea is that chips with advanced features locked can be supplied at a lower price, and the additional features can then be purchased and activated without replacing the chip.
    • The amd_hsmp driver has been added to support the AMD HSMP (Host System Management Port) interface, which provides access to processor management functions through a set of special registers that have appeared in AMD EPYC server processors since the Fam19h generation. For example, through HSMP, you can get data on power consumption and temperature, set frequency limits, activate various performance enhancement modes, and manage memory settings.
    • The io_uring asynchronous I/O interface implements the IORING_SETUP_SUBMIT_ALL option, which makes the kernel continue submitting the remaining requests in a batch even if one of them fails, and the IORING_OP_MSG_RING operation to send a signal from one ring buffer to another ring buffer.
    • The DAMOS (Data Access Monitoring-based Operation Schemes) mechanism, which allows you to free up memory based on the frequency of memory access, has expanded the ability to control memory operations from user space.
    • The third series of patches implementing the concept of page folios has been integrated; folios resemble compound pages, but have improved semantics and a clearer organization of work. Using folios makes it possible to speed up memory management in some kernel subsystems. In the proposed patches, internal memory management functions have been converted to folios, including variations of the get_user_pages() function. Support has been added for creating large folios in the readahead code.
    • The build system now supports the USERCFLAGS and USERLDFLAGS environment variables, which can be used to pass additional flags to the compiler and linker.
    • In the eBPF subsystem, the BTF (BPF Type Format) engine, which provides information for type checking in BPF pseudocode, implements the ability to add annotations to variables that refer to user-space memory areas. Annotations help the BPF code verification system to better identify and verify memory accesses.
    • A new memory allocation handler for storing loaded BPF programs has been proposed, which allows more efficient use of memory in situations where a large number of BPF programs are loaded.
    • The MADV_DONTNEED_LOCKED flag has been added to the madvise() system call, which provides means for optimizing process memory management. It complements the existing MADV_DONTNEED flag, which tells the kernel in advance that a memory block is about to be released, i.e. that this block is no longer needed and can be used by the kernel. Unlike MADV_DONTNEED, the MADV_DONTNEED_LOCKED flag may be used on memory pages pinned in RAM: when madvise() is called, they are evicted without changing their pinned status, and on a subsequent access to the block, which generates a "page fault", they are returned with the pinning preserved. Additionally, a change has been added to allow the use of the MADV_DONTNEED flag with large memory pages in HugeTLB.
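    The difference between the two madvise() flags can be tried from user space. The following is a minimal sketch, not code from the kernel: the helper name drop_and_check() is hypothetical, and the fallback value 24 for MADV_DONTNEED_LOCKED is an assumption taken from the generic kernel UAPI header for cases where the libc headers do not define it yet. On kernels older than 5.18, or when mlock() is not permitted, the locked variant is expected to fail and the helper returns -1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: value from include/uapi/asm-generic/mman-common.h,
 * used only when the libc headers predate Linux 5.18. */
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/*
 * Hypothetical helper: map one private anonymous page, optionally pin it
 * with mlock(), fill it with a pattern, apply the given madvise() advice
 * and read the first byte back.
 *
 * Returns 1 if the page reads back as zero (its contents were dropped),
 * 0 if the old contents survived, -1 if a system call failed.
 */
int drop_and_check(int advice, int lock)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int result = -1;
    if (!lock || mlock(p, len) == 0) {
        memset(p, 0xAB, len);
        if (madvise(p, len, advice) == 0) {
            /* This read triggers a page fault if the page was dropped. */
            result = (*(volatile unsigned char *)p == 0) ? 1 : 0;
        }
    }

    munmap(p, len);
    return result;
}
```

    Calling drop_and_check(MADV_DONTNEED, 0) exercises the long-standing behaviour on an unpinned page, while drop_and_check(MADV_DONTNEED_LOCKED, 1) exercises the new flag on a pinned page.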
  • Virtualization and Security
    • For the x86 architecture, support has been added for the Intel IBT (Indirect Branch Tracking) control-flow protection mechanism, which hinders exploit techniques based on return-oriented programming (ROP), in which the exploit is built as a chain of calls to pieces of machine code already present in memory that end with a control return instruction (as a rule, these are the ends of functions). The essence of the protection is to block indirect jumps into the body of a function by adding a special ENDBR instruction at the beginning of the function and allowing an indirect jump to execute only if it lands on this instruction (an indirect call through JMP or CALL must always land on the ENDBR instruction placed at the very beginning of the function).
    • Enabled more stringent buffer bounds checking in the memcpy(), memmove() and memset() functions, which is performed at compile time when the CONFIG_FORTIFY_SOURCE mode is enabled. The added change comes down to checking for out-of-bounds elements of structures whose size is known. It is noted that the implemented feature would block all memcpy()-related buffer overflows in the kernel that have been identified for at least the last three years.
    • The second part of the code for the updated implementation of the kernel's pseudo-random number generator, which is responsible for the operation of the /dev/random and /dev/urandom devices, has been added. The new implementation is notable for unifying the operation of /dev/random and /dev/urandom, adding protection against duplicates in the random number stream when starting virtual machines, and switching to the BLAKE2s hash function instead of SHA1 for entropy mixing operations. The change made it possible to increase the security of the pseudo-random number generator by getting rid of the problematic SHA1 algorithm and eliminating the overwriting of the RNG initialization vector. Since the BLAKE2s algorithm is faster than SHA1, its use also had a positive effect on performance.
    • For the ARM64 architecture, support has been added for a new pointer authentication algorithm - "QARMA3", which is faster than the QARMA algorithm while maintaining the proper level of protection. The technology allows the use of specialized ARM64 instructions to verify return addresses using digital signatures that are stored in the unused upper bits of the pointer itself.
    • For the ARM64 architecture, support has been implemented for building with the GCC 12 protection mode against overwriting a function's return address in the event of a stack buffer overflow. The essence of the protection is to save the return address in a separate "shadow" stack after control is transferred to the function and to retrieve this address before exiting the function.
    • Added a new keyring, "machine", containing the machine owner keys (MOK, Machine Owner Keys) supported in the shim bootloader. These keys can be used to digitally sign kernel components loaded after boot (e.g., kernel modules).
    • Removed support for asymmetric private keys for TPM, which were offered in the legacy version of TPM, have known security issues, and are not widely used in practice.
    • Added protection of data with type size_t from integer overflows. The code uses the size_mul(), size_add(), and size_sub() handlers to safely perform multiplication, addition, and subtraction of sizes with the size_t type.
    • When building the kernel, the "-Warray-bounds" and "-Wzero-length-bounds" flags are enabled, which display warnings when the index goes beyond the array boundary and the use of zero-length arrays.
    • Added support for encryption using the RSA algorithm to the virtio-crypto device.
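    The idea behind the size_mul(), size_add() and size_sub() helpers mentioned above can be illustrated in user space. This is a sketch under stated assumptions rather than the kernel's code: the sat_size_* names are hypothetical, the GCC/Clang __builtin_*_overflow builtins are used for overflow detection, and saturation at SIZE_MAX is assumed as the overflow result so that a later allocation with that size fails instead of silently under-allocating.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space analogue of size_mul():
 * returns SIZE_MAX instead of a wrapped-around product. */
static inline size_t sat_size_mul(size_t a, size_t b)
{
    size_t r;
    return __builtin_mul_overflow(a, b, &r) ? SIZE_MAX : r;
}

/* Hypothetical analogue of size_add():
 * returns SIZE_MAX instead of a wrapped-around sum. */
static inline size_t sat_size_add(size_t a, size_t b)
{
    size_t r;
    return __builtin_add_overflow(a, b, &r) ? SIZE_MAX : r;
}

/* Hypothetical analogue of size_sub(): an already saturated operand or
 * an underflow yields SIZE_MAX, so an earlier overflow is never hidden
 * by a later subtraction. */
static inline size_t sat_size_sub(size_t a, size_t b)
{
    size_t r;
    if (a == SIZE_MAX || b == SIZE_MAX || __builtin_sub_overflow(a, b, &r))
        return SIZE_MAX;
    return r;
}
```

    A typical use is computing an allocation size such as sat_size_add(header_len, sat_size_mul(count, elem_size)): if any step overflows, the result stays at SIZE_MAX and the allocator can reject the request.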
  • Network subsystem
    • In the implementation of network bridges, support for the port locking mode (locked mode) has been added, in which the user can send traffic through the port only from a permitted MAC address. Also added is the ability to use several structures to evaluate the state of the STP (Spanning Tree Protocol). Previously, only direct binding to STP (1:1) could be performed for VLANs, in which each VLAN was managed independently. The new version adds the mst_enable parameter; when it is enabled, the state of VLANs is controlled by the MST (Multiple Spanning Trees) module and the binding of VLANs can follow the M:N model.
    • Work continued on integrating tools for tracking the reasons for dropping packets (reason codes) into the network stack. The reason code is sent when the memory associated with the packet is released, and makes it possible to distinguish situations such as discarding the packet due to errors in header fields, detection of spoofing by the rp_filter filter, a bad checksum, running out of memory, triggering of IPSec XFRM rules, a bad TCP sequence number, etc.
    • The ability to transmit network packets from BPF programs launched from user space in the BPF_PROG_RUN mode is provided, in which BPF programs run in the kernel, but return the result to user space. Packets are transmitted using the XDP (eXpress Data Path) subsystem. A live packet processing mode is supported, in which the XDP handler can forward network packets on the fly to the network stack or to other devices. It is also possible to create software generators of external traffic or substitution of network frames in the network stack.
    • For BPF programs attached to network cgroups, helper functions are proposed for explicitly setting the return value of system calls, which allows you to convey more complete information about the reasons for blocking a system call.
    • The XDP (eXpress Data Path) subsystem has added support for fragmented packets located in several buffers, which allows XDP to process Jumbo frames and apply TSO/GRO (TCP Segmentation Offload/Generic Receive Offload) for XDP_REDIRECT.
    • Significantly accelerated the process of deleting network namespaces, which was in demand on some large systems with a large amount of traffic.
  • Equipment
    • The amdgpu driver has FreeSync adaptive sync technology enabled by default, which allows you to adjust the refresh rate of information on the screen, ensuring smooth and tear-free images when playing games and watching videos. Aldebaran GPU support announced as stable.
    • The i915 driver adds support for Intel Alderlake N chips and Intel DG2-G12 discrete graphics cards (Arc Alchemist).
    • The nouveau driver provides support for higher bit rates for DP/eDP interfaces and support for LTTPR (Link-Training Tunable PHY Repeater) cable extenders.
    • In the drm (Direct Rendering Manager) subsystem, in the armada, exynos, gma500, hyperv, imx, ingenic, mcde, mediatek, msm, omap, rcar-du, rockchip, sprd, sti, tegra, tilcdc, xen and vc4 drivers, support has been added for the nomodeset parameter, which allows you to disable video mode switching at the kernel level and the use of hardware rendering acceleration, leaving only the functionality associated with the system framebuffer.
    • Added support for ARM SoCs Qualcomm Snapdragon 625/632 (used in LG Nexus 5X and Fairphone FP3 smartphones), Samsung Exynos 850, Samsung Exynos 7885 (used in Samsung Galaxy A8), Airoha (Mediatek/EcoNet) EN7523, Mediatek mt6582 (Prestigio PMT5008 3G tablet), Microchip Lan966, Renesas RZ/G2LC, RZ/V2L, Tesla FSD, TI K3/AM62 and i.MXRTxxxx.
    • Added support for Broadcom (Raspberry Pi Zero 2 W), Qualcomm (Google Herobrine R1 Chromebook, SHIFT6mq, Samsung Galaxy Book2), Rockchip (Pine64 PineNote, Bananapi-R2-Pro, STM32 Emtrion emSBS, Samsung Galaxy Tab S, Prestigio PMT5008 3G tablet), Allwinner (A20-Marsboard), Amlogic (Amediatek X96-AIR, CYX A95XF3-AIR, Haochuangy H96-Max, Amlogic AQ222 and OSMC Vero 4K+), Aspeed (Quanta S6Q, ASRock ROMED8HM3), Marvell MVEBU/Armada (Ctera C200 V1 and V2 NAS), Mstar (DongShanPiOne, Miyoo Mini), NXP i.MX (Protonic PRT8MM, emCON-MX8M Mini, Toradex Verdin, Gateworks GW7903).
    • Added support for sound systems and codecs AMD PDM, Atmel PDMC, Awinic AW8738, i.MX TLV320AIC31xx, Intel CS35L41, ESSX8336, Mediatek MT8181, nVidia Tegra234, Qualcomm SC7280, Renesas RZ/V2L, Texas Instruments TAS5805M. Added an initial sound driver implementation for the Intel AVS DSP chip. Updated driver support for Intel ADL and Tegra234, and made changes to improve audio support on Dell, HP, Lenovo, ASUS, Samsung, and Clevo devices.

    At the same time, the Latin American Free Software Foundation released a completely free variant of the 5.18 kernel, Linux-libre 5.18-gnu, stripped of firmware and driver elements that contain non-free components or code sections whose scope of use is restricted by the manufacturer. The new release cleans up the drivers for MIPI DBI panels, Amphion VPU, MediaTek MT7986 WMAC, Mediatek MT7921U (USB) and Realtek 8852a/8852c WiFi, and Intel AVS and Texas Instruments TAS5805M sound chips. DTS files for various Qualcomm SoCs with processors based on the AArch64 architecture were also cleaned. The blob cleanup code was updated for AMD GPU drivers and subsystems, MediaTek MT7915, Silicon Labs WF200+ WiFi, Mellanox Spectrum Ethernet, Realtek rtw8852c, Qualcomm Q6V5, Wolfson ADSP, and MediaTek HCI UART.

Source: opennet.ru
