Linux 5.19 kernel release

After two months of development, Linus Torvalds has released the Linux 5.19 kernel. Among the most notable changes: support for the LoongArch processor architecture, integration of the "BIG TCP" patches, an "on-demand" mode in fscache, removal of the code supporting the a.out format, the ability to compress firmware with ZSTD, an interface for managing memory reclaim from user space, improved reliability and performance of the pseudo-random number generator, and support for the Intel IFS (In-Field Scan), AMD SEV-SNP (Secure Nested Paging), Intel TDX (Trust Domain Extensions) and ARM SME (Scalable Matrix Extension) extensions.

In the announcement, Linus said that the next kernel release is likely to be 6.0, as the 5.x branch has accumulated enough releases to change the first number of the version. The renumbering is done for aesthetic reasons; it is a formal step that relieves the discomfort caused by the large number of releases accumulated in the series.

Linus also mentioned that he used an Apple laptop based on the ARM64 architecture (Apple Silicon), running a Linux environment based on the Asahi Linux distribution, to prepare the release. It is not Linus' primary workstation, but he used the platform to test its suitability for kernel work and to make sure he could build kernel releases on the go with a lightweight laptop at hand. Many years ago, Linus already had experience using Apple hardware for development: he once used a ppc970 CPU-based computer and a MacBook Air laptop.

The new version includes 16401 fixes from 2190 developers (the previous release had 16206 fixes from 2127 developers); the patch size is 90 MB (the changes affected 13847 files, 1149456 lines of code were added, 349177 lines were deleted). About 39% of all changes introduced in 5.19 are related to device drivers, about 21% to updating code specific to hardware architectures, 11% to the networking stack, 4% to file systems, and 3% to internal kernel subsystems.

Key innovations in kernel 5.19:

  • Disk Subsystem, I/O and File Systems
    • The EROFS (Enhanced Read-Only File System) file system, designed for use on read-only partitions, has been switched to using the fscache subsystem, which provides data caching. The change significantly improves the performance of systems in which a large number of containers are launched from an EROFS-based image.
    • An on-demand read mode has been added to the fscache subsystem, which is used to optimize EROFS. The new mode makes it possible to cache reads from file system images located on the local system. Unlike the original mode of operation, which caches data transferred via network file systems in a local file system, the on-demand mode delegates fetching data and writing it to the cache to a separate background process running in user space.
    • XFS can now store billions of extended attributes in an inode. The maximum number of extents for a single file has been increased from 4 billion to 2^47. A mode has been implemented for atomically updating several extended attributes of a file at once.
    • Lock handling in the Btrfs file system has been optimized, resulting in a performance increase of approximately 7% for direct writes in nowait mode. The performance of operations in NOCOW mode (without copy-on-write) is improved by approximately 3%. The load on the page cache when running the "send" command has been reduced. The minimum size of subpages has been reduced from 64K to 4K (subpages smaller than kernel pages can be used). The code has been switched from the radix tree to the XArray data structure.
    • A mode has been added to the NFS server to keep the state of a lock placed by a client that has stopped responding. The new mode allows you to delay clearing the lock up to a day, unless another client requests a concurrent lock. In normal mode, the lock is cleared 90 seconds after the client stops responding.
    • In the fanotify FS event tracking subsystem, the FAN_MARK_EVICTABLE flag has been implemented, which disables pinning of the target inodes in the cache, for example, to ignore subtrees without pinning their components in the cache.
    • The FAT file system driver has added support for obtaining the file creation time through the statx system call, a more efficient and functional version of stat() that returns extended information about a file.
    • Significant optimizations have been made to the exFAT driver: when 'dirsync' mode is active, a group of sectors is now cleared at once instead of sequentially, sector by sector. Because the optimization reduces the number of block requests, the performance of creating a large number of directories on an SD card increased by 73-85%, depending on the cluster size.
    • The first corrective update of the ntfs3 driver is included in the kernel. Since the inclusion of ntfs3 in the 5.15 kernel in October last year, the driver has not been updated, and communication with the developers has been lost, but now the developers have resumed publishing changes. The proposed patches fix bugs that cause memory leaks and crashes, resolve issues with running xfstests, clean up unused code, and fix typos.
    • For OverlayFS, support for mapping the user IDs of mounted file systems has been implemented, which is used to map the files of a specific user on a mounted foreign partition to another user on the current system.
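
To make the NFS lock-retention rule from this section concrete, here is a minimal sketch of the decision it describes. It is a toy model, not the kernel's implementation: the function name, parameter names, and constants are hypothetical, and only the 90-second and one-day limits come from the text above.

```python
# Toy model of the NFS server's lock-retention rule described above.
# All names are hypothetical; this is not kernel code.

NORMAL_TIMEOUT_S = 90            # normal mode: lock cleared after 90 s of silence
EXTENDED_TIMEOUT_S = 24 * 3600   # new mode: lock may be kept for up to a day


def lock_is_cleared(silent_for_s: int, conflicting_request: bool,
                    extended_mode: bool) -> bool:
    """Return True if the server would drop the silent client's lock."""
    if silent_for_s < NORMAL_TIMEOUT_S:
        return False             # still inside the normal 90-second window
    if not extended_mode:
        return True              # normal mode: the 90-second limit applies
    # Extended mode: keep the lock unless another client needs it
    # or the one-day limit has passed.
    return conflicting_request or silent_for_s >= EXTENDED_TIMEOUT_S
```

For example, under this model a client that has been silent for two minutes keeps its lock in the new mode until either a competing request arrives or a full day has passed.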
  • Memory and system services
    • Added initial support for the LoongArch instruction set architecture used in the Loongson 3 5000 processors, which implements a new RISC ISA similar to MIPS and RISC-V. The LoongArch architecture is available in three flavors: stripped down 32-bit (LA32R), regular 32-bit (LA32S), and 64-bit (LA64).
    • Removed code to support the a.out executable file format, which was deprecated in the 5.1 release. The a.out format has long been retired on Linux systems, and a.out file generation is not supported by modern tools in default Linux configurations. The loader for a.out files can be implemented entirely in user space.
    • Dropped support for the x86-specific boot options nosep, nosmap, nosmep, noexec, and noclflush.
    • Support for the obsolete h8300 CPU architecture (Renesas H8/300), which has long been left unmaintained, has been discontinued.
    • The response to detected split locks has been extended. A split lock occurs when an atomic instruction accesses unaligned data that crosses two CPU cache lines; such locks cause a significant drop in performance. Previously, by default the kernel only issued a warning with information about the process that caused the split lock; now the offending process is additionally slowed down to preserve the performance of the rest of the system.
    • Added support for the IFS (In-Field Scan) mechanism implemented in Intel processors, which runs low-level CPU diagnostic tests that can detect problems missed by the usual means based on error correction codes (ECC) or parity bits. The tests are delivered as loadable firmware, similar to microcode updates. Test results are available through sysfs.
    • Added the ability to embed a bootconfig file into the kernel, which makes it possible to set kernel parameters through a configuration file in addition to command line options. Embedding is done with the build option 'CONFIG_BOOT_CONFIG_EMBED_FILE="/PATH/TO/BOOTCONFIG/FILE"'. Previously, bootconfig was supplied by attaching it to the initrd image. Embedding it in the kernel allows bootconfig to be used in configurations without an initrd.
    • Implemented the ability to load firmware compressed with the Zstandard algorithm. A set of control files, /sys/class/firmware/*, has been added to sysfs for initiating firmware loading from user space.
    • A new IORING_RECVSEND_POLL_FIRST flag has been introduced in the io_uring asynchronous I/O interface. When set, a network operation is first submitted for processing using polling, which can save resources in situations where it is acceptable to process the operation with some delay. io_uring has also gained support for the socket() system call, new flags that simplify the management of file descriptors, a "multi-shot" mode for accepting several connections at once in the accept() call, and operations for forwarding NVMe commands directly to the device.
    • For the Xtensa architecture, support is provided for the KCSAN (Kernel Concurrency Sanitizer) debugging tool, designed to dynamically detect race conditions within the kernel. Also added support for sleep mode and coprocessors.
    • For the m68k architecture (Motorola 68000), a virtual machine (platform simulator) based on the Android Goldfish emulator has been implemented.
    • For the AArch64 architecture, support for the Armv9-A SME (Scalable Matrix Extension) extensions has been implemented.
    • In the eBPF subsystem, typed pointers can now be stored in map structures, and support for dynamic pointers has been added.
    • A new proactive memory reclaim mechanism has been added that supports control from user space via the memory.reclaim file. Writing a number to this file attempts to evict the corresponding number of bytes from the set associated with the cgroup.
    • Improved the accuracy of memory usage accounting when compressing data on the swap partition using the zswap mechanism.
    • For the RISC-V architecture, support for running 32-bit executable files on 64-bit systems has been added, along with a mode for binding restrictive attributes to memory pages (for example, to disable caching) and an implementation of the kexec_file_load() function.
    • Support for 32-bit Armv4T and Armv5 systems has been adapted for use in universal multi-platform kernel builds suitable for different ARM systems.
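
The memory.reclaim interface mentioned in this section is driven by an ordinary file write, which can be sketched as follows. This is a minimal illustration, assuming a cgroup v2 hierarchy; the helper name request_reclaim and the cgroup path in the usage note are hypothetical.

```python
# Sketch: ask the kernel to reclaim memory from one cgroup (cgroup v2).
# The helper name is hypothetical; memory.reclaim is the file named above.
from pathlib import Path


def request_reclaim(cgroup_dir: str, nbytes: int) -> None:
    """Write a byte count to <cgroup_dir>/memory.reclaim."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    # On a real system the kernel may be unable to reclaim the full
    # amount; a failed write surfaces here as OSError.
    (Path(cgroup_dir) / "memory.reclaim").write_text(str(nbytes))
```

On a system running kernel 5.19 or later, a call such as request_reclaim("/sys/fs/cgroup/example.slice", 64 * 1024 * 1024) would request the eviction of 64 MiB from that (hypothetical) cgroup and requires write access to it.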
  • Virtualization and Security
    • The EFI subsystem implements the ability to confidentially transfer secret information to guest systems without disclosing it to the host system. Data is provided through the security/coco directory in securityfs.
    • In Lockdown security mode, which restricts root access to the kernel and blocks UEFI Secure Boot bypass paths, a loophole that allowed the protection to be bypassed by manipulating the kernel debugger has been fixed.
    • Included patches aimed at improving the reliability and performance of the pseudo-random number generator.
    • When building with Clang 15, the kernel structure randomization mechanism is supported.
    • The Landlock mechanism, which allows you to limit the interaction of a group of processes with the external environment, has been provided with support for rules that allow you to control the execution of file renaming operations.
    • The IMA (Integrity Measurement Architecture) subsystem, designed to check the integrity of operating system components using digital signatures and hashes, has been switched to using the fs-verity module for file verification.
    • The behavior when unprivileged access to the eBPF subsystem is disabled has been changed: previously, all commands associated with the bpf() system call were disabled, whereas starting with version 5.19, commands that do not create objects remain accessible. With this behavior, a privileged process is required to load a BPF program, but unprivileged processes can then interact with the program.
    • Added support for the AMD SEV-SNP (Secure Nested Paging) extension, which provides secure work with nested memory page tables and protects against "undeSErVed" and "SEVerity" attacks on AMD EPYC processors, which allow bypassing the AMD SEV (Secure Encrypted Virtualization) protection mechanism.
    • Added support for the Intel TDX (Trust Domain Extensions) mechanism, which allows blocking third-party access attempts to encrypted virtual machine memory.
    • The virtio-blk driver used to emulate block devices has added support for I/O using polling, which, according to tests, has reduced latency by about 10%.
  • Network subsystem
    • A series of BIG TCP patches has been included, which allow the maximum size of a TCP packet to be increased to 4 GB to optimize the operation of high-speed internal data center networks. With a 16-bit size field in the header, this increase is achieved by implementing "jumbo" packets, in which the size in the IP header is set to 0 and the actual size is carried in a separate 32-bit field in a separate attached header. In performance testing, setting the packet size to 185 KB increased throughput by 50% and significantly reduced data transfer latency.
    • Work continued on integrating into the network stack tools for tracking the reasons packets are dropped (reason codes). The reason code is reported when the memory associated with the packet is freed, and makes it possible to distinguish situations such as a packet being discarded due to incorrectly filled header fields, spoofing detected by the rp_filter filter, a bad checksum, lack of memory, triggering of IPSec XFRM rules, a bad TCP sequence number, etc.
    • Added support for fallback of MPTCP (MultiPath TCP) connections to plain TCP, in situations where certain MPTCP features cannot be used. MPTCP is an extension of the TCP protocol for organizing the operation of a TCP connection with the delivery of packets simultaneously along several routes through different network interfaces bound to different IP addresses. Added an API for managing MPTCP streams from user space.
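
The jumbo-packet header described for BIG TCP in this section can be illustrated with a short sketch. It assumes that the "attached header" is the IPv6 Hop-by-Hop extension header carrying the Jumbo Payload option as laid out in RFC 2675; the function names are hypothetical, and the code only packs and unpacks the bytes, it does not send anything.

```python
# Sketch: the 8-byte IPv6 Hop-by-Hop header with a Jumbo Payload option
# (layout per RFC 2675). Assumption: this is the "attached header" above.
import struct

IPPROTO_TCP = 6
JUMBO_OPT_TYPE = 0xC2    # Jumbo Payload option type
JUMBO_OPT_LEN = 4        # option data: one 32-bit length
MAX_16BIT_LEN = 0xFFFF   # largest value of the ordinary 16-bit length field


def build_jumbo_hbh(payload_len: int, next_header: int = IPPROTO_TCP) -> bytes:
    """Pack Next Header, Hdr Ext Len (0), option type, option length, length."""
    if not MAX_16BIT_LEN < payload_len <= 0xFFFFFFFF:
        raise ValueError("jumbo length must exceed 65535 and fit in 32 bits")
    return struct.pack("!BBBBI", next_header, 0, JUMBO_OPT_TYPE,
                       JUMBO_OPT_LEN, payload_len)


def parse_jumbo_len(header: bytes) -> int:
    """Recover the 32-bit length from a header built by build_jumbo_hbh."""
    _, _, opt_type, opt_len, length = struct.unpack("!BBBBI", header)
    if (opt_type, opt_len) != (JUMBO_OPT_TYPE, JUMBO_OPT_LEN):
        raise ValueError("not a Jumbo Payload option")
    return length
```

The 185 KB packet size from the performance test does not fit in a 16-bit field (maximum 65535), which is why the 32-bit field is needed; build_jumbo_hbh(185 * 1024) yields an 8-byte header from which parse_jumbo_len recovers the same length.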
  • Hardware
    • More than 420k lines of code related to the amdgpu driver have been added, of which about 400k lines are automatically generated header files with ASIC register data for the AMD GPU driver, and another 22.5k lines provide the initial implementation of AMD SoC21 support. The total size of the driver for AMD GPUs has exceeded 4 million lines of code. In addition to SoC21, the AMD driver includes support for SMU 13.x (System Management Unit), updated support for USB-C and GPUVM, and preparations for supporting the next generations of the RDNA3 (RX 7000) and CDNA (AMD Instinct) platforms.
    • The i915 (Intel) driver has enhanced power management capabilities. IDs have been added for Intel DG2 (Arc Alchemist) GPUs used in laptops, initial support has been provided for the Intel Raptor Lake-P (RPL-P) platform, information about Arctic Sound-M graphics cards has been added, an ABI for compute engines has been implemented, Tile4 format support has been added for DG2 cards, and DisplayPort HDR support has been implemented for systems based on the Haswell microarchitecture.
    • The Nouveau driver has switched to the drm_gem_plane_helper_prepare_fb handler, and static memory allocation is now used for some structures and variables. As for using the kernel module source code opened by NVIDIA in Nouveau, the work so far is limited to identifying and eliminating bugs. In the future, the published firmware is planned to be used to improve the performance of the driver.
    • Added a driver for the NVMe controller used in Apple computers based on the M1 chip.

At the same time, the Latin American Free Software Foundation published a variant of the completely free kernel 5.19, Linux-libre 5.19-gnu, cleared of firmware and driver elements that contain non-free components or code sections whose scope of use is limited by the manufacturer. The new release cleans up the drivers for pureLiFi X/XL/XC and TI AMx3 Wkup-M3 IPC. The blob cleanup code has been updated for the Silicon Labs WFX, AMD amdgpu, Qualcomm WCNSS Peripheral Image Loader, Realtek Bluetooth, Mellanox Spectrum, Marvell WiFi-Ex, Intel AVS, IFS, and ipu3-imgu drivers and subsystems. Processing of Qualcomm AArch64 devicetree files has been implemented. Support for the new Sound Open Firmware component naming scheme has been added. Cleaning of the ATM Ambassador driver, which was removed from the kernel, has been stopped. Blob cleanup control in HDCP and Mellanox Core has been moved to separate kconfig tags.

Source: opennet.ru
