Linux 5.15 kernel release

After two months of development, Linus Torvalds has released the Linux 5.15 kernel. Notable changes include: a new NTFS driver with write support, the ksmbd module with an SMB server implementation, the DAMON subsystem for memory access monitoring, real-time locking primitives, fs-verity support in Btrfs, the process_mrelease system call for user-space systems that respond to memory shortages, and the dm-ima remote attestation module.

The new version includes 13499 patches from 1888 developers; the patch size is 42 MB (the changes affected 10895 files, 632522 lines of code were added and 299966 lines were deleted). About 45% of all changes introduced in 5.15 relate to device drivers, approximately 14% to code specific to hardware architectures, 14% to the networking stack, 6% to file systems, and 3% to internal kernel subsystems.
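
The figures above can be put in perspective with a little arithmetic. The following Python sketch uses only the numbers quoted in this article; the variable names are illustrative, and the derived values (net growth, averages, the share not attributed to any listed area) are simple calculations rather than official statistics.

```python
# Figures quoted in the article for the 5.15 development cycle.
changes = 13499
developers = 1888
lines_added = 632522
lines_deleted = 299966

# Share of changes by area, in percent, as listed in the article.
shares = {
    "device drivers": 45,
    "hardware architectures": 14,
    "networking stack": 14,
    "file systems": 6,
    "internal kernel subsystems": 3,
}

net_growth = lines_added - lines_deleted
changes_per_developer = changes / developers
lines_touched_per_change = (lines_added + lines_deleted) / changes
unattributed_share = 100 - sum(shares.values())

print(f"net growth of the source tree: {net_growth} lines")               # 332556
print(f"average changes per developer: {changes_per_developer:.2f}")     # 7.15
print(f"average lines touched per change: {lines_touched_per_change:.1f}")  # 69.1
print(f"share not covered by the listed areas: {unattributed_share}%")   # 18
```

Note that the listed areas add up to 82%, so roughly a fifth of the changes fall outside the categories named here.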

Main innovations:

  • Disk Subsystem, I/O and File Systems
    • The kernel has adopted a new implementation of the NTFS file system, developed by Paragon Software. The new driver can work in write mode and supports all the features of the current NTFS version 3.1, including extended file attributes, access control lists (ACLs), data compression, efficient handling of holes in files (sparse files), and replaying changes from the journal to restore integrity after crashes.
    • The Btrfs file system implements support for the fs-verity mechanism, which transparently verifies the integrity and authenticity of individual files using cryptographic hashes or keys associated with the files and stored in the metadata area. Previously, fs-verity was available only for the ext4 and F2FS file systems.

      Btrfs also adds support for mapping user IDs for mounted file systems (previously supported for FAT, ext4, and XFS). This feature lets you map the files of a specific user on a mounted foreign partition to another user on the current system.

      Other changes to Btrfs include: faster insertion of keys into the directory index, which improves file creation performance; the ability to run raid0 with one device and raid10 with two (for example, while the array is being reconfigured); the "rescue=ibadroots" option to ignore an incorrect extent tree; a faster "send" operation; reduced lock contention during rename operations; and the ability to use 4K sectors on systems with a 64K memory page size.

    • In XFS, support for dates after 2038 has been stabilized. A delayed inode deactivation mechanism has been implemented, along with support for deferred setting and removal of file attributes. To eliminate problems, the ability to disable disk quotas on already mounted partitions has been removed (quota enforcement can still be switched off, but the associated accounting continues, so a remount is required to disable quotas completely).
    • In ext4, work has been done to improve the performance of writing delalloc buffers and of processing orphan files, which continue to exist because they remain open but are no longer linked from any directory. The handling of discard operations has been moved out of the jbd2 kernel thread to avoid blocking metadata operations.
    • In F2FS, the "discard_unit=block|segment|section" option has been added to align discard operations (marking freed blocks that no longer need to be physically stored) to a block, segment, or section. Support for tracking I/O latency changes has been added.
    • The EROFS (Enhanced Read-Only File System) file system adds support for direct I/O for files stored without compression, as well as support for fiemap.
    • OverlayFS now correctly handles the immutable, append-only, sync, and noatime file attributes.
    • NFS now copes better with situations where the NFS server has stopped responding. The ability to mount from a server that is already in use but reachable through a different network address has been added.
    • Preparations for rewriting the FSCACHE subsystem have begun.
    • Added support for EFI partitions with non-standard placement of GPT tables.
    • The fanotify mechanism implements a new flag, FAN_REPORT_PIDFD, which causes a pidfd to be included in the returned metadata. A pidfd helps handle PID reuse and identify processes accessing monitored files more accurately (a pidfd is bound to a specific process and does not change, whereas a PID can be assigned to another process after the process currently using it terminates).
    • The move_mount() system call can now add mount points to existing shared groups, which solves CRIU problems with saving and restoring process state when there are several mount namespaces with shared mounts in isolated containers.
    • Added protection against subtle race conditions that could corrupt files when reading from the cache while holes in a file are being processed.
    • Mandatory file locks, which were implemented by blocking system calls that modify a file, are no longer supported. Due to possible race conditions, these locks were considered unreliable and were deprecated many years ago.
    • The LightNVM subsystem, which allowed direct access to an SSD bypassing the emulation layer, has been removed. LightNVM lost its purpose after the arrival of NVMe standards that provide for zoning (ZNS, Zoned Namespaces).
  • Memory and system services
    • The DAMON (Data Access MONitor) subsystem has been added. It monitors accesses to data in RAM by a selected process running in user space. The subsystem lets you analyze which memory areas the process accessed during its entire run and which areas remained untouched. DAMON's features include low CPU load, low memory consumption, high accuracy, and a predictable, fixed overhead that does not depend on the size of the monitored memory. The subsystem can be used both by the kernel to optimize memory management and by user-space utilities to understand what a process is doing and optimize memory usage, for example by returning unneeded memory to the system.
    • The process_mrelease system call has been implemented to speed up releasing the memory of a process that is terminating. Under normal circumstances, resource release and process termination are not instantaneous and can be delayed for various reasons, which hampers user-space early out-of-memory response systems such as oomd (provided by systemd) and lmkd (used by Android). By calling process_mrelease, such systems can trigger memory reclaim from forcibly terminated processes more predictably.
    • Variants of the mutex, ww_mutex, rw_semaphore, spinlock and rwlock locking primitives based on the RT-Mutex subsystem have been merged from the PREEMPT_RT kernel branch, which develops support for real-time operation. Changes have been added to the SLUB slab allocator to improve performance in PREEMPT_RT mode and reduce the impact on interrupts.
    • Support for the SCHED_IDLE task scheduler attribute has been added to cgroups, which makes it possible to apply this attribute to all processes in a given cgroup at once. That is, these processes will run only when no other tasks in the system are waiting to be executed. Unlike setting SCHED_IDLE on each process individually, when SCHED_IDLE is bound to a cgroup the relative weight of tasks within the group is taken into account when choosing a task to run.
    • The memory consumption accounting mechanism in cgroup has been extended with the ability to track additional kernel data structures, including those created for polling, signal processing, and namespaces.
    • Added support for asymmetric scheduling of task binding to processor cores on architectures in which some CPUs allow 32-bit tasks, and some only work in 64-bit mode (for example, ARM). The new mode allows only CPUs that support 32-bit tasks to be considered when scheduling 32-bit tasks.
    • The io_uring asynchronous I/O interface now supports opening files directly into the fixed-file table without using a file descriptor, which can significantly speed up some types of operations but departs from the traditional Unix practice of using file descriptors to open files.

      For the BIO (Block I/O Layer) subsystem, io_uring implements a new "BIO recycling" mechanism that reduces the overhead of managing internal memory and increases the number of I/O operations processed per second by about 10%. io_uring also adds support for the mkdirat(), symlinkat(), and linkat() system calls.

    • BPF programs can now request and handle timer events. An iterator for UNIX sockets has been added, along with the ability to get and set socket options from setsockopt handlers. Support for typed data has been added to the BTF dumper.
    • On NUMA systems with memory types that differ in performance, when free space is exhausted, evicted memory pages are now migrated from dynamic memory (DRAM) to slower persistent memory (Persistent Memory) instead of being discarded. Testing has shown that this tactic generally improves performance on such systems. NUMA also gains the ability to allocate memory pages for a process from a selected set of NUMA nodes.
    • For the ARC architecture, support for three- and four-level memory page tables has been implemented, which will later make it possible to implement support for 64-bit ARC processors.
    • For the s390 architecture, the ability to use the KFENCE mechanism to detect errors when working with memory has been implemented, and support for the KCSAN race condition detector has been added.
    • Added support for indexing the list of messages output via printk(), which allows you to extract all such messages at once and track changes in user space.
    • In mmap(), the VM_DENYWRITE option has been deprecated and the MAP_DENYWRITE mode removed from kernel code, which reduces the number of situations in which file writes are blocked with an ETXTBSY error.
    • A new probe type, "Event probes", has been added to the tracing subsystem; such probes can be attached to existing trace events and define their own output format.
    • When building the kernel with the Clang compiler, the integrated assembler from the LLVM project is now used by default.
    • As part of the effort to rid the kernel of code that triggers compiler warnings, an experiment was carried out with enabling the "-Werror" mode by default, in which compiler warnings are treated as errors. In preparation for the 5.15 release, Linus began accepting only changes that did not produce warnings when building the kernel and enabled building with "-Werror", but then agreed that this decision was premature and postponed enabling "-Werror" by default. The "-Werror" flag is controlled by the WERROR parameter, which defaults to the value of COMPILE_TEST, i.e. for now it is enabled only in test builds.
  • Virtualization and Security
    • A new dm-ima handler has been added to Device Mapper (DM). It implements a remote attestation mechanism based on the IMA (Integrity Measurement Architecture) subsystem, which lets an external service verify the state of kernel subsystems to confirm their authenticity. In practice, dm-ima makes it possible to build Device Mapper based storage connected to external cloud systems, in which IMA is used to check the validity of the configuration of the DM targets being started.
    • prctl() has a new option, PR_SPEC_L1D_FLUSH, which makes the kernel flush the L1D cache on every context switch. This mode lets you selectively add protection for the most important processes against side-channel attacks that try to determine data left in the cache as a result of vulnerabilities caused by speculative execution of instructions in the CPU. The cost of enabling PR_SPEC_L1D_FLUSH (it is not enabled by default) is a significant performance penalty.
    • The kernel can now be built with the GCC flag "-fzero-call-used-regs=used-gpr", which zeroes the call-used general-purpose registers that a function has used before control returns from it. This option protects against information leaking out of functions and reduces the number of blocks suitable for building ROP (Return-Oriented Programming) gadgets in exploits by 20%.
    • The kernel can now be built for the ARM64 architecture as a guest for the Hyper-V hypervisor.
    • A new driver framework, VDUSE, has been introduced; it allows virtual block devices to be implemented in user space and uses Virtio as the transport for access from guest systems.
    • Added Virtio driver for the I2C bus, which makes it possible to emulate I2C controllers in paravirtualization mode using separate backends.
    • Added Virtio driver gpio-virtio to allow guest systems to access GPIO lines provided by the host system.
    • Added the ability to restrict access to memory pages for device drivers with DMA support on systems without an IOMMU (I/O memory-management unit).
    • The KVM hypervisor can now present statistics as linear and logarithmic histograms.
  • Network subsystem
    • The ksmbd module, which implements a file server using the SMB3 protocol, has been added to the kernel. The module complements the SMB client implementation previously available in the kernel and, unlike an SMB server running in user space, is more efficient in terms of performance, memory consumption, and integration with advanced kernel features. Ksmbd is positioned as a high-performance, embedded-ready extension to Samba that integrates with Samba tools and libraries as needed. Among ksmbd's capabilities, improved support for distributed file caching (SMB leases) on local systems stands out, as it can significantly reduce traffic. Support for RDMA ("smbdirect") and for protocol extensions that strengthen encryption and digital signature verification is planned for the future.
    • The CIFS client has dropped support for NTLM and for the weaker DES-based authentication algorithms used in the SMB1 protocol.
    • Multicast support for VLANs has been implemented in the network bridge code.
    • Support for the XDP (eXpress Data Path) subsystem has been added to the bonding driver used for aggregating network interfaces, which allows you to manipulate network packets at the stage before they are processed by the network stack of the Linux kernel.
    • The mac80211 wireless stack supports 6 GHz STA (station) operation in LPI, SP and VLP modes, as well as the ability to set an individual TWT (Target Wake Time) in access point mode.
    • Added support for the MCTP protocol (Management Component Transport Protocol), used for communication between management controllers and their associated devices (host processors, peripherals, etc.).
    • Integration of MPTCP (MultiPath TCP) into the kernel continues. MPTCP is an extension of the TCP protocol that lets a single TCP connection deliver packets simultaneously over several routes through different network interfaces bound to different IP addresses. The new release adds support for the fullmesh mode for addresses.
    • Handlers for network streams encapsulated in the SRv6 (Segment Routing IPv6) protocol have been added to netfilter.
    • Added sockmap support for Unix stream sockets.
  • Hardware
    • The amdgpu driver supports Cyan Skillfish APUs (equipped with a Navi 1x GPU). Video codec support has been implemented for the Yellow Carp APU. Support for the Aldebaran GPU has been improved. New IDs have been added for cards based on the Navi 24 "Beige Goby" GPU and RDNA2. An improved implementation of virtual displays (VKMS) has been proposed. Support for temperature monitoring of AMD Zen 3 chips has been added.
    • The amdkfd driver (for discrete GPUs such as Polaris) implements a shared virtual memory (SVM) manager based on the HMM (Heterogeneous Memory Management) subsystem, which supports devices that have their own memory management units (MMUs) and can access main memory. In particular, HMM makes it possible to set up a shared address space between the GPU and the CPU in which the GPU can access the main memory of the process.
    • The i915 driver for Intel graphics extends the use of the TTM video memory manager and adds power management based on the GuC (Graphics micro Controller). Preparations have begun for supporting the Intel Arc Alchemist graphics card and the Intel Xe-HP GPU.
    • The nouveau driver implements the backlight control of eDP panels using DPCD (DisplayPort Configuration Data).
    • Added support for Adreno 7c Gen 3 and Adreno 680 GPUs to the msm driver.
    • An IOMMU driver has been implemented for the Apple M1 chip.
    • Added sound driver for systems based on AMD Van Gogh APUs.
    • The Realtek R8188EU driver has been added to the staging branch, replacing the old driver (rtl8188eu) for Realtek RTL8188EU 802.11b/g/n wireless chips.
    • The ptp_ocp driver has been merged for a PCIe card developed by Meta (Facebook) that carries a miniature atomic clock and a GNSS receiver and can be used to run standalone servers for precise time synchronization.
    • Added support for the Sony Xperia 10 II (Snapdragon 665), Xiaomi Redmi 2 (Snapdragon MSM8916), Samsung Galaxy S3 (Snapdragon MSM8226), and Samsung Gavini/Codina/Kyle smartphones.
    • Added support for the following ARM SoCs and boards: NVIDIA Jetson TX2 NX Developer Kit, Sancloud BBE Lite, PicoITX, DRC02, SolidRun SolidSense, SKOV i.MX6, Nitrogen8, Traverse Ten64, GW7902, Microchip SAMA7, Qualcomm Snapdragon SDM636/SM8150, Renesas R-Car H3e-2G/M3e-2G, Marvell CN913x, ASpeed AST2600 (Facebook Cloudripper, Elbert and Fuji server boards), 4KOpen STiH418-b2264.
    • Added support for the following LCD panels: Gopher 2b, EDT ETM0350G0DH6/ETMV570G2DHU, LOGIC Technologies LTTD800480070-L6WH-RT, Multi-Innotechnology MI1010AIT-1CP1, Innolux EJ030NA 3.0, Ilitek ILI9341, E Ink VB3300-KCA, Samsung ATNA33XC20, Samsung DB7430, WideChips WS2401.
    • Added the LiteETH driver with support for Ethernet controllers used in LiteX soft SoCs (for FPGAs).
    • A lowlatency option has been added to the usb-audio driver to control whether low-latency operation is enabled. The quirk_flags option has also been added for passing device-specific settings.

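The PR_SPEC_L1D_FLUSH option mentioned under "Virtualization and Security" is driven through the existing speculation-control prctl() interface. The Python sketch below queries the current state for the calling process and shows how the flush could be requested. It assumes Linux; the constant values are taken from the kernel's prctl header, while the function names and the textual labels are illustrative. To the best of our knowledge, the feature also has to be enabled at boot (the l1d_flush=on parameter on x86) before the request can succeed, so the enabling call may well fail with an error on a typical system.

```python
import ctypes
import os

# Constants from <linux/prctl.h>.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_L1D_FLUSH = 2
PR_SPEC_PRCTL = 1 << 0
PR_SPEC_ENABLE = 1 << 1
PR_SPEC_DISABLE = 1 << 2
PR_SPEC_FORCE_DISABLE = 1 << 3

_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.restype = ctypes.c_int
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def decode_spec_state(value):
    """Translate the bitmask returned by PR_GET_SPECULATION_CTRL into labels."""
    names = [
        (PR_SPEC_PRCTL, "prctl-controllable"),
        (PR_SPEC_ENABLE, "flush-enabled"),
        (PR_SPEC_DISABLE, "flush-disabled"),
        (PR_SPEC_FORCE_DISABLE, "force-disabled"),
    ]
    return [name for bit, name in names if value & bit]


def l1d_flush_state():
    """Return the L1D flush state of the calling process as a list of labels."""
    ret = _libc.prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0)
    if ret < 0:
        return ["unsupported"]  # e.g. kernel older than 5.15
    return decode_spec_state(ret)


def enable_l1d_flush():
    """Request an L1D flush on context switch for the calling process."""
    ret = _libc.prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH,
                      PR_SPEC_ENABLE, 0, 0)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


print(l1d_flush_state())
```

Querying the state first is the cautious choice here: it is read-only, whereas enabling the flush changes the behavior of the process and carries the performance cost described above.
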
At the same time, the Latin American Free Software Foundation has released a completely free variant of the 5.15 kernel, Linux-libre 5.15-gnu, stripped of firmware and driver elements containing non-free components or code sections whose scope of use is restricted by the manufacturer. The new release logs a message when cleaning is complete. Packaging problems with mkspec have been fixed and support for snap packages has been improved. Some warnings raised when processing the firmware.h header file have been removed. Some kinds of warnings ("format-extra-args", comments, unused functions and variables) are now allowed when building in "-Werror" mode. Cleaning of the gehc-achc driver has been added. The blob cleaning code has been updated in the adreno, btusb, btintel, brcmfmac, and aarch64 qcom drivers and subsystems. Cleaning has been discontinued for the prism54 driver (removed) and the rtl8188eu driver (replaced by r8188eu).

Source: opennet.ru
