Linux 6.16 kernel release

After two months of development, Linus Torvalds released the Linux 6.16 kernel. Among the most notable changes: the ovpn driver for accelerating OpenVPN, the Kexec HandOver mechanism, five-level memory page tables enabled by default for x86, removal of the DCCP protocol, the zloop block driver, the ability to send core dumps via a UNIX socket, support for atomic writes in XFS, offloading of sound processing for USB devices, optimizations in Ext4, a virtual TPM (Trusted Platform Module) driver, a full implementation of Device Memory TCP, support for unnamed pipes in io_uring, preparation for the integration of the Asahi DRM driver, the "usermode queue" mechanism in the AMDGPU driver, and support for Intel TDX (Trusted Domain Extensions) and Intel APX (Advanced Performance Extensions).

The new version includes 15924 fixes from 2145 developers, the patch size is 50 MB (the changes affected 13793 files, 655451 lines of code were added, 316441 lines were removed). The previous release had 15945 fixes from 2154 developers, the patch size was 59 MB. About 45% of all changes presented in 6.16 are related to device drivers, about 16% of changes are related to updating code specific to hardware architectures, 13% are related to the network stack, 4% are related to file systems and 3% are related to internal kernel subsystems.

Key innovations in kernel 6.16:

  • Disk Subsystem, I/O and File Systems
    • Added the zloop driver for creating zoned block devices that work in loopback mode. The driver emulates zoned block devices on top of ordinary files in an existing file system (one file stores each zone). This feature can be useful for testing how file systems, device mappers, and applications support zoned devices, in which groups of blocks or sectors are divided into zones that accept only sequential appends, so changing existing data requires rewriting the entire group of blocks.
    • The XFS file system implements support for atomic writing of large portions of data - several blocks can now be written in atomic mode (either all blocks will be written successfully, or none of the blocks will be written).
    • The Ext4 file system has improved the performance of the "fast commit" mechanism. Support for large folios of memory pages for regular files has been added, which in tests increased the performance by 37% for intensive sequential I/O. Support for atomic write operations spanning multiple blocks has been added.
    • Support for the DAX mechanism, which provides direct access to the file system bypassing the page cache, has been declared obsolete in the ext2 file system driver and is scheduled to be removed from it by the end of the year. The reason cited is that the ext2 driver is regarded as a stable reference implementation, which is not the place for specialized features that have not seen wide adoption.
    • OrangeFS, UFS, BFS and OMFS file systems have been migrated to use the new partition mounting API.
    • The vfs_cache_pressure_denom setting has been added to sysctl to control the number of entries in the dentry cache (an internal representation of directory entries) when the system is running low on memory. The higher the value, the more entries can be evicted from the cache (fewer entries will remain in the cache) when the system is running low on memory.
    • The Bcachefs FS has a "rebalance_on_ac_only" option that prohibits rebalancing and background compression when the system is powered by battery. Snapshot and device deletion operations have been sped up. Memory consumption has been reduced when mounting in read-only mode. The ability to run some crash recovery operations in the background without stopping work with the FS has been added.
    • The power management subsystem is allowed to independently freeze file systems and EFI variables for standby and hibernate modes (if the file systems are already frozen by the handler in user space, they are not re-frozen).
    • Added the ability to speed up EROFS using the QAT (QuickAssist Technology) accelerator built into Intel processors, which offers tools to speed up calculations related to compression and encryption.
    • In NFS, the maximum data chunk size for read and write operations has been increased from 1 to 4 MB (the default value is 1 MB, since not all clients support a larger size).
    • Unprivileged users who have CAP_SYS_ADMIN rights in a separate user namespace but do not have extended rights in the root namespace are now able to use the fanotify mechanism to monitor file systems for changes.
    • For file systems using the FUSE subsystem, functionality is provided to clear all cached directory entries (dentries) at once. Support for large memory page folios has been added to the FUSE subsystem.
    • OverlayFS supports the creation of data layers in unprivileged namespaces that use integrity control based on the dm-verity module. This feature allows you to combine trustworthy metadata layers with untrusted data layers processed in unprivileged namespaces.
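
The sequential-write rule of zoned devices, mentioned in the zloop item above, can be illustrated with a small model. This is only a sketch of the general zone semantics, not the zloop driver: the Zone class, its capacity parameter, and the in-memory bytearray standing in for the per-zone backing file are all hypothetical names chosen for the illustration.

```python
class Zone:
    """Toy model of one sequential-write zone (hypothetical, not zloop code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity      # zone size in bytes
        self.data = bytearray()       # stands in for the file backing this zone

    @property
    def write_pointer(self) -> int:
        # The only offset at which the zone accepts the next write.
        return len(self.data)

    def write(self, offset: int, payload: bytes) -> None:
        if offset != self.write_pointer:
            raise ValueError("zone accepts writes only at the write pointer")
        if offset + len(payload) > self.capacity:
            raise ValueError("write exceeds zone capacity")
        self.data += payload

    def reset(self) -> None:
        # Changing earlier data means discarding and rewriting the whole zone.
        self.data.clear()


zone = Zone(capacity=16)
zone.write(0, b"abcd")   # sequential append: accepted
zone.write(4, b"efgh")   # continues at the write pointer: accepted

try:
    zone.write(0, b"XX")  # in-place overwrite: rejected
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True

zone.reset()              # rewind the zone, then write from the start again
zone.write(0, b"XX")
print(overwrite_rejected, zone.write_pointer)
```

File systems and applications written for zoned storage have to follow this append-or-reset pattern, which is the behaviour an emulated device lets them exercise without real zoned hardware.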
  • Memory and system services
    • The KHO (Kexec HandOver) mechanism has been added to launch a new kernel from the old one without losing the system state. Before transferring control to the new version of the kernel, the state of the key kernel subsystems can be serialized into a memory area using KHO, which will not be affected by further operations. The new kernel, having received control, restores the serialized state back. The Live Update Orchestrator (LUO) subsystem is being developed on the basis of KHO, allowing the kernel to be rebooted without stopping the operation of devices.
    • Added CONFIG_X86_NATIVE kernel build parameter, which allows using the "-march=native" option during compilation to optimize for the capabilities of the processor on the current system.
    • Added support for the Intel Advanced Performance Extension (APX) instruction set architecture extension, which provides 16 additional general-purpose registers (in addition to the 16 currently available), allowing code to use fewer memory reads and writes to improve performance and reduce power consumption.
    • Added automatic memory allocation policy tuning mode in NUMA systems, in which all node weights are recalculated when new bandwidth information appears during boot or when hot-plugging memory.
    • The futex implementation now supports a local process hash table (local futex_hash_bucket), which, unlike the previously supported shared hash table for all processes, is local to a single process and is shared by all threads of that process. Local hash tables are used only for the PROCESS_PRIVATE futex operation. In addition, the new release adds support for the FUTEX2_NUMA and FUTEX2_MPOL options, which allow influencing the placement of futexes in memory, to place them closer to the processes that use them.
    • For x86_64 systems, permanent support for five-level memory page tables is enabled (the CONFIG_X86_5LEVEL parameter, which controlled the inclusion of five-level tables, has been removed).
    • The intel_pstate driver, which controls power consumption parameters (P-state) on systems with Intel processors, has been updated to support the operation of the Energy Aware Scheduling (EAS) task scheduler on hybrid processors that combine high-performance and energy-efficient CPU cores, such as Intel Lunar Lake.
    • Added interfaces to sysfs: "/sys/devices/system/cpu/cpuN/cpu_capacity" to get information about the capacity of the different CPUs in hybrid processors, and "/sys/devices/system/cpu/cpuidle/intel_c1_demotion" to control whether the CPU may be left in a shallower idle state even when the kernel tries to put it into a deeper power-saving state (for example, the kernel might request a transition to the C6 state, but the firmware might leave the CPU in the C1 state if the CPU is being woken up frequently).
    • For the ARM64 architecture, support for the lazy preemption mode (PREEMPT_LAZY) is enabled, which corresponds to the full preemption mode for realtime tasks (RR/FIFO/DEADLINE), but delays the preemption of normal tasks (SCHED_NORMAL) until the tick boundary.
    • For the ARM64 architecture, support has been added for using SME (Scalable Matrix Extension) extensions, enabled via the CONFIG_ARM64_SME parameter.
    • Continued migrating changes from the Rust-for-Linux branch, related to using Rust as a second language for developing drivers and kernel modules (Rust support is not active by default and does not result in Rust being included among the required kernel build dependencies). The ability to use configfs has been introduced for modules written in Rust. Abstractions necessary for developing graphics drivers have been added. The capabilities of the alloc, time, str, list, workqueue, and page modules have been expanded. Support for the "assert!" macro has been added in KUnit-based tests. A set of abstractions has been added for managing CPU frequency and using APIs related to power management. Support for the 'xarray' data structure has been added.
    • The vDSO-optimized implementation of the getrandom() system call has been ported to the RISC-V architecture. The vDSO (virtual dynamic shared object) mechanism makes it possible to move the system call handler from the kernel to user space and avoid context switches. In the tests conducted, the optimization sped up the generation of random numbers by a factor of 17. Support has also been implemented for the RISC-V Zicbop, Zabha, and Svinval extensions, as well as for the vector extensions used in SiFive processors.
    • For the LoongArch architecture, the limit on the number of CPUs in the system has been raised from 256 to 2048. Support for the SCHED_MC (Multi-core) task scheduler has been added.
    • Added the ability to prohibit the passing of file descriptors over Unix sockets. To disable receiving descriptors, applications can use the SO_PASSRIGHTS flag in setsockopt().
    • Provided the ability to map the ring buffer used for tracing kernel activity to user space memory.
    • Crash dump handlers used to generate a problem report after a kernel crash can now use the LUKS keys used by the crashed kernel to save crash dumps to encrypted FS.
    • The io_uring asynchronous I/O system has added an IORING_OP_PIPE operation for creating unnamed pipes, which is similar to the pipe2 system call except that it supports fixed file descriptors.
    • Added kernel command line option "rt_group_sched" to control whether the realtime task group scheduler (SCHED_RR) is enabled. This option is similar to the RT_GROUP_SCHED setting in Kconfig.
    • For devices based on the CXL (Compute Express Link) bus, used to organize high-speed interaction between the CPU and memory devices, support for RAS (Reliability, Availability, Serviceability) extensions has been implemented, allowing the implementation of various error detection and correction schemes. CXL allows connecting new memory areas provided by external memory devices and using them as additional resources of the physical address space to expand the system RAM (DDR) or permanent memory (PMEM).
    • The minimum GCC version required to build the kernel for all architectures has been raised to the GCC 8 branch. Building now also requires at least binutils 2.30.
    • The long-deprecated uselib() system call has been removed; mmap() is used instead to map shared libraries into programs.
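
The effect of the five-level page tables mentioned above can be worked out with simple arithmetic. The sketch below assumes the usual x86_64 layout with 4 KiB pages (a 12-bit page offset) and 9 address bits resolved per table level; the function and constant names are illustrative only.

```python
PAGE_OFFSET_BITS = 12   # 4 KiB pages
BITS_PER_LEVEL = 9      # 512 eight-byte entries fit in one 4 KiB table

TIB = 1 << 40
PIB = 1 << 50


def virtual_address_bits(levels: int) -> int:
    """Number of virtual address bits translated by a table of this depth."""
    return PAGE_OFFSET_BITS + BITS_PER_LEVEL * levels


def address_space_bytes(levels: int) -> int:
    """Size of the virtual address space in bytes."""
    return 1 << virtual_address_bits(levels)


four_level = address_space_bytes(4)
five_level = address_space_bytes(5)

print(f"4 levels: {virtual_address_bits(4)} bits, {four_level // TIB} TiB")
print(f"5 levels: {virtual_address_bits(5)} bits, {five_level // PIB} PiB")
```

Under these assumptions the fifth level extends virtual addresses from 48 to 57 bits, that is, from 256 TiB to 128 PiB. Building the support in unconditionally does not by itself mean that every machine uses it, since the processor also has to support the wider addresses.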
  • Virtualization and Security
    • Initial support has been added for the Intel TDX (Trusted Domain Extensions) mechanism to protect guest systems running under the KVM hypervisor from tampering and analysis by the host system administrator and from physical attacks on the hardware. This protection is achieved by encrypting the memory of virtual machines.
    • Added a virtual TPM (Trusted Platform Module) driver that allows virtual machines to interact with TPM devices emulated by the SVSM (Secure VM Service Module).
    • The ability to use the randstruct GCC plugin, which randomizes the layout of data structures at compile time to make exploitation more difficult, has been restored.
    • Added the ability to use IMA (Integrity Measurement Architecture) technology to check the integrity when launching new kernels using the kexec system call.
    • Work has been carried out to reduce the performance impact of using SELinux. To speed up operation, a cache of directory access check results has been added. The ability to use wildcard patterns has been added to genfscon rules.
    • The code for interaction with EFI provides the ability to embed a SBAT (UEFI Secure Boot Advanced Targeting) section with metadata about revoked versions of boot components.
    • In loadable modules, the ".static_call_sites" section has been switched to read-only mode after initialization is complete.
    • For 64-bit ARM systems, the KVM hypervisor now supports nested virtualization (disabled by default).
    • Support for the RISC-V architecture in the KVM hypervisor has been declared stable.
  • Network subsystem
    • The ovpn driver has been added to the kernel. It can significantly speed up OpenVPN by moving encryption operations, packet processing, and communication channel management into the Linux kernel. The driver eliminates the overhead associated with context switches, enables optimization through direct access to internal kernel APIs, and eliminates slow data transfers between the kernel and user space (encryption, decryption, and routing are performed by the module without sending traffic to a user-space handler).
    • The Device Memory TCP mechanism now supports sending data from device memory (TX path). Previously, to simplify the integration of Device Memory TCP into the kernel, the functionality was limited to only receiving data (RX path). Device Memory TCP allows using network sockets to directly send the contents of a peripheral device's memory over the network (zero-copy mode), as well as directly placing the contents of network packets in the memory area of the device on the recipient's side. The data transmitted in the packets is transferred from the network card to the memory of the peripheral device (DMABUF), for example, to the video memory of the GPU, or from the device's memory to the network card directly, bypassing the CPU, and the packet headers end up in the regular kernel buffers.
    • The ability to send core dump contents via an AF_UNIX socket has been introduced, allowing more secure core dump handlers to be created in user space that do not rely on the kernel invoking privileged processes.
    • Removed support for the DCCP (Datagram Congestion Control Protocol) network protocol, which has not gained popularity and has remained in the kernel for five years without maintenance. Removing DCCP from the kernel will remove barriers preventing the rework of the inet_connection_sock data structure to improve the efficiency of the TCP stack. Support for netfilter modules for filtering DCCP packets is retained.
    • To simplify error handling when using SO_PEERPIDFD sockets, the kernel can now pass pidfd for already terminated processes (pidfd is associated with specific processes and, unlike pid, is not reassigned).
    • Using BPF, you can now create network stack packet queue control handlers (qdiscs) to influence the order in which network packets are processed.
    • The AFS network file system uses the Generic Security Services API (GSSAPI) to manage encryption of connections to YFS and OpenAFS servers.
    • A large batch of optimizations has been introduced. The organization of locks for IPv6 routing tables has been reworked (some operations with routes are now performed up to 3 times faster). Software calculation of crc32c checksums has been accelerated. The GRO engine for tunneled UDP traffic has been accelerated by 10%. Automatic tuning of the TCP receive buffer has been improved and the default limits have been increased (in tests, the throughput of a single stream over a 200 Gbit/s channel increased by 60%).
    • Netfilter now supports masks in network device names used in netdev and flowtable. Connection tracking information (conntrack) has been integrated into the nft trace infrastructure. Retrieval of connection tracking tables (conntrack) via procfs has been sped up.
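
The user-space side of the AF_UNIX core dump delivery mentioned above can be sketched as a small receiver. This is a minimal illustration under stated assumptions, not a complete handler: the socket path is hypothetical, the way the kernel is pointed at the socket (presumably through the core_pattern setting) should be checked against the kernel documentation, and the "sender" here is an ordinary thread that stands in for the kernel so that the example runs on any system.

```python
import os
import socket
import tempfile
import threading


def receive_one_dump(server: socket.socket, chunk_size: int = 65536) -> bytes:
    """Accept one connection and read until the sender closes it."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:          # empty read: sender closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Hypothetical socket location; a real handler would use a fixed, protected path.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "coredump.socket")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)


def fake_sender() -> None:
    # Stand-in for the kernel: sends bytes that merely look like an ELF header.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(b"\x7fELF" + b"\x00" * 60)


sender = threading.Thread(target=fake_sender)
sender.start()
dump = receive_one_dump(server)
sender.join()

server.close()
os.unlink(sock_path)
os.rmdir(sock_dir)

starts_with_elf_magic = dump[:4] == b"\x7fELF"
print(len(dump), starts_with_elf_magic)
```

The point of the design is that the handler is an already running, possibly unprivileged service that the kernel connects to, rather than a helper that the kernel has to start with elevated rights for every crash. A real handler would stream the data to storage instead of collecting it in memory.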
  • Hardware
    • Added support for offloading audio stream processing for USB audio devices (USB audio offload). This change significantly reduces power consumption on portable devices, since the audio stream can continue to be processed while the rest of the system is in sleep mode. Previously, a platform-specific implementation of audio offload for USB devices was present in kernels for the Android platform; the mainline kernel now has a universal implementation that can be used by any project.
    • Continued integration of Nova driver components for NVIDIA GPUs equipped with GSP firmware used starting with the NVIDIA GeForce RTX 2000 series based on the Turing microarchitecture. The driver is written in Rust. In addition to the nova-core component added in the previous release, which implements a basic abstraction layer over the GSP firmware APIs, version 6.16 includes an initial implementation of the nova-drm (Direct Rendering Manager) DRM driver for interacting with the GPU from user space.
    • The process of promoting the Asahi DRM driver for Apple AGX GPUs used in Apple Silicon chips to the kernel has begun. The driver is written in Rust. At this stage, only the header files with UAPI from the Asahi driver, necessary for Mesa, are included in the kernel, and the main code of the Asahi driver will be integrated later.
    • The Nouveau driver adds support for NVIDIA Hopper and Blackwell GPUs.
    • Work has continued on the Xe DRM (Direct Rendering Manager) driver for GPUs based on the Intel Xe architecture, which is used in Intel Arc family graphics cards and in integrated graphics starting with Tiger Lake processors. The ability to use different firmware files for different Intel GPU families has been added.
    • The AMDGPU driver implements support for a "usermode queue" mechanism that allows you to create your own queues of work in user space and send them directly to the GPU without going through the kernel scheduler. Support for "usermode queue" is enabled for Navi 4X and GFX 12 GPUs.
    • Added support for Intel WCL (Wildcat Lake), AMD ACP 7.x (Audio Co-Processor), Cirrus Logic CS35L63 and CS48L32, Everest Semiconductor ES8375 and ES8389, Pioneer DJM-V10, Loongson-1 AC'97, NVIDIA Tegra264, Richtek ALC203, RT9123 and Rockchip SAI sound systems, as well as new Intel AVS platforms.
    • Added support for ARM boards, SoCs and devices: Samsung Exynos7870, Qualcomm Snapdragon X1P42100, Qualcomm MSM8926, RK3562, NXP i.MX94, Renesas RZ/V2N, Amlogic S6/S7/S7D, WonderMedia WM8950, Amlogic S805Y, Allwinner A523, Toradex Verdin AM62P, ROCK 5B+, Nitrogen8M Plus, Retronix R-Car V4H Sparrow Hawk, MT8186 Ponyta Chromebook, VIA APC Rock/Paper, Renesas RZ/T2H, ASUS Transformer Pad LTE TF300TL, LG Nexus 4, Google Pixel 4a, Raspberry Pi 2.
    • Removed driver for video capture cards on STA2X11 chips.

At the same time, the Latin American Free Software Foundation created a version of the completely free kernel 6.16 - Linux-libre 6.16-gnu, cleaned of firmware and driver elements containing non-free components or code sections with restricted scope. Release 6.16 neutralizes blob loading in the new Intel qat 6xxx crypto, ST vd55g1 sensor, ath12k AHB wifi, Aeonsemi AS21xxx, and MediaTek 25Gb Ethernet drivers. Blob names in devicetree (dts) files for Qualcomm and MediaTek ARM chips have been cleaned. Blob cleaning code has been updated in the Nova Core, Nouveau, Realtek r8169 Ethernet, Qualcomm Iris, Venus, Mediatek mt7996 wifi, Qualcomm ath11k and ath12k wifi, Texas Instruments tas2781, and Renesas R-Car gen4 PCIe drivers.

Source: opennet.ru
