After two months of development, Linus Torvalds released Linux kernel 6.12. Among the most notable changes: the ability to enable real-time mode, the sched_ext mechanism for creating CPU schedulers via eBPF, QR code output in emergency conditions, the Device Memory TCP mechanism, the SCHED_DEADLINE server resource reservation mechanism, an improved EEVDF task scheduler, and the IPE module for setting integrity policies.
The new version includes 14607 fixes from 2167 developers, the patch size is 37 MB (the changes affected 13087 files, 507913 lines of code were added, 234083 lines were deleted). The previous release had 15130 fixes from 2078 developers, the patch size was 85 MB (in kernel 6.10 the patch was 41 MB). About 45% of all changes presented in 6.12 are related to device drivers, about 12% of changes are related to updating code specific to hardware architectures, 13% are related to the network stack, 6% are related to file systems and 3% are related to internal kernel subsystems.
Key innovations in kernel 6.12:
- Memory and system services
- The kernel can now be built with the PREEMPT_RT option for real-time operation without additional patches. The last missing piece that prevented enabling PREEMPT_RT was support for non-blocking atomic output via the printk function, which has now also been accepted into the kernel. PREEMPT_RT support is available for the x86, x86_64, ARM64 and RISC-V architectures. Until now, PREEMPT_RT was supplied as a set of external patches, on which some distributions, such as RHEL, SUSE and Ubuntu, based separate Realtime editions of their products. These editions are in demand in areas such as financial systems, audio and video processing devices, aviation, medicine, robotics, telecommunications and industrial systems, where predictable event processing time is required.
- The "sched_ext" (SCX) mechanism has been added, allowing eBPF to be used to create CPU schedulers that cover virtually all aspects of task scheduling and CPU resource allocation. Such schedulers can be dynamically loaded and executed within the kernel's eBPF virtual machine. The sched_ext mechanism simplifies the creation of task-specific schedulers, enables experimentation with various scheduling techniques and strategies, and allows for the rapid creation of working prototypes and the on-the-fly replacement of schedulers in production infrastructures. For example, using sched_ext, you can create a scheduler that takes into account the specifics of a particular application and dynamically changes its scheduling strategy depending on the system state and other factors.
- The rest of the patches required for the SCHED_DEADLINE server mechanism to work are included. This mechanism solves the problem of regular tasks not receiving enough CPU resources when high-priority (realtime) tasks monopolize the CPU. To prevent CPU monopolization, the kernel previously used the Realtime throttling mechanism, which tried to reserve 5% for low-priority tasks, leaving 95% of the time for realtime tasks. This mechanism left much to be desired, since regular tasks in many situations did not receive enough CPU time. The SCHED_DEADLINE server implements a more efficient resource reservation mechanism.
- The integration of the EEVDF (Earliest Eligible Virtual Deadline First) task scheduler has been completed. It replaces the CFS (Completely Fair Scheduler) scheduler shipped since kernel 2.6.23. When choosing the next process to run, the new scheduler takes into account processes that have not received enough processor resources or have received an undeservedly large amount of processor time. In the first case, control is forcibly transferred to the process, and in the second, on the contrary, it is postponed. The old CFS scheduler used heuristics and fine-tuning to determine processes that require special attention, while the new scheduler tracks them more explicitly and does not require fine-tuning. EEVDF is expected to reduce latency for tasks that CFS had trouble scheduling.
- The kernel's DRM Panic crash handler, which uses the DRM (Direct Rendering Manager) subsystem to display a visual report in the style of a "blue screen of death", can now display a logo and a QR code with a kmsg report when an emergency condition occurs. Since only 2953 bytes fit in a QR code, the DRM_PANIC_SCREEN_QR_CODE_URL option is provided, with which the kmsg report is compressed using zlib and attached as a parameter to a URL, which allows about 7500 bytes to be transferred via a V40 QR code. When building kernel packages, distributions can specify a base URL that leads to a page for submitting a problem report. The DRM_PANIC_SCREEN_QR_VERSION setting is provided to select the QR code format.
- Added support for the ARM POE (Permission Overlay Extension), which allows you to set access rights to memory areas. Using this extension, the Memory Protection Keys mechanism can be implemented on systems with ARM64 processors, which is used to restrict access to memory pages without changing the memory page table.
- For the LoongArch, ARM64, PowerPC and s390 architectures, the vDSO-optimized implementation of the getrandom() system call has been ported. The vDSO (virtual dynamic shared object) mechanism makes it possible to move the system call handler from the kernel to user space and avoid context switches. The optimization speeds up random number generation by up to 15 times.
- The io_uring asynchronous input/output subsystem now supports absolute timeouts that are triggered when a certain time is reached on the system clock (previously, only relative timeouts could be set, which specified the duration from the start of the operation).
- Added files for generating bindings for the libcpupower library using the SWIG toolkit, which allows generating bindings from C/C++ code for various programming languages. Bindings allow creating scripts in Python and other languages, and using them to extend the functionality of the libcpupower library, which provides an API for managing cpufreq and drivers from user space.
- The cpuidle utility implements the display of the "residency" idle state value, which is used for real-time systems and takes into account the minimum time that the processor must be in the idle state to justify the energy spent on entering and exiting this state.
- Added the ability to use the Clang compiler to build the standard C library nolibc, which is part of the Linux kernel source code and provides a wrapper around basic system calls. When building nolibc in Clang, it is possible to use link-time optimization (LTO).
- Several cgroup1 interfaces have been deprecated, such as TCP accounting, the first version of soft limits, and memory exhaustion (OOM) management. These features are still fully supported; the deprecation warnings are intended to gauge how many users still rely on them.
- Added the ability to configure a circular trace buffer to save accumulated data after reboot, which will prevent the loss of accumulated debugging information in the event of a kernel crash. The data is saved in memory. Enabling is done via the kernel command line parameter trace_instance, for example, setting "trace_instance=boot_map@0x285400000:12M" will reserve 12 MB of memory at address 0x285400000 for the "boot_map" buffer, which will be accessible via the file /sys/kernel/tracing/instances/boot_map.
- Porting of changes from the Rust-for-Linux branch, related to using Rust as a second language for developing drivers and kernel modules, has continued (Rust support is not active by default, and Rust is not among the mandatory build dependencies of the kernel). The 'list' and 'rbtree' modules have been added for working with doubly linked lists and red-black search trees. The capabilities of the 'init', 'sync', 'types' and 'error' modules have been expanded. Rust code can now be used when building a kernel with protection against Spectre attacks (MITIGATION_{RETHUNK,RETPOLINE,SLS} options), with the KASAN debugging system, with the kCFI (kernel Control Flow Integrity) and Shadow Call protection mechanisms, and with additional GCC plugins. A driver for the Applied Micro QT2025 PHY Ethernet controller, written in Rust, has been added. A separate website with documentation has been prepared: rust.docs.kernel.org.
- The xdrgen utility has been added to the kernel source code for converting XDR (eXternal Data Representation) specifications into XDR encoding and decoding functions written using the C style adopted by the Linux kernel.
- The kernel has a change to implement a pointer masking mechanism to reduce the number of slow barrier_nospec() calls in the 64-bit copy_from_user() function used to copy data from user space to the kernel. Using masking speeds up the per_thread_ops test, which measures the number of operations that can be performed in a single thread, by 2.6%.
- Added a new driver that allows USB to be used as a transport for the 9pfs protocol, so that a 9p file system can be mounted over USB (e.g. "mount -t 9p -o trans=usbg,aname=/path/to/fs /mnt/9"). One application of the new driver is booting with a root partition served over USB instead of NFS when developing embedded devices.
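The selection rule described in the EEVDF item above can be illustrated with a toy model. This is a simplified sketch, not the kernel's implementation: the Task class, its fields and the sample values are hypothetical, and the queue's virtual time is approximated here by the weighted average of the tasks' virtual runtimes. A task that has run ahead of its fair share is treated as ineligible, and among the eligible tasks the one with the earliest virtual deadline is chosen.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical toy structure, not a kernel data type.
    name: str
    weight: int      # relative CPU share
    vruntime: float  # virtual time already consumed
    slice: float     # requested time slice

    @property
    def deadline(self) -> float:
        # Virtual deadline: the slice is scaled by the task's weight.
        return self.vruntime + self.slice / self.weight


def avg_vruntime(tasks):
    # Weighted average of vruntime, standing in for the queue's virtual time.
    total_weight = sum(t.weight for t in tasks)
    return sum(t.weight * t.vruntime for t in tasks) / total_weight


def pick_next(tasks):
    v = avg_vruntime(tasks)
    # Eligible: the task has not received more than its fair share so far.
    eligible = [t for t in tasks if t.vruntime <= v]
    # Among eligible tasks, run the one with the earliest virtual deadline.
    return min(eligible, key=lambda t: t.deadline)


tasks = [
    Task("batch", weight=1, vruntime=10.0, slice=6.0),
    Task("editor", weight=1, vruntime=4.0, slice=1.0),
    Task("build", weight=2, vruntime=5.0, slice=6.0),
]
print(pick_next(tasks).name)
```

In this sample the "batch" task has run ahead of the average and is postponed, while the short-slice "editor" task wins over "build" because its virtual deadline comes first.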
- Disk Subsystem, I/O and File Systems
- The VFS subsystem now supports working with storage devices whose block size is larger than the system memory page size. In file systems, this feature is currently supported only in XFS.
- The FUSE subsystem, which allows creating file system implementations that run in user space, has been updated to support mapping of user IDs of mounted file systems, used to match files of a specific user on a mounted foreign partition with another user on the current system.
- A new fcntl operation F_CREATED_QUERY has been implemented, which allows an application to determine whether a file opened using the O_CREAT flag was created or whether it already existed.
- The name_to_handle_at() system call has been updated to allow the use of unique 64-bit mount point identifiers to avoid a race condition when parsing /proc/mountinfo.
- The size of the kernel "file" structure has been reduced from 232 to 184 bytes, reducing memory consumption on systems that actively work with files.
- Mounting file systems on mount points within the /proc hierarchy, such as /proc/PID/fd, has been prohibited, since it created potential security issues.
- The pseudo-FS NSFS (NameSpace FS), used to work with namespaces, implements the provision of additional information about the namespaces of mount points.
- EROFS (Enhanced Read-Only File System), a file system designed for use on partitions accessible in read-only mode, now supports mounting file systems directly from disk images saved as files.
- XFS adds new ioctl commands XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE to exchange contents between two files.
- NFS now supports the "LOCALIO" protocol, which allows determining whether the NFS client and server are on the same host to enable appropriate optimizations.
- In the Btrfs file system, performance optimizations and refactoring have been carried out, the extent lock area for read operations has been reduced, work on switching to page folios has continued, and automatic memory release for the btrfs_path structure has been implemented.
- The Ext4 file system has been updated to fix bugs related to block allocation, extent management, fast commit, and journaling.
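The F_CREATED_QUERY operation mentioned in this section can be tried from a script. The sketch below is a minimal illustration under stated assumptions: Python's fcntl module does not export this constant, so its numeric value (F_LINUX_SPECIFIC_BASE + 4) is an assumption taken from the kernel's uapi header, and the helper name open_and_check is hypothetical. On kernels that do not know the operation, the call is expected to fail with EINVAL, which the helper reports as None.

```python
import errno
import fcntl
import os
import tempfile

F_LINUX_SPECIFIC_BASE = 1024
# Assumption: value as defined in include/uapi/linux/fcntl.h for kernel 6.12.
F_CREATED_QUERY = F_LINUX_SPECIFIC_BASE + 4


def open_and_check(path):
    """Open path with O_CREAT and report whether this call created it.

    Returns True or False on kernels that support the query,
    and None when the kernel rejects the operation.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            return bool(fcntl.fcntl(fd, F_CREATED_QUERY))
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            return None  # operation not known to this kernel
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    first = open_and_check(path)   # file does not exist yet
    second = open_and_check(path)  # file already exists
    print(first, second)
```

On a kernel with F_CREATED_QUERY support the first call should report that the file was created and the second that it already existed.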
- Virtualization and Security
- Added the IPE (Integrity Policy Enforcement) LSM module, developed by Microsoft to extend the existing mandatory access control system. The module allows you to define a general integrity policy for the entire system, indicating which operations are allowed and how the authenticity of components should be verified. For example, using IPE, you can specify which executable files are allowed to run, taking into account their compliance with the reference version using cryptographic hashes provided by the dm-verity system.
- At the kernel compilation stage, it is now possible to separately enable the available protection methods against different Spectre-class vulnerabilities in the CPU. New parameters are offered in Kconfig: MITIGATION_MDS (protection against the Microarchitectural Data Sampling vulnerability), MITIGATION_TAA (protection against the TSX Asynchronous Abort vulnerability), MITIGATION_MMIO_STALE_DATA (protection against the MMIO Stale Data vulnerability), MITIGATION_L1TF (protection against the L1 Terminal Fault vulnerability), MITIGATION_RETBLEED (protection against the Retbleed vulnerability), MITIGATION_SPECTRE_V1, MITIGATION_SPECTRE_V2 (protection against Spectre vulnerabilities), MITIGATION_SRBDS (protection against the Special Register Buffer Data Sampling vulnerability), MITIGATION_SSB (protection against the Speculative Store Bypass vulnerability).
- Added the command line option proc_mem.force_override and a set of build settings in Kconfig (PROC_MEM_ALWAYS_FORCE, PROC_MEM_FORCE_PTRACE and PROC_MEM_NO_FORCE) to restrict forced modification of memory via /proc/pid/mem.
- The LSM (Linux Security Modules) subsystem has been switched to using static calls, which has improved security and performance.
- The ability to use standard kernels for the ARM64 architecture in guest environments running on Android systems with a modified KVM hypervisor (protected KVM) has been provided.
- The Landlock LSM module, which allows you to restrict the interaction of a group of processes with the external environment, implements the concept of "IPC scoping" to selectively restrict interaction with sandbox environments using Unix sockets and signals. For example, you can prohibit the establishment of connections using Unix sockets from a sandbox environment to processes that do not use isolation, but allow connections to processes in the same scope.
- In the KVM hypervisor, a flag has been added to the CPUID for guests to indicate support for AVX10.1 extensions.
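The per-vulnerability mitigation options listed in this section are chosen at build time; on a running system, the resulting state can be inspected through the files in /sys/devices/system/cpu/vulnerabilities. The sketch below is a small illustration of reading that directory. The function name mitigation_status is hypothetical, and the set of files present depends on the kernel version and CPU.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(base=VULN_DIR):
    """Return {vulnerability name: status line} read from sysfs.

    An empty dict is returned when the directory does not exist.
    """
    base = Path(base)
    if not base.is_dir():
        return {}
    status = {}
    for entry in sorted(base.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    for name, state in mitigation_status().items():
        print(f"{name:24} {state}")
```

Each file holds a single line such as "Not affected" or a line starting with "Mitigation:" or "Vulnerable".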
- Network subsystem
- Added the Device Memory TCP mechanism, which allows using network sockets to directly send the contents of peripheral device memory over the network (zero-copy mode) and directly place the contents of network packets in the memory area of the device on the recipient side. The data transmitted in the packets is transferred from the network card to the memory of the peripheral device or from the device memory to the network card directly, bypassing the CPU, and the packet headers end up in the usual kernel buffers.
- The capabilities of many Ethernet and wireless drivers have been expanded. For example, the Intel iwlwifi driver now supports offloading RLC/SMPS operations to the firmware, the Realtek rtw89 driver has improved performance and added support for RTL8852BT/8852BE-VT (WiFi 6) chips, the Microchip Ethernet driver now supports the IEEE 802.3bw (100BASE-T1) and IEEE 802.3bp specifications, and the implementations of Microsoft vNIC and IBM veth virtual Ethernet have been improved. New drivers have been added for the Realtek RTL9054, RTL9068, RTL9072, RTL9075, RTL9071 and Microchip LAN8650/1 10BASE-T1S MAC-PHY Ethernet chips.
- In MPTCP (MultiPath TCP), an extension of the TCP protocol for organizing the delivery of TCP packets simultaneously along several routes through different network interfaces, the size of the weight coefficients used in routing has been increased from 8 to 16 bits. Blackhole traffic detection and temporary suspension of attempts to establish connections with systems that cause traffic loss have been implemented.
- For IPv6, support has been implemented for the "p" flag in the PIO (Prefix Information Option) used in RA (IPv6 Router Advertisements) to select the client deployment model via DHCPv6-PD (DHCPv6 Prefix Delegation, RFC9663) instead of assigning individual addresses based on prefixes, using SLAAC (Stateless Address Autoconfiguration). IPv6 IOAM6 adds support for the new tunsrc encapsulation mode, which allows achieving higher performance.
- Improved performance of IPsec control packet processing.
- Improved performance of flushing large nftables rulesets. Improved SCTP support in nfnetlink_queue.
- The ethtool API has been updated to support binding multiple network cards to a single network interface.
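To make the IPv6 item above more concrete, the sketch below parses a Prefix Information Option and checks the "p" flag. The option layout follows RFC 4861 (32 bytes: type 3, length 4, prefix length, flags, two lifetimes, a reserved field and the prefix). The bit position of the P flag (0x10, directly after the R flag) is an assumption, and the function names and the simplified decision logic are hypothetical, not taken from the kernel.

```python
import ipaddress
import struct

PIO_TYPE = 3
FLAG_L = 0x80  # on-link
FLAG_A = 0x40  # autonomous address configuration (SLAAC)
FLAG_P = 0x10  # assumption: "prefer DHCPv6-PD" bit position


def parse_pio(option: bytes) -> dict:
    """Decode a 32-byte Prefix Information Option."""
    if len(option) != 32:
        raise ValueError("PIO must be exactly 32 bytes")
    opt_type, opt_len, prefix_len, flags, valid, preferred, _reserved = (
        struct.unpack("!BBBBIII", option[:16])
    )
    if opt_type != PIO_TYPE or opt_len != 4:
        raise ValueError("not a Prefix Information Option")
    address = ipaddress.IPv6Address(option[16:32])
    return {
        "prefix": ipaddress.IPv6Network((address, prefix_len), strict=False),
        "on_link": bool(flags & FLAG_L),
        "autonomous": bool(flags & FLAG_A),
        "prefer_pd": bool(flags & FLAG_P),
        "valid_lifetime": valid,
        "preferred_lifetime": preferred,
    }


def address_method(info: dict) -> str:
    # Simplified policy: the P flag takes precedence over SLAAC.
    if info["prefer_pd"]:
        return "dhcpv6-pd"
    return "slaac" if info["autonomous"] else "none"


# Sample option built by hand for a documentation prefix.
sample = struct.pack(
    "!BBBBIII", PIO_TYPE, 4, 64, FLAG_L | FLAG_A | FLAG_P, 86400, 14400, 0
) + ipaddress.IPv6Address("2001:db8:1::").packed

info = parse_pio(sample)
print(info["prefix"], address_method(info))
```

With the P flag set, a client following this simplified policy would request its own prefix via DHCPv6-PD rather than forming an address from the advertised prefix with SLAAC.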
- Equipment
- Work on support for AMD RDNA4 GPUs ("GFX12") in the AMDGPU driver has continued. Added the ability to reset individual task queues without resetting the entire GPU.
- Work on the Xe drm (Direct Rendering Manager) driver for GPUs based on the Intel Xe architecture, which is used in Intel Arc family graphics cards and integrated graphics, starting with Tiger Lake processors, has continued. The new version includes support for GPUs based on the Battlemage and Lunar Lake microarchitectures. Support for Xe2 CCS (Color Control Surface) modifiers for managing the parameters of integrated and discrete GPUs has been introduced.
- The i915 driver now allows outputting information about the fan speed via the HWMON or sysfs interface (the "fan1_input" attribute). The "i915.modeset" parameter has been declared obsolete; instead of "i915.modeset=0", the "i915.nomodeset" parameter should be used.
- Added support for A615, A306 and A621 GPUs to msm (GPU Qualcomm Adreno) DRM driver.
- The Nouveau driver has undergone reworking and cleaning of its internal structures.
- The intel_pstate driver, which controls power consumption parameters (P-state) on systems with Intel processors, now supports hybrid systems with asymmetric (different in characteristics) CPUs, as well as power management for processors based on the Granite Rapids and Sierra Forest microarchitectures. The intel_idle driver now supports the Xeon Granite Rapids CPU. The intel_rapl driver now recognizes AMD 1Ah family processors and Intel ArrowLake-U processors.
- Continued to include changes to support the Snapdragon X Elite ARM SoC, which uses Qualcomm's own 12-core Oryon CPU and Qualcomm's Adreno GPU. The chip is aimed at laptops and PCs, and outperforms Apple's M3 and Intel's Core Ultra 155H in many performance tests.
- Added support for ARM boards, SoCs and devices: Broadcom bcm2712 (Raspberry Pi 5), Renesas R9A09G057 (RZ/V2H), Qualcomm Snapdragon 414 (MSM8929), Lenovo ThinkPad T14s Gen 6, Lenovo A6000/A6010, Surface Laptop 7, Anbernic RG35XXSP, Firefly Core-PX30-JD4, Lunzn Fastrhino R68S, Aspeed Riser, AGX Orin, Rockchip Qnap-TS433, Huashan Pi, Meta Catalina, BeagleY-AI, NanoPi R2S Plus, ExynosAuto v920, SOPHGO SG2002, Qualcomm IPQ5332, LG G4 (h815), Cool Pi CM5 GenBook, GameForce Ace, IBM P11, Kontron i.MX93 OSM-S, NanoPC-T6.
- Added support for screen panels: Anbernic RG28XX, On Tat Industrial Company KD50G21-40NT-A1, Innolux G070ACE-LH3, Melfas lmfbx101117480, Densitron DMT028VGHMCMI-1D, Microchip AC40T08A, AUO B116XTN02.3, AUO B116XAN06.1, AUO B116XAT04.1, BOE TV101WUM-LL2, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, BOE NE140WUM-N6G, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4, Starry er88577.
- The sound subsystem now supports the following chips and codecs: RME Digiface USB, AMD ACP 7.1, Mediatek MT6367, MT8365, Realtek RTL1320, C-Media CM9825. Old sound drivers for ASoC Intel are declared obsolete, and it is recommended to use AVS drivers instead. Many improvements have been made to the SoundWire driver.
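The "fan1_input" attribute mentioned in the i915 item above follows the usual hwmon convention, so it can be read like any other hwmon fan sensor. The sketch below is a minimal illustration that scans /sys/class/hwmon for devices exposing this attribute. The function name fan_speeds is hypothetical, and which devices appear depends on the hardware and the drivers loaded.

```python
from pathlib import Path

HWMON_DIR = Path("/sys/class/hwmon")


def fan_speeds(base=HWMON_DIR):
    """Return {device name: fan speed in RPM} for devices with fan1_input."""
    base = Path(base)
    if not base.is_dir():
        return {}
    speeds = {}
    for dev in sorted(base.glob("hwmon*")):
        fan = dev / "fan1_input"
        name = dev / "name"
        if not fan.is_file():
            continue  # this device does not report a fan
        try:
            label = name.read_text().strip() if name.is_file() else dev.name
            speeds[label] = int(fan.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed value
    return speeds


if __name__ == "__main__":
    for device, rpm in fan_speeds().items():
        print(f"{device}: {rpm} RPM")
```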
Source: opennet.ru
