After two months of development, Linus Torvalds presented the release of the Linux kernel 6.13. Among the most notable changes: lazy preemption mode in the task scheduler, support for atomic writes in XFS and Ext4, the "multigrain timestamps" mechanism, adaptive mode for enabling polling in the network subsystem, the ability to build with AutoFDO optimizations, support for the ARM64 Guarded Control Stack protection mechanism, isolation of virtual machines using the ARM CCA extension, separate stacks in BPF, removal of ReiserFS, the virtual-cpufreq driver, the net-shaper netlink API, a case-insensitive tmpfs mount mode, support for POSIX extensions in SMB3, and the AMD Cache Optimizer driver.
The new version includes 14172 fixes from 2086 developers; the patch size is 46 MB (the changes affected 15375 files, with 598707 lines of code added and 406294 lines deleted). The previous release had 14607 fixes from 2167 developers, and its patch size was 37 MB. About 52% of all changes in 6.13 are related to device drivers, about 13% to updating code specific to hardware architectures, 11% to the network stack, 4% to file systems, and 3% to internal kernel subsystems.
Key innovations in kernel 6.13:
- Disk subsystem, I/O and file systems
- Added a "multigrain timestamps" mechanism that makes it possible to obtain file modification and access times with better than millisecond accuracy without negatively affecting performance. Increasing timestamp accuracy adds overhead because metadata has to be written to disk more often, so in the proposed implementation, more accurate timestamps are created not for all files, but only for those whose timestamps processes actually request via a call to getattr().
- Added support for atomic write operations, where data larger than a sector size is written atomically on storage devices that provide this capability. Currently, atomic write is implemented for XFS, Ext4 in O_DIRECT (Direct I/O) mode, and md RAID 0/1/10.
- A new reference counting mechanism for files is proposed that provides a 3-5% performance gain in workloads with more than 255 threads.
- The implementation of the ReiserFS file system, which was deprecated the year before last, has been removed.
- Added sysctl parameter "fs.dentry-negative" to set the VFS policy of cleaning "dentry" records (internal representation of directory elements) after deleting files associated with them. For some types of workload, it is better to leave such records about deleted files, and for others - to delete them, so the kernel provides the ability to choose (by default, "dentry" are not automatically deleted).
- The statmount() system call now has a STATMOUNT_OPT_ARRAY flag to return a list of file system options as an array of null-terminated strings that do not use "\000" escapes. Support has been added for returning the file system subtype (fs_subtype, to determine whether FUSE is used), secure mount options, and the source superblock (sb_source).
- OverlayFS provides the ability to specify layers via file descriptors rather than file path names.
- The tmpfs file system now has a "casefold" mount option to make it case-insensitive and a "strict_encoding" option to block the creation of files with names containing invalid UTF-8 characters.
- A new set of system calls for managing extended file attributes has been added: setxattrat(), getxattrat(), listxattrat(), and removexattrat(). Unlike the setxattr(), getxattr(), listxattr(), and removexattr() system calls, the new variants take a file descriptor of the directory relative to which the file path is resolved.
- Btrfs has a new ioctl operation, BTRFS_IOC_SUBVOL_SYNC_WAIT, which enables waiting for subvolume cleanup to complete, allowing an unprivileged user to execute the "btrfs subvolume sync" command without access to the SEARCH_TREE ioctl (useful in backup applications that clean up subvolumes). A new ioctl operation, ENCODED_READ, has been added for reading encoded data via io_uring, such as directly reading compressed extents without unpacking. Continued work on migrating to page folios. Reduced the occurrence of lock contention when searching for embedded back references and when enumerating extent buffers. Improved extent map compression efficiency.
- EROFS (Enhanced Read-Only File System), a file system designed for partitions accessed in read-only mode, now supports the SEEK_HOLE and SEEK_DATA options of the lseek() system call.
- F2FS now supports device aliasing, which allows you to temporarily reserve an area in F2FS to use part of a block device in another FS. After the external operation is completed, the reserved area can be returned to F2FS. For example, you can create a FS using the command "mkfs.f2fs -c /dev/vdc@vdc.file /dev/vdb", after which the contents of the /dev/vdc device will be reserved and reflected in the vdc.file file, and the /dev/vdc partition can be used for your own needs, for example, formatted for another FS. To return the reserved contents, simply delete the vdc.file file.
- XFS now supports quotas for realtime devices. Support for a metadata directory has been added, which contains all inodes with metadata.
- FUSE can now dynamically change the maximum number of pages (FUSE_MAX_MAX_PAGES) using "sysctl fs.fuse.max_pages_limit". Page folios are used in this operation.
- SMB now supports POSIX extensions to SMB3, which are needed to store special file types such as fifos, device files, and symbolic links. Added the ability to mount a partition with an alternate password used for password rotation. Added a new mount option "cifs.upcall" for defining a namespace. Provided recognition of character and block device files created in Windows NFS Server. Added support for WSL (Windows Subsystem for Linux) style symbolic links.
- The UBIFS, ADFS, BEFS, HFS, HFSPLUS, HPFS, JFS and ECRYPTFS file systems have been migrated to use the new partition mounting API.
- File systems ECRYPTFS, UFS and NILFS2 have been converted to use page folios.
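The multigrain timestamps item above says finer-grained timestamps are handed out only for files whose timestamps are actually queried. A minimal sketch of how a program might observe this from user space follows, assuming Python on Linux; the helper name `mtimes_after_writes` and the file name `probe.dat` are hypothetical, and whether consecutive values differ depends on the kernel version and the file system in use.

```python
import os
import tempfile


def mtimes_after_writes(path, rounds=3):
    """Append to `path` several times, querying the mtime after each write.

    Each os.stat() call is a getattr-style query, which (per the item above)
    is what asks the kernel for a more precise timestamp on the next update.
    Hypothetical helper for illustration only.
    """
    stamps = []
    # buffering=0: every write() goes straight to the file, no user-space buffer
    with open(path, "ab", buffering=0) as f:
        for _ in range(rounds):
            f.write(b"x")
            stamps.append(os.stat(path).st_mtime_ns)  # integer nanoseconds
    return stamps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.dat")  # hypothetical file name
        stamps = mtimes_after_writes(target)
        for earlier, later in zip(stamps, stamps[1:]):
            # With coarse timestamps, back-to-back writes can show a delta of 0
            print(f"mtime changed by {later - earlier} ns")
```

On kernels or file systems without multigrain timestamps, writes that land within the same timer tick may report identical modification times, so a delta of 0 is not an error.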
- Memory and system services
- The scheduler implements a lazy preemption model (PREEMPT_LAZY), which matches the full preemption model for realtime tasks (RR/FIFO/DEADLINE) but delays the preemption of normal tasks (SCHED_NORMAL) until the tick boundary. This delay reduces how often lock holders are preempted, which brings performance closer to configurations using the voluntary preemption model. Thus, the new model preserves the full preemption capabilities for realtime tasks while minimizing the performance penalty for normal tasks. In addition, the new model simplifies the logic of task preemption in the kernel by excluding handlers located in other kernel components (outside the task scheduler) from the scheduling process.
- When building with the Clang compiler, the ability to use optimizations is provided
Source: opennet.ru
