Linux 5.7 kernel release

After two months of development, Linus Torvalds released the Linux 5.7 kernel. Notable changes include: a new exFAT file system implementation, the bareudp module for creating UDP tunnels, pointer-authentication-based protection for ARM64, the ability to attach BPF programs to LSM hooks, a new Curve25519 implementation, a split-lock detector, BPF compatibility with PREEMPT_RT, removal of the 80-character limit on line length in the code, accounting for CPU temperature in the task scheduler, the ability to use clone() to spawn processes in another cgroup, and memory write protection using userfaultfd.

The new version incorporates 15033 patches from 1961 developers; the patch
is 39 MB in size (the changes affected 11590 files, with 570560 lines of code
added and 297401 lines removed). About 41% of all changes in 5.7 relate to
device drivers, approximately 16% to updating code specific to hardware
architectures, 13% to the network stack, 4% to file systems, and 4% to
internal kernel subsystems.

Key changes:

  • Disk Subsystem, I/O and File Systems
    • Added a new implementation of the exFAT driver, based on the current "sdfat" (2.x) codebase developed by Samsung for its Android smartphones. The driver previously added to the kernel was based on outdated Samsung code (version 1.2.9) and lagged about 10% behind the new driver in performance. Recall that adding exFAT support to the kernel became possible after Microsoft published the specifications and made it possible to use the exFAT patents on Linux royalty-free.
    • Btrfs has implemented a new ioctl() command, BTRFS_IOC_SNAP_DESTROY_V2, which allows deleting a subvolume by its ID. Full support for cloning inline extents is provided. The number of cancellation points for balance operations has been increased to reduce long waits when executing the 'balance cancel' command. Resolution of back references to extents has been accelerated (for example, the execution time of a test script decreased from an hour to several minutes). Added the ability to attach a file extent tree to each inode. Redesigned the locking scheme used when writing to subvolumes and when handling NOCOW. Improved fsync performance for ranges.
    • XFS has improved metadata checking and fsck execution for mounted (active) partitions. A library for rebuilding btree structures has been proposed, which in the future will be used to rework xfs_repair and to implement repair without unmounting the partition.
    • CIFS has gained experimental support for placing a swap partition on SMB3 storage. POSIX extensions to readdir, as defined in the SMB3.1.1 specification, have been implemented. Write performance for 64KB pages has been improved when cache=strict mode is enabled and protocol versions 2.1+ are used.
    • The ext4 file system has switched its bmap and iopoll implementations to iomap.
    • F2FS has gained optional support for data compression using the zstd algorithm; by default, the LZ4 algorithm is used for compression. Added support for the "chattr -c commit" command. The mount time is now displayed. Added the F2FS_IOC_GET_COMPRESS_BLOCKS ioctl for obtaining the number of compressed blocks. Added reporting of compression information via statx.
    • The Ceph file system has gained the ability to perform file creation and deletion (unlink) operations locally, without waiting for a response from the server (asynchronous mode). The change can, for example, noticeably improve performance when running the rsync utility.
    • OverlayFS has gained the ability to use virtiofs as the upper-layer file system.
    • The path traversal code in VFS has been rewritten, the symbolic link parsing code redesigned, and mount point traversal unified.
    • In the SCSI subsystem, unprivileged users are now allowed to execute ZBC commands.
    • dm_writecache has gained the ability to gradually clear the cache based on the max_age parameter, which sets the maximum block lifetime.
    • dm_integrity has gained support for the "discard" operation.
    • null_blk has gained support for error injection to simulate failures during testing.
    • Added the ability to send udev notifications about block device size changes.
  • Network subsystem
    • Netfilter includes changes that significantly speed up the processing of large match lists (nftables sets) that require checking a combination of subnets, network ports, protocol, and MAC addresses.
      The optimizations were made to the nft_set_pipapo (PIle PAcket POlicies) module, which solves the problem of matching packet contents against arbitrary ranges of field values used in filtering rules, such as IP address and network port ranges (nft_set_rbtree and nft_set_hash handle interval matching and direct value mapping). A version of pipapo vectorized with 256-bit AVX2 instructions showed a performance increase of 420% on a system with an AMD Epyc 7402 processor when parsing 30 records that included port-protocol bindings. The gain when matching a combination of a subnet and a port number across 1000 entries was 87% for IPv4 and 128% for IPv6.
    • Added the bareudp module, which allows various L3 protocols such as MPLS, IP and NSH to be encapsulated in a UDP tunnel.
    • Integration of MPTCP (MultiPath TCP) components has continued. MPTCP is an extension of the TCP protocol that lets a single TCP connection deliver packets simultaneously along several routes through different network interfaces bound to different IP addresses.
    • Added support for hardware acceleration of Ethernet frame encapsulation in 802.11 (Wi-Fi).
    • When a device is moved from one network namespace to another, the access rights and the owner of the corresponding files in sysfs are now adjusted.
    • Non-root users are now allowed to use the SO_BINDTODEVICE socket option.
    • The third batch of patches moving the ethtool toolkit from ioctl() to the netlink interface has been adopted. The new interface simplifies the addition of extensions, improves error handling, allows notifications to be sent on state changes, simplifies interaction between the kernel and user space, and reduces the number of named lists that have to be kept synchronized.
    • Added the ability to use special hardware accelerators to perform connection tracking operations.
    • Netfilter has gained a hook for attaching classifiers of outgoing packets (egress), complementing the previously available hook for incoming packets (ingress).
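
The matching problem that nft_set_pipapo addresses (a packet must fall inside a range in every field of a rule) can be illustrated with a small toy model. This is a simplified sketch, not the kernel algorithm: it scans rules linearly for clarity, whereas the real module relies on precomputed lookup tables. The `Rule` layout and the function names are hypothetical and chosen only for illustration. The sketch shows the core idea of building one bitmap of candidate rules per field and intersecting the bitmaps.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One set element: a subnet plus an inclusive port range (hypothetical layout)."""
    subnet: str
    port_lo: int
    port_hi: int


def field_bitmap(rules, predicate):
    """Return an int whose bit i is set when rule i satisfies the predicate."""
    bitmap = 0
    for index, rule in enumerate(rules):
        if predicate(rule):
            bitmap |= 1 << index
    return bitmap


def match_packet(rules, src_ip, dst_port):
    """Return the indices of rules whose every field range contains the packet's value."""
    addr = ip_address(src_ip)
    by_addr = field_bitmap(rules, lambda r: addr in ip_network(r.subnet))
    by_port = field_bitmap(rules, lambda r: r.port_lo <= dst_port <= r.port_hi)
    both = by_addr & by_port  # a rule matches only if all of its fields match
    return [i for i in range(len(rules)) if (both >> i) & 1]


RULES = [
    Rule("10.0.0.0/8", 1, 1023),
    Rule("10.1.0.0/16", 80, 80),
    Rule("192.168.0.0/24", 22, 22),
]

print(match_packet(RULES, "10.1.2.3", 80))     # address and port fit rules 0 and 1
print(match_packet(RULES, "192.168.0.7", 80))  # address fits rule 2, port does not
```

Keeping one bitmap per field means that adding another field (protocol, MAC address) only adds one more intersection step.
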
  • Virtualization and Security
    • Added a hardware-based implementation of pointer authentication, which uses specialized ARM64 CPU instructions to protect against attacks based on return-oriented programming (ROP), in which the attacker does not try to place their own code in memory but instead operates with fragments of machine instructions already present in loaded libraries, each ending with a return instruction. The protection comes down to using digital signatures to verify return addresses at the kernel level. The signature is stored in the unused top bits of the pointer itself. Unlike software implementations, the creation and verification of the digital signatures is performed by special CPU instructions.
    • Added the ability to protect a memory area from writing using the userfaultfd() system call, which is designed to handle page faults (accesses to unallocated memory pages) in user space. The idea is to use userfaultfd() both to track access violations on pages marked as write-protected and to call a handler that can respond to such write attempts (for example, to handle changes while creating live snapshots of running processes, to capture state when writing memory dumps to disk, to implement shared memory, or to track changes in memory). The functionality is equivalent to using mprotect() together with a SIGSEGV signal handler, but is noticeably faster.
    • SELinux has deprecated the "checkreqprot" option, which disables memory protection checks during rule processing (allowing the use of executable memory areas regardless of what the rules specify). Kernfs symlinks are allowed to inherit the context of their parent directories.
    • The KRSI module has been included, which allows BPF programs to be attached to any LSM hook in the kernel. The change makes it possible to create LSM (Linux Security Module) modules in the form of BPF programs for auditing and mandatory access control tasks.
    • Optimized /dev/random performance by using batched CRNG values instead of calling RNG instructions separately. Improved getrandom and /dev/random performance on ARM64 systems that provide RNG instructions.
    • The Curve25519 elliptic curve implementation has been replaced with the version from the HACL library, which comes with a mathematical proof from formal verification of its correctness.
    • Added a mechanism for reporting free memory pages. Using this mechanism, guests can send the host system information about pages that are no longer in use, and the host can reclaim those pages.
    • vfio/pci has gained support for SR-IOV (Single-Root I/O Virtualization).
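
The pointer authentication scheme described above (a signature kept in the unused top bits of a pointer and checked before the pointer is used) can be modeled in a few lines. This is a conceptual sketch only: real ARM64 hardware computes the signature with dedicated instructions and its own cipher, while the sketch substitutes a truncated HMAC-SHA256. The 48-bit address width, the key, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac

ADDR_BITS = 48                     # assumed width of a virtual address
ADDR_MASK = (1 << ADDR_BITS) - 1
SIG_BITS = 64 - ADDR_BITS          # unused top bits that carry the signature


def _signature(address, modifier, key):
    """Truncated keyed MAC over the address and a context value such as the stack pointer."""
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little") & ((1 << SIG_BITS) - 1)


def sign_pointer(pointer, modifier, key):
    """Place the signature into the top bits of the pointer."""
    address = pointer & ADDR_MASK
    return (_signature(address, modifier, key) << ADDR_BITS) | address


def authenticate_pointer(signed, modifier, key):
    """Return the plain address if the embedded signature is valid, else raise."""
    address = signed & ADDR_MASK
    if signed >> ADDR_BITS != _signature(address, modifier, key):
        raise ValueError("pointer authentication failed")
    return address


KEY = b"per-boot secret key"       # hypothetical key; hardware keeps it in CPU registers
STACK_POINTER = 0x0000_7FFD_1000   # context value mixed into the signature

signed = sign_pointer(0x0000_4000_1234, STACK_POINTER, KEY)
print(hex(authenticate_pointer(signed, STACK_POINTER, KEY)))

try:
    # An overwritten return address keeps the old signature bits and is rejected
    # unless the 16-bit signatures happen to collide.
    authenticate_pointer(signed ^ 0x40, STACK_POINTER, KEY)
except ValueError as error:
    print(error)
```

On real hardware a failed check does not raise an exception; it leaves an invalid pointer whose use causes a fault. The sketch raises an error to keep the example short.
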
  • Memory and system services
    • The limit on the maximum line length in source code has been raised from 80 to 100 characters. Developers are still encouraged to stay within 80 characters per line, but this is no longer a hard limit. In addition, exceeding the line length limit now produces a warning only if the checkpatch utility is run with the '--strict' option. The change spares developers from fiddling with whitespace, lets them align code more freely, and prevents excessive line splitting, which hinders reading and searching the code.
    • Added support for EFI mixed boot mode, which allows loading a 64-bit kernel from 32-bit firmware running on a 64-bit CPU without using a specialized bootloader.
    • Enabled a system for detecting and debugging split locks, which occur when an atomic instruction accesses unaligned data in memory that crosses two CPU cache lines. Such locks lead to a significant drop in performance (1000 cycles slower than an atomic operation on data that falls within a single cache line). Depending on the "split_lock_detect" boot parameter, the kernel can detect such locks on the fly and either issue warnings or send a SIGBUS signal to the application that caused the lock.
    • The task scheduler now tracks thermal pressure reported by temperature sensors and accounts for overheating when placing tasks. Using the statistics provided, the thermal governor can lower the maximum CPU frequency in case of overheating, and the task scheduler now takes into account the reduction in computing capacity caused by such a frequency reduction when scheduling tasks (previously, the scheduler reacted to a change in frequency with a certain delay, for some time making decisions based on inflated assumptions about the available computing resources).
    • The task scheduler includes frequency-invariant load tracking metrics, which make it possible to assess the load correctly regardless of the current CPU frequency. The change makes it possible to predict the behavior of tasks more accurately under dynamic changes in CPU voltage and frequency. For example, a task that consumed 1/3 of the CPU at 1000 MHz would consume 2/3 of the CPU at 500 MHz, which previously created the false impression that it was running at full capacity (that is, tasks appeared larger to the scheduler merely because the frequency had been reduced, which led to incorrect decisions in the schedutil cpufreq governor).
    • The Intel P-state driver, responsible for selecting performance modes, has been switched to use schedutil.
    • Implemented the ability to use the BPF subsystem when the kernel is running in real-time mode (PREEMPT_RT). Previously, BPF had to be disabled when PREEMPT_RT was enabled.
    • A new type of BPF program, BPF_MODIFY_RETURN, has been added; it can be attached to a function in the kernel and change the return value of that function.
    • Added the ability to use the clone3() system call to create a process in a cgroup that differs from the parent's cgroup, which allows the parent process to apply restrictions and enable accounting as soon as the new process or thread is spawned. For example, a service manager can directly place new services into separate cgroups, and new processes placed in "frozen" cgroups will be stopped immediately.
    • Kbuild has gained support for the "LLVM=1" environment variable, which switches to the Clang/LLVM toolchain when building the kernel. The minimum required binutils version has been raised to 2.23.
    • Added /sys/kernel/debug/kunit/ section to debugfs with kunit test results.
    • Added kernel boot option pm_debug_messages (similar to /sys/power/pm_debug_messages) to enable output of debugging information about the power management system (useful when debugging problems with hibernation and standby).
    • The io_uring asynchronous I/O interface has gained support for splice() and atomic buffer selection.
    • Improved cgroup profiling with the perf toolkit. Previously, perf could only profile tasks in a particular cgroup and could not find out which cgroup the current sample belonged to. perf now obtains cgroup information for each sample, which makes it possible to profile more than one cgroup and to sort reports by cgroup.
    • cgroupfs, a pseudo-FS for managing cgroups, has gained support for extended attributes (xattrs), which can be used, for example, to leave additional information for user-space handlers.
    • The cgroup memory controller has gained support for recursive protection of the "memory.low" value, which regulates the minimum amount of RAM guaranteed to group members. When a cgroup hierarchy is mounted with the "memory_recursiveprot" option, the "memory.low" value set for a node is automatically distributed to all of its child nodes.
    • Added the Uacce framework (Unified/User-space-access-intended Accelerator Framework) for sharing virtual addresses (SVA, Shared Virtual Addressing) between the CPU and peripheral devices, allowing hardware accelerators to access data structures in the main CPU.
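
The condition that produces a split lock, as described in the item on the split-lock detector above, is easy to check arithmetically: an access is split when its first and last bytes fall into different cache lines. The following is a minimal sketch, assuming a 64-byte cache line; the function name is hypothetical, and the kernel itself relies on a CPU exception rather than on a check like this.

```python
CACHE_LINE_SIZE = 64  # bytes; an assumed, typical cache line size


def crosses_cache_line(address, size, line_size=CACHE_LINE_SIZE):
    """Return True when the bytes [address, address + size) span two cache lines."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line


# An 8-byte atomic operand at three different offsets within a cache line.
for address in (0x1000, 0x1038, 0x103C):
    print(hex(address), crosses_cache_line(address, 8))
```

Only the last address is split: the operand starts 4 bytes before the end of one line and ends 4 bytes into the next. Aligning atomic operands to their own size avoids the problem.
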
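
The scheduler example above (1/3 of the CPU at 1000 MHz versus 2/3 at 500 MHz) can be worked through with a small calculation. This sketch assumes that frequency invariance amounts to scaling the observed busy fraction by the ratio of the current frequency to the maximum frequency, and that the maximum frequency is 1000 MHz, which the text does not state. The function name is hypothetical.

```python
from fractions import Fraction


def invariant_utilization(busy_fraction, current_mhz, max_mhz):
    """Scale the observed busy fraction by the ratio of current to maximum frequency."""
    return busy_fraction * Fraction(current_mhz, max_mhz)


MAX_MHZ = 1000  # assumed maximum CPU frequency, not given in the article

at_full_speed = invariant_utilization(Fraction(1, 3), 1000, MAX_MHZ)
at_half_speed = invariant_utilization(Fraction(2, 3), 500, MAX_MHZ)
print(at_full_speed, at_half_speed)  # prints "1/3 1/3"
```

After scaling, the task has the same size at both frequencies, so a frequency drop alone no longer makes it look larger to the scheduler.
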
  • Hardware architectures
    • For the ARM architecture, support for hot removal of memory has been implemented.
    • For the RISC-V architecture, support for hot plugging and removing the CPU (CPU hotplug) has been added. For 32-bit RISC-V, eBPF JIT is implemented.
    • Removed the ability to use 32-bit ARM systems to run KVM guest environments.
    • Removed the "dummy" NUMA implementation for the s390 architecture, for which no use cases have been found to improve performance.
    • Added support for the AMU (Activity Monitors Unit) extension for ARM64, defined in ARMv8.4 and providing performance counters that are used to calculate frequency scaling correction factors in the task scheduler.
  • Hardware
    • Added support for vDPA devices, which use a data path that complies with the virtio specifications. vDPA devices can be either physically connected hardware or software-emulated virtual devices.
    • A new ioctl() command for monitoring changes has appeared in the GPIO subsystem; it allows a process to be notified about a change in the state of any GPIO line. The gpio-watch utility is offered as an example of using the new command.
    • In the i915 DRM driver for Intel graphics cards, support for Tigerlake ("Gen12") chips is enabled by default and initial support for OLED backlight control has been added. Improved support for Ice Lake, Elkhart Lake, Baytrail and Haswell chips.
    • The amdgpu driver has gained the ability to load firmware into the USBC chip for ASICs. Improved support for AMD Ryzen 4000 "Renoir" chips. Added support for managing OLED panels. Firmware status is now displayed in debugfs.
    • The vmwgfx DRM driver for VMware virtualization systems has gained the ability to use OpenGL 4 in guests (previously, OpenGL 3.3 was supported).
    • A new tidss DRM driver has been added for the TI Keystone platform display system.
    • Added drivers for LCD panels: Feixin K101 IM2BA02, Samsung s6e88a0-ams452ef01, Novatek NT35510, Elida KD35T133, EDT, NewEast Optoelectronics WJFH116008A, Rocktech RK101II01D-CT, Frida FRD350H54004.
    • The power management system has gained support for the Atom-based Intel Jasper Lake (JSL) platform.
    • Added support for the Pinebook Pro laptop based on Rockchip RK3399, and for the Pine64 PineTab tablet and the PinePhone smartphone based on Allwinner A64.
    • Added support for new audio codecs and chips: Amlogic AIU, Amlogic T9015, Texas Instruments TLV320ADCX140, Realtek RT5682, ALC245, Broadcom BCM63XX I2S, Maxim MAX98360A, Presonus Studio 1810c, MOTU MicroBook IIc.
    • Added support for ARM boards and platforms: Qualcomm Snapdragon 865 (SM8250), IPQ6018, NXP i.MX8M Plus, Kontron "sl28", 11 variants of the i.MX6 TechNexion Pico board, three new Toradex Colibri variants, Samsung S7710 Galaxy Xcover 2 based on ST-Ericsson u8500, DH Electronics DHCOM SoM and PDK2, Renesas M3ULCB, Hoperun HiHope, Linutronix Testbox v2, PocketBook Touch Lux 3.

Source: opennet.ru