Linux 5.1 kernel release

After two months of development, Linus Torvalds released the Linux 5.1 kernel. Among the most notable changes: io_uring, a new interface for asynchronous I/O; the ability to use NVDIMM as RAM; support for shared virtual memory in Nouveau; scalable monitoring of very large file systems via fanotify; adjustable Zstd compression levels in Btrfs; the new TEO cpuidle governor; system calls that address the year 2038 problem; the ability to boot from device-mapper devices without an initramfs; the SafeSetID LSM module; and support for combined live patches.

Main changes:

  • Disk Subsystem, I/O and File Systems
    • A new interface for asynchronous I/O, io_uring, has been implemented. It is notable for its support for I/O polling and for working both with and without buffering. The previously available asynchronous I/O mechanism, "aio", did not support buffered I/O, worked only in O_DIRECT mode (unbuffered, bypassing the cache), could block while waiting for metadata to become available, and incurred high overhead from copying data in memory.

      In the io_uring API, the developers tried to eliminate the shortcomings of the old aio interface. In terms of performance, io_uring comes very close to SPDK and significantly outperforms libaio when polling is enabled. A library, liburing, has been prepared for user-space applications; it provides a high-level wrapper over the kernel interface;

    • The fanotify() file system event tracking mechanism has gained support for monitoring at the superblock level and for tracking dirent changes (creation, deletion and moving of directory entries). These features help solve the scalability problems that arise when recursive change tracking is set up on very large file systems with inotify (dirent changes could previously be tracked only through inotify, whose performance under recursive tracking of large nested directory trees left much to be desired). Such monitoring can now be done efficiently through fanotify;

    • The Btrfs file system has gained the ability to adjust the compression level for the zstd algorithm, which can be seen as a good compromise between the fast but inefficient lz4 and the slow but well-compressing xz. As was already possible for zlib, the "-o compress=zstd:level" mount option is now supported for zstd. In testing, the minimum level 1 compressed data by a factor of 2.658 at a compression speed of 438.47 MB/s, a decompression speed of 910.51 MB/s and memory consumption of 780 MB, while the maximum level 15 achieved a factor of 3.126, but at a compression speed of 37.30 MB/s, a decompression speed of 878.84 MB/s and memory consumption of 2547 MB;
    • Added the ability to boot from a file system hosted on a device-mapper device without using an initramfs. Starting with this kernel release, device-mapper devices can be used directly during the boot process, for example as the partition holding the root file system. The device is configured with the "dm-mod.create" boot parameter. Device-mapper targets permitted at boot include "crypt", "delay", "linear", "snapshot-origin" and "verity";
    • The F2FS_NOCOW_FL flag has been added to the Flash-oriented F2FS file system, which allows you to disable the copy-on-write mode for a given file;
    • The Exofs file system, a variant of ext2 adapted to work with OSD (Object-based Storage Device) object stores, has been removed from the kernel. Support for the SCSI protocol for such object storage devices has also been removed;
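The Btrfs figures quoted above imply a steep trade-off between compression ratio and speed. The following is a minimal Python sketch of that arithmetic, using only the numbers reported in this article. The helper `zstd_mount_option` is hypothetical: it merely formats the "compress=zstd:level" option string, and the 1..15 range is taken from the minimum and maximum levels mentioned in the text.

```python
# Figures quoted in the article for Btrfs zstd compression (level -> metrics).
RESULTS = {
    1: {"ratio": 2.658, "compress_mb_s": 438.47,
        "decompress_mb_s": 910.51, "memory_mb": 780},
    15: {"ratio": 3.126, "compress_mb_s": 37.30,
         "decompress_mb_s": 878.84, "memory_mb": 2547},
}


def zstd_mount_option(level: int) -> str:
    """Format the mount option string (hypothetical helper, not a Btrfs API)."""
    if not 1 <= level <= 15:
        raise ValueError("level outside the 1..15 range mentioned in the article")
    return f"compress=zstd:{level}"


low, high = RESULTS[1], RESULTS[15]

# How much better level 15 compresses, and what it costs in speed and memory.
ratio_gain = high["ratio"] / low["ratio"]
slowdown = low["compress_mb_s"] / high["compress_mb_s"]
memory_growth = high["memory_mb"] / low["memory_mb"]

print(zstd_mount_option(15))
print(f"ratio gain {ratio_gain:.2f}x, "
      f"compression slowdown {slowdown:.1f}x, "
      f"memory growth {memory_growth:.1f}x")
```

On these figures, level 15 improves the compression ratio by roughly 18% while compressing almost twelve times more slowly, so the higher levels mainly suit data that is written rarely and read often.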
  • Virtualization and Security
    • Added the PR_SPEC_DISABLE_NOEXEC option to prctl() for controlling speculative execution of instructions in the selected process. The new option makes it possible to selectively disable speculative execution for processes that could be targeted by a Spectre-class attack. The restriction remains in effect until the first call to exec();
    • The SafeSetID LSM module has been implemented. It allows system services to manage users securely without privilege escalation (CAP_SETUID) and without gaining root authority. Privileges are assigned by defining rules in securityfs based on a whitelist of valid bindings (in the form "UID1:UID2");
    • Added the low-level changes required for stacking security modules (LSMs). A new "lsm" kernel boot option controls which modules are loaded and in what order;
    • Support for file namespaces has been added to the audit subsystem;
    • The capabilities of the structleak GCC plugin, which blocks potential leaks of memory contents, have been expanded: any variable that the code passes by reference on the stack is now initialized;
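The "UID1:UID2" whitelist described for SafeSetID above can be illustrated with a small user-space model. This is only a sketch of the idea in Python, not the kernel's implementation: the function names are hypothetical, the UIDs are made up, and the check is simplified to "a transition is allowed only if its pair is listed".

```python
def parse_whitelist(lines):
    """Parse 'UID1:UID2' rules into a set of (from_uid, to_uid) pairs.

    Blank lines are skipped; a malformed rule raises ValueError.
    """
    rules = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        src, dst = line.split(":")
        rules.add((int(src), int(dst)))
    return rules


def transition_allowed(rules, current_uid, target_uid):
    """Simplified model: allow a UID change only if the pair is whitelisted."""
    return (current_uid, target_uid) in rules


# Hypothetical policy: a service running as UID 1000 may switch to 1001 or 1002.
policy = parse_whitelist(["1000:1001", "1000:1002"])

print(transition_allowed(policy, 1000, 1001))  # listed pair
print(transition_allowed(policy, 1000, 0))     # switching to root is not listed
```

The point of the model is that the service never needs a blanket right to become any user: only the explicitly listed transitions are possible.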
  • Network subsystem
    • A new socket option, "SO_BINDTOIFINDEX", has been implemented. It is similar to "SO_BINDTODEVICE", but takes the index number of the network interface as its argument instead of the interface name;

    • Added the ability to assign multiple BSSIDs (MAC addresses) to a single device in the mac80211 stack. As part of the WiFi performance optimization project, airtime accounting has been added to the mac80211 stack, along with the ability to distribute airtime between several stations (in access point mode, slow wireless stations are allotted less transmission time instead of the time being divided evenly among all stations);
    • Added the "devlink health" mechanism, which provides notifications when problems occur with a network interface;
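The SO_BINDTOIFINDEX option mentioned above can be sketched from user space as follows. This is a hedged Python example: Python's socket module may not export the constant, so the value 62 from the generic Linux socket headers is used as a fallback (an assumption that does not hold on architectures with their own socket option numbering). The helper `bind_to_ifindex` and the use of the "lo" interface are illustrative, and the call may fail on older kernels or without sufficient privileges.

```python
import socket

# Fallback value from the generic Linux headers (assumption: the platform
# uses the asm-generic numbering; some architectures define it differently).
SO_BINDTOIFINDEX = getattr(socket, "SO_BINDTOIFINDEX", 62)


def bind_to_ifindex(sock, ifindex: int) -> None:
    """Bind a socket to a network interface by index instead of by name."""
    sock.setsockopt(socket.SOL_SOCKET, SO_BINDTOIFINDEX, ifindex)


if __name__ == "__main__":
    try:
        # Resolve the loopback interface name to its index once...
        index = socket.if_nametoindex("lo")
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # ...then bind using the index only.
            bind_to_ifindex(sock, index)
            print(f"bound to interface index {index}")
    except OSError as exc:
        # Needs Linux 5.1 or newer and may require extra privileges.
        print(f"could not bind by interface index: {exc}")
```

Working with the index avoids a name lookup in code that already tracks interfaces by index, for example when the index was obtained from a netlink message.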
  • Memory and system services
    • Secure signal delivery that accounts for PID reuse has been implemented. Previously, a kill call could race with process termination: the target PID could be freed when the process exited and then be assigned to another process, so the signal would be delivered to the wrong process. To avoid such situations, a new pidfd_send_signal system call has been added, which uses file descriptors from /proc/pid to provide a stable reference to the process. Even if the PID is reused while the system call is being processed, the file descriptor does not change and can safely be used to send a signal to the process;
    • Added the ability to use persistent memory devices (for example, NVDIMMs) as RAM. Until now, the kernel supported such devices only as storage, but they can now also be used as additional RAM. The feature was implemented at the request of users who are willing to accept lower performance and who want to use the standard Linux kernel memory management API instead of existing user-space memory allocators that work on top of mmap for a dax file;
    • A new cpuidle governor, TEO (Timer Events Oriented), has been added. The cpuidle governor decides when the CPU can be switched to deeper power-saving states: the deeper the state, the greater the savings, but the longer it takes to leave it. Until now, two cpuidle governors were available, "menu" and "ladder", which differ in their heuristics. The "menu" governor has known problems with its heuristic decisions, so a new governor was prepared to address them. TEO is positioned as an alternative to "menu" that delivers better performance at the same power consumption.
      The new governor can be activated with the boot parameter "cpuidle.governor=teo";

    • As part of the work on the year 2038 problem, which is caused by overflow of the 32-bit time_t type, new system calls have been added that provide 64-bit time counters on 32-bit architectures. As a result, a 64-bit time_t can now be used on all architectures. Similar changes have been made in the network subsystem for the socket timestamp options;
    • The kernel live patching system has gained an "Atomic Replace" capability for atomically applying a series of changes to a single function. This makes it possible to distribute consolidated patches that cover several changes at once, instead of applying live patches step by step in a strictly defined order, which is difficult to maintain. Whereas previously each change had to build on the state of the function after the preceding change, it is now possible to ship multiple changes tied to the same initial state (that is, maintainers can keep one consolidated patch relative to the base kernel instead of a chain of interdependent patches);
    • Support for the a.out executable file format has been declared deprecated, and the code for generating core files in the a.out format, which was in an abandoned state, has been removed. The a.out format is no longer used on Linux systems, and modern tools no longer generate a.out files in default Linux configurations. In addition, a loader for a.out files can be implemented entirely in user space;

    • The ability to identify and remove unused code has been added to the BPF program verification mechanism. The kernel also includes patches with spinlock support for the BPF subsystem, which provide additional features for managing the parallel execution of BPF programs;
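The year 2038 overflow mentioned above comes down to simple arithmetic on a signed 32-bit seconds counter. The following is a minimal Python sketch of that arithmetic; the helper names are illustrative, and the wraparound is simulated in user space rather than read from any kernel interface.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit counter."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# Last moment representable in a signed 32-bit time_t.
last_ok = to_utc(INT32_MAX)
# One second later a 32-bit counter wraps to a large negative value.
after_wrap = to_utc(wrap_int32(INT32_MAX + 1))
# A 64-bit counter simply keeps counting.
with_64bit = to_utc(INT32_MAX + 1)

print(last_ok.isoformat())     # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
print(with_64bit.isoformat())  # 2038-01-19T03:14:08+00:00
```

This is why the fix has to reach the system call interface: as long as a call exchanges time values as 32-bit integers, the value wraps no matter how the rest of the system stores time.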
  • Hardware
    • The Nouveau driver has gained support for heterogeneous memory management, which allows the CPU and GPU to access common synchronized memory areas. The shared virtual memory (SVM) system is built on the HMM (Heterogeneous Memory Management) subsystem, which supports devices that have their own memory management units (MMUs) and can access main memory. In particular, HMM makes it possible to set up a shared address space between the GPU and the CPU, in which the GPU can access the main memory of a process. SVM support is currently enabled only for Pascal family GPUs, although support is also provided for Volta and Turing GPUs. Nouveau has also gained a new ioctl for controlling the migration of process memory regions to GPU memory;
    • In the Intel DRM driver, fastboot mode is now enabled by default for Skylake and newer GPUs (gen9+), which eliminates unnecessary mode changes during boot. New device identifiers based on the Coffee Lake and Ice Lake microarchitectures have been added. GVT (GPU virtualization) support has been added for Coffee Lake chips. VFIO EDID support has been implemented for virtual GPUs. Support for ACPI/PMIC elements has been added for MIPI/DSI LCD panels. New 1080p30/50/60 TV modes have been implemented;
    • BACO support for Vega 10/20 GPUs has been added to the amdgpu driver. Power management tools for Vega 10/20 and fan control tables for Vega 10 have been implemented. New PCI device IDs for Picasso GPUs have been added. An interface for managing scheduling dependencies has been added to avoid deadlocks;
    • Added a DRM/KMS driver for ARM Komeda display processors (Mali D71);
    • Added support for Toppoly TPG110, Sitronix ST7701, PDA 91-00156-A0, LeMaker BL035-RGB-002 3.5 and Kingdisplay kd097d04 screen panels;
    • Added support for Rockchip RK3328, Cirrus Logic CS4341 and CS35L36, MediaTek MT6358, Qualcomm WCD9335 and Ingenic JZ4725B audio codecs, as well as Mediatek MT8183 audio platform;
    • Added support for the STMicroelectronics FMC2 and Amlogic Meson NAND flash controllers;
    • Added support for Habana AI accelerators;
    • Added support for NXP ENETC gigabit Ethernet controllers and MediaTek MT7603E (PCIe) and MT76x8 wireless interfaces.

At the same time, the Free Software Foundation Latin America released a completely free version of the 5.1 kernel, linux-libre 5.1-gnu, stripped of firmware and driver elements that contain non-free components or code whose scope of use is restricted by the manufacturer. The new release disables blob loading in the mt7603 and goya drivers. The blob cleanup code has been updated in the wilc1000, iwlwifi, soc-acpi-intel, brcmfmac, mwifiex, btmrvl, btmtk and touchscreen_dmi drivers and subsystems. Blob cleaning in the lantiq xrx200 firmware loader has been dropped because the loader was removed from the kernel.

Source: opennet.ru
