Linux 5.3 kernel release

After two months of development, Linus Torvalds has released Linux kernel 5.3. Notable changes include support for AMD Navi GPUs, Zhaoxin processors, and Intel Speed Select power management technology; the ability to use the umwait instruction to wait without busy-wait loops; a "utilization clamping" mechanism for improving interactivity on asymmetric CPUs; the pidfd_open() system call; the ability to use IPv4 addresses from the 0.0.0.0/8 subnet; hardware acceleration for nftables; HDR support in the DRM subsystem; and integration of the ACRN hypervisor.

In the release announcement, Linus reminded all developers of the main rule of kernel development: the behavior of user-space components must stay unchanged. Changes to the kernel must not break existing applications or cause regressions at the user level. Such breakage can come not only from an ABI change, the removal of obsolete code, or new bugs, but also indirectly, as a side effect of correctly working, useful improvements. As an illustration, he cited a useful Ext4 optimization that had to be reverted: it reduced the number of disk accesses by disabling inode table readahead for small I/O requests.

Because the optimization reduced disk activity, entropy for the getrandom() random number generator accumulated more slowly, and in some configurations the system could hang during boot until the entropy pool was filled. Since the optimization is genuinely useful, the developers discussed fixing the problem instead by making getrandom() non-blocking by default and adding an optional flag for waiting on entropy, but such a change would degrade the quality of random numbers early in boot.

The new version includes 15794 fixes from 1974 developers; the patch is 92 MB in size (the changes touched 13986 files, adding 258419 lines of code and removing 599137). About 39% of all changes in 5.3 relate to device drivers, about 12% to updates of architecture-specific code, 11% to the network stack, 3% to file systems, and 3% to internal kernel subsystems.

Main changes:

  • Memory and system services
    • Continued development of the pidfd functionality, which helps handle PID reuse (a pidfd is bound to a specific process and does not change, whereas a PID can be assigned to another process after the process currently associated with it terminates). Earlier kernels had already added the pidfd_send_signal() system call and the CLONE_PIDFD flag for clone(), which returns a pidfd for use with pidfd_send_signal(). However, obtaining a pidfd only through clone() with CLONE_PIDFD was a problem for service managers and for the Android low-memory process killer, because the processes they track are started with fork() or with clone() without CLONE_PIDFD.

      The 5.3 kernel introduces the pidfd_open() system call, which returns a pidfd for an arbitrary existing process, including one not created by clone() with CLONE_PIDFD. Support was also added for polling a pidfd with poll() and epoll(), so process managers can track the termination of arbitrary processes without racing against the PID being reassigned to a new process. Exit notification through a pidfd works much like the notification a parent receives when its child process exits;

    • The task scheduler gained a utilization clamping mechanism, which keeps CPU frequency within minimum and maximum bounds depending on the tasks active on the CPU. Tasks that directly affect the interactive experience can be sped up by running them at no less than the lower bound of a "requested" frequency, while low-priority background tasks that do not affect the user are capped at the upper bound of an "allowed" frequency. The limits are set through the sched_util_min and sched_util_max fields of the struct sched_attr passed to the sched_setattr() system call;
    • Added support for Intel Speed Select power management technology, available on some servers with Intel Xeon processors. It allows tuning performance parameters and partitioning throughput across CPU cores, so that tasks running on certain cores can be prioritized at the expense of performance on other cores;
    • User-space processes can now wait for short periods without busy-wait loops by using the umwait instruction. This instruction, together with umonitor and tpause, will be available in upcoming Intel "Tremont" chips and allows energy-efficient delays that do not degrade the performance of sibling threads when Hyper-Threading is used;
    • Support for large memory pages (huge pages) has been added for the RISC-V architecture;
    • The kprobes tracing mechanism can now dereference pointers to user-space memory, which can be used, for example, to inspect the contents of structures passed to system calls. Probes can also now be set at boot time;
    • A PREEMPT_RT configuration option for real-time operation has been added. The real-time code itself has not yet been merged into the kernel, but the appearance of the option is a good sign that the long saga of integrating the Realtime-Preempt patches is nearing its end;
    • Added the clone3() system call, a more extensible version of the clone() interface that allows specifying a larger number of flags;
    • Added the bpf_send_signal() helper, which allows BPF programs to send signals to the process in whose context they run;
    • For perf events in the KVM hypervisor, a new event filtering mechanism lets the administrator define which types of events the guest is or is not allowed to monitor;
    • The eBPF verifier can now accept programs containing loops, provided the loops are bounded and cannot cause the limit on the maximum number of instructions to be exceeded;
  • Disk Subsystem, I/O and File Systems
    • XFS now supports multi-threaded inode traversal (used, for example, for quota checks). New BULKSTAT and INUMBERS ioctls provide access to features introduced in version 5 of the on-disk format, such as inode birth time, and allow BULKSTAT and INUMBERS to be scoped to individual allocation groups (AGs);
    • Ext4 now supports holes in directories (unallocated blocks). The "i" (immutable) flag is now enforced for open files (writes are blocked even if the flag was set after the file had already been opened);

    • Btrfs now selects a fast crc32c implementation on all architectures;
    • In CIFS, smbdirect support is no longer marked experimental. SMB3 can now use encryption algorithms in GCM mode. A new mount option retrieves file mode bits from ACEs (Access Control Entries). The performance of open() has been optimized;
    • F2FS gained an option to limit the garbage collector when running in checkpoint=disable mode, an ioctl for removing block ranges that allows resizing a partition on the fly, the ability to host a swap file with direct I/O, and support for pinning a file and allocating blocks for such files for all users;
    • Added support for asynchronous sendmsg() and recvmsg() operations to the io_uring asynchronous I/O interface;
    • Support for compression using the zstd algorithm and the ability to verify signed FS images have been added to the UBIFS file system;
    • Support for SELinux security labels on files has been added to CephFS;
    • For NFSv4, a new mount option "nconnect=" sets the number of connections established with the server; traffic is load-balanced across these connections. In addition, the NFSv4 server now provides a /proc/fs/nfsd/clients directory with information about current clients, including the files they have open;
  • Virtualization and Security
    • The kernel now includes ACRN, a hypervisor for embedded devices designed with real-time operation and safety-critical use in mind. ACRN has minimal overhead and guarantees low latency and adequate responsiveness when interacting with hardware. It supports virtualization of CPU resources, I/O, networking, graphics, and audio. ACRN can be used to run multiple isolated virtual machines in electronic control units, dashboards, automotive infotainment systems, consumer IoT devices, and other embedded equipment;
    • User-mode Linux (UML) gained a time-travel mode that slows down or speeds up time in the UML virtual environment to make time-related code easier to debug. A time-travel-start parameter was also added, which starts the system clock from a specified time given in Unix epoch format;

    • Added new kernel command-line options "init_on_alloc" and "init_on_free", which enable zeroing of memory when the kernel's page and slab allocators hand it out and when it is freed, improving security at the cost of extra initialization overhead;
    • Added the new virtio-iommu driver, which implements a paravirtualized device that lets IOMMU requests such as ATTACH, DETACH, MAP, and UNMAP be sent over the virtio transport without emulating memory page tables;
    • Added the new virtio-pmem driver, which provides access to storage devices mapped into the physical address space, such as NVDIMMs;
    • Implemented the ability to attach cryptographic keys to a user or network namespace (the keys become inaccessible outside that namespace), as well as to protect keys with ACLs;
    • The crypto subsystem gained support for xxhash, a very fast non-cryptographic hash algorithm whose speed is limited mainly by memory bandwidth;
  • Network subsystem
    • IPv4 addresses in the 0.0.0.0/8 range, which were previously unusable, can now be handled. Opening up this subnet makes roughly 16 million more IPv4 addresses available;
    • Netfilter's nftables gained support for hardware-accelerated packet filtering through the Flow Block API newly added to network drivers. Entire rule tables, with all their chains, can be offloaded to network adapters. Offload is enabled by setting the NFT_TABLE_F_HW flag on the table. Supported are basic layer 3 and 4 protocol metadata, accept/reject actions, matching on source/destination IP addresses and ports, and protocol type;
    • Added native connection tracking for network bridges, which does not require the br_netfilter emulation layer;
    • nf_tables gained SYNPROXY support, mirroring the equivalent iptables functionality, as well as the ability to match individual IPv4 header options in rules;
    • BPF programs can now be attached to the setsockopt() and getsockopt() system calls, which allows, for example, custom access-control handlers for these calls. In addition, a new hook makes it possible to invoke a BPF program once per RTT (round-trip time) interval;
    • IPv4 and IPv6 gained a new nexthop mechanism for storing routing data, aimed at making routing tables more scalable. In tests, a set of 743,000 routes was loaded into the kernel in just 4.3 seconds using the new mechanism;
    • Bluetooth gained the functionality required to support LE ping;
  • Hardware
    • Added support for x86-compatible processors from Zhaoxin, developed in a joint project between VIA Technologies and the Shanghai municipal government. The ZX CPU family is based on the x86-64 Isaiah architecture and continues the development of VIA Centaur technology;
    • The DRM (Direct Rendering Manager) subsystem and the amdgpu and i915 graphics drivers gained support for parsing, processing, and sending HDR (high dynamic range) metadata over HDMI, allowing the use of HDR panels and displays capable of showing an extended brightness range;
    • The amdgpu driver gained initial support for AMD Navi GPUs (RX 5700), including the core driver, display code (DCN2), GFX and compute support (GFX10), SDMA 5 (System DMA), power management, and multimedia encoders/decoders (VCN2). amdgpu also improves support for Vega12- and Vega20-based cards, adding more memory and power management options;

    • The amdkfd driver (for discrete GPUs such as Fiji, Tonga, and Polaris) gained support for cards based on the VegaM GPU;
    • The DRM driver for Intel graphics gained a new multi-segment gamma correction mode for Icelake chips, DisplayPort output in YCbCr 4:2:0 format, new GuC firmware for SKL, BXT, KBL, GLK, and ICL, and the ability to power off the display asynchronously. Support for saving and restoring the rendering context was added for Ironlake (gen5) and gen4 (Broadwater to Cantiga) chips, which allows GPU state set from user space to be restored when switching from one batch to another;
    • The Nouveau driver now detects the NVIDIA Turing TU116 chipset;
    • The DRM/KMS driver for ARM Komeda (Mali D71) display processors gained support for scaling, layer splitting/merging, rotation, writeback, AFBC, SMMU, and the Y0L2, P010, and YUV420_8/10BIT color formats;
    • The MSM driver gained support for Qualcomm's Adreno A540 GPU, as well as for the MSM8998 DSI controller used in the Snapdragon 835;
    • Added drivers for the following LCD panels: Samsung S6E63M0, Armadeus ST0700, EDT ETM0430G0DH6, OSD101T2045-53TS, Evervision VGG804821, FriendlyELEC HD702E, KOE tx14d24vm1bpa, TFC S9700RTWV43TR-01B, EDT ET035012DM6, and VXT VL050-8048NT-C01;

    • Added a driver for the hardware video decoding acceleration available in Amlogic Meson SoCs;

    • The v3d driver (for the Broadcom VideoCore V GPU used in the Raspberry Pi) gained support for dispatching compute shaders;
    • Added a driver for the SPI keyboards and trackpads used in recent Apple MacBook and MacBook Pro models;
    • Added extra protection for ioctl calls in the floppy driver, and the driver itself has been marked as unmaintained ("orphaned"), meaning it is no longer being tested. The driver remains in the kernel, but its correct operation is not guaranteed. It is considered obsolete because working hardware to test it is hard to find: current external drives generally use USB.

    • Added a cpufreq driver for Raspberry Pi boards, which allows the processor frequency to be changed dynamically;
    • Added support for the new ARM SoCs MediaTek MT8183 (4x Cortex-A73 + 4x Cortex-A53), TI J721E (2x Cortex-A72 + 3x Cortex-R5F + 3 DSPs + MMA), and Amlogic G12B (4x Cortex-A73 + 2x Cortex-A53), as well as the following boards:
      • Purism Librem5,
      • Aspeed BMC,
      • Microsoft Olympus BMC,
      • Kontron SMARC,
      • Novtech Meerkat96 (i.MX7),
      • ST Micro Avenger96,
      • Google Cheza (Qualcomm SDM845),
      • Qualcomm Dragonboard 845c (Qualcomm SDM845),
      • Hugsun X99 TV Box (Rockchip RK3399),
      • Khadas Edge/Edge-V/Captain (Rockchip RK3399),
      • HiHope RZ/G2M,
      • NXP LS1021A-TSN.
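
The utilization clamping interface from the list above can be illustrated by building the struct sched_attr that sched_setattr() accepts. This sketch rests on explicit assumptions: the field layout follows include/uapi/linux/sched/types.h as of Linux 5.3 (56 bytes, the size that first carries sched_util_min and sched_util_max), clamp values use the scheduler's 0 to 1024 capacity scale, the kernel is built with CONFIG_UCLAMP_TASK, and the system call number 314 is valid only on x86-64. Python has no sched_setattr() wrapper, so the call goes through libc's syscall() via ctypes; the helper names are hypothetical.

```python
import ctypes
import os
import platform
import struct

# struct sched_attr (Linux 5.3 layout): size, sched_policy, sched_flags, sched_nice,
# sched_priority, sched_runtime, sched_deadline, sched_period, sched_util_min, sched_util_max
SCHED_ATTR_FMT = "=IIQiIQQQII"
SCHED_ATTR_SIZE_VER1 = struct.calcsize(SCHED_ATTR_FMT)  # 56 bytes

SCHED_OTHER = 0
SCHED_FLAG_UTIL_CLAMP_MIN = 0x20
SCHED_FLAG_UTIL_CLAMP_MAX = 0x40
SCHED_CAPACITY_SCALE = 1024  # clamp values are expressed on a 0..1024 scale

SYS_SCHED_SETATTR_X86_64 = 314  # assumption: x86-64 system call number only


def pack_sched_attr(util_min: int, util_max: int) -> bytes:
    """Hypothetical helper: build a sched_attr requesting [util_min, util_max] clamps."""
    if not 0 <= util_min <= util_max <= SCHED_CAPACITY_SCALE:
        raise ValueError("expected 0 <= util_min <= util_max <= 1024")
    return struct.pack(
        SCHED_ATTR_FMT,
        SCHED_ATTR_SIZE_VER1,  # size: tells the kernel which version of the struct is passed
        SCHED_OTHER,           # sched_policy: normal time-sharing policy
        SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX,  # sched_flags
        0,                     # sched_nice (this also resets nice to 0)
        0,                     # sched_priority (unused for SCHED_OTHER)
        0, 0, 0,               # sched_runtime/deadline/period (SCHED_DEADLINE only)
        util_min,              # sched_util_min: the "requested" lower bound
        util_max,              # sched_util_max: the "allowed" upper bound
    )


def set_uclamp(util_min: int, util_max: int, pid: int = 0) -> None:
    """Hypothetical helper: apply the clamps to `pid` (0 means the calling thread)."""
    if platform.machine() != "x86_64":
        raise OSError("this sketch only knows the x86-64 system call number")
    libc = ctypes.CDLL(None, use_errno=True)
    attr = ctypes.create_string_buffer(pack_sched_attr(util_min, util_max))
    if libc.syscall(SYS_SCHED_SETATTR_X86_64, pid, attr, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    try:
        set_uclamp(0, 512)  # cap this thread at roughly half of a CPU's capacity
        print("utilization clamps applied")
    except OSError as exc:  # e.g. a kernel built without CONFIG_UCLAMP_TASK
        print("sched_setattr failed:", exc)
```

Passing the structure size as its first field is what lets the kernel tell an older, shorter sched_attr from the 5.3 version with clamp fields.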
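
The extensibility of clone3(), mentioned in the list above, comes from passing its arguments as a structure together with that structure's size instead of as individual register arguments. The sketch below only packs that structure; it assumes the Linux 5.3 layout of struct clone_args from include/uapi/linux/sched.h (eight 64-bit fields, 64 bytes) and the CLONE_PIDFD value 0x1000, does not invoke the system call, and uses a hypothetical helper name. With CLONE_PIDFD set, the pidfd field holds the address at which the kernel stores the new process's pidfd.

```python
import signal
import struct

# struct clone_args in Linux 5.3: flags, pidfd, child_tid, parent_tid,
# exit_signal, stack, stack_size, tls (all 64-bit fields)
CLONE_ARGS_FMT = "=8Q"
CLONE_ARGS_SIZE_VER0 = struct.calcsize(CLONE_ARGS_FMT)  # 64 bytes
CLONE_PIDFD = 0x00001000


def pack_clone_args(pidfd_addr: int, flags: int = CLONE_PIDFD,
                    exit_signal: int = int(signal.SIGCHLD)) -> bytes:
    """Hypothetical helper: build clone_args for a fork-like clone3() call."""
    return struct.pack(
        CLONE_ARGS_FMT,
        flags,        # CLONE_* flags, now a full 64-bit field
        pidfd_addr,   # where the kernel writes the pidfd when CLONE_PIDFD is set
        0,            # child_tid
        0,            # parent_tid
        exit_signal,  # a separate field, no longer packed into the low byte of flags
        0,            # stack (0: the child keeps the parent's stack pointer, as with fork)
        0,            # stack_size
        0,            # tls
    )


args = pack_clone_args(pidfd_addr=0x7FFF0000)
print(len(args))  # 64: the value passed as clone3()'s second (size) argument
```

Because the size travels with the structure, later kernels can append fields while still accepting the 64-byte version from older programs.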
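
The "another 16 million addresses" figure for 0.0.0.0/8 in the list above follows from the prefix length: a /8 leaves 24 host bits. A quick check with Python's standard ipaddress module, assuming that 0.0.0.0 itself remains reserved as the unspecified (wildcard) address:

```python
import ipaddress

zero_net = ipaddress.ip_network("0.0.0.0/8")

total = zero_net.num_addresses  # 2**24 addresses in a /8
usable = total - 1              # assumption: 0.0.0.0 stays the unspecified address

print(total, usable)                                # 16777216 16777215
print(ipaddress.ip_address("0.1.2.3") in zero_net)  # True
```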

At the same time, the Free Software Foundation Latin America released linux-libre 5.3-gnu, a completely free version of the 5.3 kernel, cleaned of firmware and driver elements that contain non-free components or code sections whose scope is restricted by the manufacturer. The new release disables blob loading in the qcom, hdcp drm, allegro-dvt, and meson-vdec drivers. The blob cleanup code was updated for the amdgpu, i915, netx, r8169, brcmfmac, rtl8188eu, adreno, si2157, pvrusb2, and touchscreen_dmi drivers and subsystems, the skylake sound driver, and the microcode documentation.

Source: opennet.ru
