Linux 5.9 kernel release

After two months of development, Linus Torvalds presented the Linux 5.9 kernel release. Among the most notable changes: restrictions on importing symbols from proprietary modules into GPL modules, faster context switching using the FSGSBASE processor instructions, support for compressing the kernel image with Zstd, reworked prioritization of kernel threads, support for PRP (Parallel Redundancy Protocol), capacity-aware scheduling in the deadline scheduler, proactive compaction of memory pages, the CAP_CHECKPOINT_RESTORE capability flag, the close_range() system call, dm-crypt performance improvements, removal of code for 32-bit Xen PV guests, a new slab memory controller, the "rescue" option in Btrfs, and support for inline encryption in ext4 and F2FS.

The new version received 16074 fixes from 2011 developers; the patch size is 62 MB (the changes affected 14548 files, 782155 lines of code were added and 314792 lines removed). About 45% of all changes in 5.9 are related to device drivers, approximately 15% to updating code specific to hardware architectures, 13% to the network stack, 3% to file systems, and 3% to internal kernel subsystems.

Main new features:

  • Memory and system services
    • Tightened protection against the use of GPL shim layers to link proprietary drivers with kernel components exported only to GPL-licensed modules. The TAINT_PROPRIETARY_MODULE flag is now inherited by every module that imports symbols from a module carrying this flag. If a GPL module tries to import symbols from a non-GPL module, the GPL module inherits the TAINT_PROPRIETARY_MODULE flag and can no longer access kernel components available only to GPL-licensed modules, even if it has previously imported symbols from the "gplonly" category. The reverse restriction (allowing only EXPORT_SYMBOL_GPL exports in modules that have imported EXPORT_SYMBOL_GPL symbols), which could break proprietary drivers, is not implemented: only the proprietary-module flag is inherited, not the GPL bindings.
    • Added support in kcompactd for proactive compaction of memory pages in the background, which increases the number of huge pages available to the kernel. According to preliminary estimates, background compaction can, at the cost of minimal overhead, reduce huge-page allocation latency by a factor of 70-80 compared to the previously used on-demand compaction. Added the vm.compaction_proactiveness sysctl to set the bounds of external fragmentation that kcompactd will maintain.
    • Added support for compressing the kernel image with the Zstandard (zstd) algorithm.
    • For x86 systems, support for the FSGSBASE processor instructions has been implemented; they allow reading and modifying the FS/GS base registers from user space. In the kernel, FSGSBASE is used to speed up context switches by eliminating redundant writes to the GSBASE MSR, and in user space it avoids unnecessary system calls to change FS/GS.
    • Added the "allow_writes" parameter, which makes it possible to prohibit changes to the processor's MSR registers from user space and to restrict access to these registers to read operations, since changing MSRs can cause problems. Writing is not yet prohibited by default, and changes to MSRs are recorded in the log, but in the future access is planned to default to read-only.
    • The io_uring asynchronous I/O interface gained full support for asynchronous buffered reads that do not require kernel threads. Write support is expected in the next release.
    • The deadline task scheduler implements capacity-aware scheduling, which allows it to make correct decisions on asymmetric systems, such as those based on the ARM DynamIQ and big.LITTLE architectures, which combine powerful cores and less powerful energy-efficient CPU cores in one chip. In particular, the new mode avoids scheduling mismatches where a slow CPU core lacks the resources to complete a task on time.
    • The kernel's energy model (Energy Model framework) now describes not only CPU power consumption but also covers peripheral devices.
    • The close_range() system call has been implemented to allow a process to close an entire range of open file descriptors at once.
    • The code that enabled software scrollback of text (CONFIG_VGACON_SOFT_SCROLLBACK) beyond the amount of VGA text-mode video memory has been removed from the text console implementation and the fbcon driver.
    • Redesigned the algorithm for assigning priorities to threads inside the kernel. The new approach provides better consistency across kernel subsystems when assigning priorities to real-time tasks.
    • Added the sched_util_clamp_min_rt_default sysctl to control CPU boost settings for real-time tasks (for example, the behavior of real-time tasks can be changed on the fly to save power after switching to battery power or on mobile systems).
    • Preparations have been made to implement support for the Transparent Huge Pages technology in the page cache.
    • The fanotify mechanism implements the new flags FAN_REPORT_NAME and FAN_REPORT_DIR_FID, which report the name and the parent's unique FID when directory entries and non-directory objects are created, deleted, or moved.
    • A new slab memory controller has been implemented for cgroups. It moves slab accounting from the page level to the kernel object level, which makes it possible to share slab pages between cgroups instead of maintaining separate slab caches for each cgroup. The proposed approach increases slab utilization, reduces the memory used for slab by 30-45%, significantly reduces the kernel's overall memory consumption, and reduces memory fragmentation.
    • In the ALSA audio subsystem and the USB stack, in line with the recently adopted recommendations on inclusive terminology in the Linux kernel, non-inclusive terms have been cleaned up. The code no longer uses the words "slave", "master", "blacklist" and "whitelist".
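
The close_range() call mentioned in this section can be tried from Python through ctypes. The sketch below is a hedged illustration, not a reference implementation: the helper names are hypothetical, the syscall number 436 is an assumption that holds for x86-64 and other architectures using the common syscall table, and the code falls back to os.closerange() when the kernel call is unavailable.

```python
import ctypes
import fcntl
import os
import sys

# Assumption: 436 is the close_range() syscall number (x86-64 and other
# architectures that share the common syscall table).
SYS_CLOSE_RANGE = 436


def close_fd_range(first, last):
    """Close every descriptor in the inclusive range [first, last].

    Returns "close_range" if the kernel call succeeded, else "fallback".
    """
    if sys.platform.startswith("linux"):
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            result = libc.syscall(
                ctypes.c_long(SYS_CLOSE_RANGE),
                ctypes.c_uint(first),
                ctypes.c_uint(last),
                ctypes.c_uint(0),  # no flags
            )
            if result == 0:
                return "close_range"
        except (OSError, AttributeError):
            pass
    # Older kernel, sandbox, or non-Linux system: close one by one.
    # os.closerange() takes a half-open range, hence last + 1.
    os.closerange(first, last + 1)
    return "fallback"


def is_open(fd):
    """Report whether fd currently refers to an open file."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def demo():
    """Open three descriptors at high numbers and close them in one call."""
    base = os.open(os.devnull, os.O_RDONLY)
    # F_DUPFD returns the lowest free descriptor >= 100, which keeps the
    # demo away from descriptors the rest of the process may rely on.
    fds = [fcntl.fcntl(base, fcntl.F_DUPFD, 100) for _ in range(3)]
    os.close(base)
    if fds != list(range(fds[0], fds[0] + 3)):
        # Not contiguous: another descriptor sits in between, so do not
        # close the range blindly.
        for fd in fds:
            os.close(fd)
        return None
    method = close_fd_range(fds[0], fds[-1])
    return method, [is_open(fd) for fd in fds]
```

Calling demo() returns which path was taken together with the open/closed state of the three descriptors afterwards. Note the difference in conventions: the system call takes an inclusive range, while os.closerange() excludes its upper bound.
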
  • Virtualization and Security
    • When building the kernel with the Clang compiler, it is now possible to enable (CONFIG_INIT_STACK_ALL_ZERO) automatic zero-initialization of all variables stored on the stack (the build passes "-ftrivial-auto-var-init=zero").
    • The seccomp subsystem, when used in the mode where a user-space process supervises another, gained the ability to inject file descriptors into the monitored process in order to fully emulate system calls that create file descriptors. This functionality is in demand in isolated container systems and in the Chrome sandbox implementation.
    • Added support for restricting system calls using the seccomp subsystem for the xtensa and csky architectures. Support for the audit mechanism has been additionally implemented for xtensa.
    • Added the new capability flag CAP_CHECKPOINT_RESTORE, which grants access to features related to freezing and restoring the state of processes without handing over additional privileges.
    • GCC 11 provides all the features required by the KCSAN (Kernel Concurrency Sanitizer) debugging tool, which dynamically detects race conditions inside the kernel. Thus, KCSAN can now be used with kernels built with GCC.

    • For AMD Zen and newer CPUs, support has been added for P2PDMA, which allows DMA to be used for direct data transfer between the memory of two devices connected to the PCI bus.
    • A mode has been added to dm-crypt that reduces latency by performing cryptographic processing without using work queues. This mode is also required for correct operation with zoned block devices (devices with areas that must be written sequentially, updating an entire group of blocks). Work has been done to increase throughput and reduce latency in dm-crypt.
    • Removed the code supporting 32-bit guests running in paravirtualization mode under the Xen hypervisor. Users of such systems should switch to 64-bit kernels in guest environments or use full (HVM) or combined (PVH) virtualization modes instead of paravirtualization (PV).
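
Whether a process actually holds the CAP_CHECKPOINT_RESTORE capability described above can be checked by decoding the CapEff mask from /proc/<pid>/status. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the capability number 40 is taken from the kernel's capability header as an assumption rather than from this article.

```python
# Assumption: CAP_CHECKPOINT_RESTORE has capability number 40.
CAP_CHECKPOINT_RESTORE = 40


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap):
    """Test whether bit number `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def current_process_can_checkpoint(status_path="/proc/self/status"):
    """Check the calling process; only meaningful on Linux."""
    with open(status_path) as status_file:
        mask = parse_cap_eff(status_file.read())
    return has_capability(mask, CAP_CHECKPOINT_RESTORE)
```

For example, a mask of 000001ffffffffff has bit 40 set, while 0000003fffffffff, a full mask with only bits 0 to 37, does not. A checkpoint/restore tool could run such a check at startup to decide whether it needs to request broader privileges.
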
  • Disk Subsystem, I/O and File Systems
    • The Btrfs file system implements the "rescue" mount option, which unifies access to all other recovery options. Support for the "alloc_start" and "subvolrootid" options has been removed, and the "inode_cache" option has been deprecated. Performance optimizations have been made; fsync() operations in particular are noticeably faster. Added the ability to use alternative checksum types other than CRC32c.
    • Added the ability to use inline encryption in the ext4 and F2FS file systems, enabled with the "inlinecrypt" mount option. Inline encryption mode makes it possible to use the encryption mechanisms built into the drive controller, which transparently encrypt and decrypt I/O.
    • In XFS, inode flushing now works in a fully asynchronous mode that does not block processes while memory is being reclaimed. Resolved a long-standing quota issue in which soft-limit and inode-limit warnings were tracked incorrectly. The DAX support implementations for ext4 and XFS have been unified.
    • Ext4 implements prefetching of block allocation bitmaps. Combined with limiting the scanning of uninitialized groups, this optimization has reduced the time needed to mount very large partitions.
    • F2FS gained the F2FS_IOC_SEC_TRIM_FILE ioctl, which uses the TRIM/discard commands to physically erase the specified data in a file, for example, to delete access keys without leaving residual data on the drive.
      F2FS also gained a new garbage collection mode, GC_URGENT_LOW, which works more aggressively by skipping some of the idle-state checks performed before starting the garbage collector.

    • In bcache, the bucket_size for extents has been increased from 16 bits to 32 bits in preparation for enabling zoned device caches.
    • The SCSI subsystem gained the ability to use inline encryption based on the built-in hardware encryption provided by UFS (Universal Flash Storage) controllers.
    • A new kernel command line parameter "debugfs" has been added, which allows you to control the availability of the pseudo-FS of the same name.
    • The NFSv4.2 client provides support for extended file attributes (xattr).
    • dm-dust gained an interface for displaying a list of all detected bad blocks on the disk at once ("dmsetup message dust1 0 listbadblocks").
    • For md/raid5, the /sys/block/md1/md/stripe_size parameter has been added to set the size of the STRIPE block.
    • For NVMe storage devices, support has been added for drive zoning commands (ZNS, NVM Express Zoned Namespace), which make it possible to divide the storage space into zones made up of groups of blocks, for fuller control over data placement on the drive.
  • Network subsystem
    • Netfilter gained the ability to reject packets before the routing check (the REJECT expression can now be used not only in the INPUT, FORWARD and OUTPUT chains, but also at the PREROUTING stage for icmp and tcp).
    • nftables gained the ability to audit events related to configuration changes.
    • The nftables netlink API gained support for anonymous chains, whose names are assigned dynamically by the kernel. When a rule associated with an anonymous chain is deleted, the chain itself is deleted automatically.
    • Support for iterators has been added to BPF to traverse, filter, and modify the elements of an associative array (map) without copying the data to user space. Iterators can be used on TCP and UDP sockets, allowing BPF programs to iterate through lists of open sockets and extract the necessary information from them.
    • A new BPF program type, BPF_PROG_TYPE_SK_LOOKUP, has been added; such programs run when the kernel is looking for a suitable listening socket for an incoming connection. With a BPF program of this type, you can create handlers that decide which socket a connection should be associated with, without being limited by the bind() system call. For example, you can bind a single socket to a range of addresses or ports. In addition, support for the SO_KEEPALIVE flag has been added to bpf_setsockopt(), and it is now possible to install BPF_CGROUP_INET_SOCK_RELEASE handlers, which are called when a socket is released.
    • Implemented support for PRP (Parallel Redundancy Protocol), which allows Ethernet-based, application-transparent failover to a backup channel in the event of failure of any network component.
    • The mac80211 stack gained support for 4-way WPA/WPA2-PSK handshake negotiation in AP mode.
    • Added the ability to use FQ-PIE (Flow Queue PIE) as the default qdisc (queuing discipline). This network queue management algorithm aims to reduce the negative impact of intermediate packet buffering on edge network equipment (bufferbloat) in networks with cable modems.
    • New features have been added to MPTCP (MultiPath TCP), an extension of the TCP protocol for organizing the operation of a TCP connection with the delivery of packets simultaneously along several routes through different network interfaces bound to different IP addresses. Added support for syn cookies, DATA_FIN, buffer autotuning, socket diagnostics, and the use of the REUSEADDR, REUSEPORT, and V6ONLY flags in setsockopt.
    • For VRF (Virtual Routing and Forwarding), which allows several routing domains to operate on one system, a "strict" mode has been implemented. In this mode, a VRF can only be associated with a routing table that is not used by other VRFs.
    • The ath11k wireless driver gained support for the 6 GHz band and spectral scanning.
  • Hardware
    • Removed code to support the UniCore architecture developed at the Peking University Microprocessor Center and included in the Linux kernel in 2011. This architecture has been unmaintained since 2014 and has no support in GCC.
    • The RISC-V architecture gained support for kcov (a debugfs interface for analyzing kernel code coverage), kmemleak (a memory leak detection system), stack protection, jump labels and tickless operation (multitasking that does not depend on timer ticks).
    • For the PowerPC architecture, support for queued spinlocks has been implemented, which significantly improves performance under lock contention.
    • For the ARM and ARM64 architectures, the schedutil CPU frequency scaling mechanism (cpufreq governor) is now used by default; it directly uses information from the task scheduler to decide on frequency changes and can call cpufreq drivers immediately to change the frequency quickly, adjusting the CPU's operating parameters to the current load.
    • The i915 DRM driver for Intel graphics enabled support for chips based on the Rocket Lake microarchitecture and added initial support for Intel Xe DG1 discrete cards.
    • The amdgpu driver gained initial support for the AMD Navi 21 (Sienna Cichlid) and Navi 22 (Navy Flounder) GPUs. Added support for the UVD/VCE acceleration engines on Southern Islands GPUs (Radeon HD 7000).
      Added a property to rotate the display by 90, 180 or 270 degrees.

      Interestingly, the AMD GPU driver is the largest driver in the kernel - it has about 2.71 million lines of code, which is about 10% of the total size of the kernel (27.81 million lines). At the same time, 1.79 million lines are accounted for by automatically generated header files with data for GPU registers, and the C code is 366 thousand lines (for comparison, the Intel i915 driver includes 209 thousand lines, and Nouveau - 149 thousand lines).

    • The Nouveau driver gained support for frame-by-frame integrity checking using CRC (Cyclic Redundancy Checks) in NVIDIA GPU display engines. The implementation is based on documentation provided by NVIDIA.
    • Added drivers for LCD panels: Frida FRD350H54004, KOE TX26D202VM0BWA, CDTech S070PWS19HP-FC21, CDTech S070SWV29HG-DC44, Tianma TM070JVHG33 and Xingbangda XBD599.
    • The ALSA audio subsystem gained support for Intel Silent Stream (a continuous power mode for external HDMI devices that eliminates the delay when starting playback) and a new device for controlling the LEDs of the microphone activation and mute buttons, as well as support for new hardware, including the Loongson 7A1000 controller.
    • Added support for ARM boards, devices and platforms: Pine64 PinePhone v1.2, Lenovo IdeaPad Duet 10.1, ASUS Google Nexus 7, Acer Iconia Tab A500, Qualcomm Snapdragon SDM630 (used in Sony Xperia 10, 10 Plus, XA2, XA2 Plus and XA2 Ultra), Jetson Xavier NX, Amlogic WeTek Core2, Aspeed EthanolX, five new boards based on NXP i.MX6, MikroTik RouterBoard 3011, Xiaomi Libra, Microsoft Lumia 950, Sony Xperia Z5, MStar, Microchip Sparx5, Intel Keem Bay, Amazon Alpine v3, Renesas RZ/G2H.
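
The inheritance rule for TAINT_PROPRIETARY_MODULE described under "Memory and system services" can be illustrated with a toy model. This is only a sketch of the rule as stated in the text, not kernel code; the Module class, the import_symbol() function and all symbol names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a loaded module (hypothetical, not a kernel type)."""
    name: str
    gpl_compatible: bool
    # Maps each exported symbol name to True if it is a GPL-only export.
    exports: dict = field(default_factory=dict)
    # Models the TAINT_PROPRIETARY_MODULE flag.
    tainted: bool = field(default=False, init=False)

    def __post_init__(self):
        # A module with a non-GPL license carries the flag from the start.
        self.tainted = not self.gpl_compatible


def import_symbol(importer, owner, symbol):
    """Try to resolve `symbol` from `owner`; return True if it is allowed."""
    if symbol not in owner.exports:
        return False
    gpl_only = owner.exports[symbol]
    # GPL-only exports are refused to any module carrying the flag.
    if gpl_only and importer.tainted:
        return False
    # The flag is inherited from the module that provides the symbol.
    if owner.tainted:
        importer.tainted = True
    return True
```

With a "kernel" module exporting one GPL-only and one ordinary symbol, a proprietary "blob" module and a GPL "shim" module, the shim can import the GPL-only symbol at first. Once it imports anything from the blob it becomes tainted, and a further import of the GPL-only symbol is refused, which is the shim scenario the change is meant to stop.
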

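The capacity-aware decision mentioned for the deadline scheduler boils down to asking whether a task's runtime still fits inside its deadline once the runtime is stretched by the CPU's relative speed. The sketch below is an illustration under assumptions: it uses the common convention that the fastest CPU has capacity 1024, and the function names, CPU names and capacity values are hypothetical.

```python
SCHED_CAPACITY_SHIFT = 10
SCHED_CAPACITY_SCALE = 1 << SCHED_CAPACITY_SHIFT  # 1024 = fastest CPU


def task_fits(runtime_ns, deadline_ns, cpu_capacity):
    """Check whether a deadline task fits on a CPU of the given capacity.

    runtime_ns is the CPU time the task needs on a full-speed core.
    The deadline is scaled down by capacity / 1024 and compared with it.
    """
    scaled_deadline = (deadline_ns * cpu_capacity) >> SCHED_CAPACITY_SHIFT
    return scaled_deadline >= runtime_ns


def eligible_cpus(runtime_ns, deadline_ns, capacities):
    """Return the CPUs (in dict order) on which the task can meet its deadline."""
    return [
        cpu
        for cpu, capacity in capacities.items()
        if task_fits(runtime_ns, deadline_ns, capacity)
    ]
```

With hypothetical capacities {"little0": 446, "big0": 1024}, a task that needs 6 ms of runtime within a 10 ms deadline is eligible only for big0, while a task that needs 2 ms within the same deadline fits on both cores.
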
Simultaneously, the Latin American Free Software Foundation produced a completely free variant of the 5.9 kernel, linux-libre 5.9-gnu, cleared of firmware and driver elements containing non-free components or code sections whose scope of use is limited by the manufacturer. The new release disables blob loading in the drivers for rtw8821c WiFi and the MediaTek mt8183 SoC. Blob cleanup code has been updated in the Habanalabs, Wilc1000, amdgpu, mt7615, i915 CSR, Mellanox mlxsw (Spectrum3), r8169 (rtl8125b-2) and x86 touchscreen drivers and subsystems.

Source: opennet.ru
