Promoting Bcachefs to the Linux Kernel

Kent Overstreet, the author of Bcache, the SSD block device caching system that is part of the Linux kernel, spoke at the LSFMM 2023 conference (Linux Storage, Filesystem, Memory Management & BPF Summit). He summarized the work on getting the Bcachefs file system merged into the mainline Linux kernel and described plans for its further development. In May, an updated patch set implementing Bcachefs was submitted for review and inclusion in the mainline kernel. Bcachefs has been in development for about 10 years. It was announced as ready for pre-merge review at the end of 2020, and the current version of the patches addresses the comments and shortcomings identified during that earlier review.

The goal of Bcachefs development is to match XFS in performance, reliability and scalability while providing the additional features found in Btrfs and ZFS: combining multiple devices in one file system, multi-layer storage layouts, replication (RAID 1/10), caching, transparent data compression (LZ4, gzip and ZSTD modes), snapshots, integrity verification by checksums, Reed-Solomon error correction codes (RAID 5/6), and encrypted storage (using ChaCha20 and Poly1305). In terms of performance, Bcachefs is ahead of Btrfs and other file systems based on the Copy-on-Write mechanism, and comes close to Ext4 and XFS.

Among recent achievements, the talk highlighted the stabilization of writable snapshots. Compared to Btrfs, snapshots in Bcachefs now scale much better and are free of the problems inherent in Btrfs. In practice, snapshots were tested by using them for MySQL backups. A lot of work has also gone into scalability: the file system performed well in tests on 100 TB storage, and deployment on 1 PB storage is expected in the near future. A new nocow mode has been added that disables the copy-on-write mechanism. Over the summer, the plan is to stabilize the implementation of error correction codes and RAIDZ, and to fix the high memory consumption seen when repairing and checking file systems with the fsck utility.

Among the plans for the future is the use of the Rust language in Bcachefs development. According to the author of Bcachefs, he likes writing code, not debugging it, and it is now crazy to write code in C when a better option exists. Rust is already used in Bcachefs for some of the user-space utilities. Beyond that, the idea of gradually rewriting Bcachefs entirely in Rust is being considered, since using the language significantly reduces debugging time.

As for merging Bcachefs into the mainline Linux kernel, the process may be delayed by the sheer size of the changes (2500 patches and about 90 thousand lines of code), which makes them difficult to review. To speed up review, some developers have suggested splitting the patch series into smaller, more logically separated parts. During the discussion, some participants also pointed out that the project is developed by a single person, and that the code could be left unmaintained if something happened to him (two Red Hat employees are interested in the project, but their work has so far been limited to bug fixes).

Bcachefs builds on technologies already proven in Bcache, the block device cache that speeds up access to slow hard drives by caching on fast SSDs (included in the kernel since release 3.10). Bcachefs uses the Copy-on-Write (COW) mechanism, in which changes do not overwrite existing data: the new state is written to a new location, after which the pointer to the current state is updated.
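The copy-on-write idea described above can be illustrated with a minimal Python sketch. This is a toy model, not Bcachefs code: the CowStore class, its methods, and the in-memory "disk" are all hypothetical names chosen for the example. It shows only the principle that a new version goes to a new location and becomes visible when the state pointer is switched.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = []  # append-only "disk": written blocks are never modified
        self.root = {}    # current state pointer: name -> index into blocks

    def write(self, name, data):
        # Step 1: write the new version to a fresh location.
        self.blocks.append(data)
        new_index = len(self.blocks) - 1
        # Step 2: build a new mapping and switch the state pointer to it.
        # The old mapping object is left untouched.
        new_root = dict(self.root)
        new_root[name] = new_index
        self.root = new_root

    def read(self, name, root=None):
        # Read through the current state, or through an older one if given.
        root = self.root if root is None else root
        return self.blocks[root[name]]

    def snapshot(self):
        # A snapshot is just a reference to the current mapping; no data is copied.
        return self.root


store = CowStore()
store.write("file", b"v1")
snap = store.snapshot()
store.write("file", b"v2")
```

Because old blocks and old mappings are never modified, reading through `snap` still returns the first version while the live state returns the second. This is also why snapshots are cheap in copy-on-write designs.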

A distinctive feature of Bcachefs is support for multi-layer drive configurations, in which storage is composed of several layers: the bottom layer consists of the fastest drives (SSDs), which cache frequently used data, while the top layer consists of larger, cheaper disks that hold less frequently accessed data. Writeback caching can be used between layers. Drives can be added to and removed from a file system dynamically, without interrupting its use (data migrates automatically).
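The layered layout with writeback caching can be sketched roughly as follows. This is a simplified model under stated assumptions, not how Bcachefs implements it: the TieredStore class is hypothetical, the two tiers are plain in-memory dictionaries, and least-recently-used eviction is an assumption made for the example.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store with writeback caching (hypothetical, illustration only)."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.fast = OrderedDict()  # fast tier (SSD): key -> (data, dirty flag)
        self.slow = {}             # slow tier (large, cheap disks): key -> data

    def write(self, key, data):
        # Writeback: the write lands on the fast tier only and is marked dirty.
        self.fast[key] = (data, True)
        self.fast.move_to_end(key)
        self._evict()

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as recently used
            return self.fast[key][0]
        data = self.slow[key]           # miss: fetch from the slow tier
        self.fast[key] = (data, False)  # cache a clean copy
        self._evict()
        return data

    def _evict(self):
        # Assumed policy: evict least recently used entries first,
        # flushing dirty data to the slow tier before dropping it.
        while len(self.fast) > self.cache_capacity:
            key, (data, dirty) = self.fast.popitem(last=False)
            if dirty:
                self.slow[key] = data


tiers = TieredStore(cache_capacity=2)
tiers.write("a", b"1")
tiers.write("b", b"2")
tiers.write("c", b"3")  # "a" is the coldest entry and is flushed to the slow tier
```

In this sketch, data reaches the slow tier only when it is evicted from the fast tier, which is the essence of writeback caching. A real file system also has to flush dirty data in the background and survive a crash in between, which the example leaves out.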

Source: opennet.ru
