WD SMR disks are incompatible with ZFS, which can lead to data loss

iXsystems, the developer of the FreeNAS project, has warned about serious ZFS compatibility issues with some of the new WD Red hard drives that Western Digital manufactures using SMR (Shingled Magnetic Recording) technology. In the worst case, using ZFS on the affected drives can result in data loss.

The problems affect WD Red drives with capacities from 2 to 6 TB produced since 2018, which use DM-SMR (Device-Managed Shingled Magnetic Recording) technology and carry the EFAX label (CMR disks use the EFRX identifier). Western Digital noted in its blog that WD Red SMR drives are designed for home and small-business NAS systems with no more than 8 drives and a workload of up to 180 TB per year, typical of backup and file sharing. The previous generation of WD Red drives, WD Red models with a capacity of 8 TB or more, and drives from the WD Red Pro, WD Gold and WD Ultrastar lines continue to be manufactured with CMR (Conventional Magnetic Recording) technology, and their use does not cause problems with ZFS.

The essence of SMR technology is that the write head is wider than a single track, so each track is written with partial overlap of the adjacent one; as a result, rewriting any track requires rewriting the entire group of overlapping tracks. To optimize work with such drives, zoning is used: the storage space is divided into zones made up of groups of blocks or sectors, to which data can only be appended sequentially, and any update rewrites the entire group of blocks. In general, SMR drives are more energy efficient and more affordable, and they perform well on sequential writes, but they lag on random writes, including operations such as rebuilding storage arrays.
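To illustrate the zoning model described above, here is a minimal Python sketch (a deliberately simplified model, not real drive firmware; the SMRZone class and its methods are hypothetical) showing why a single random write inside a zone forces the whole written part of the zone to be rewritten, while sequential appends stay cheap:

    # Simplified model of an SMR zone: data can only be appended at the
    # write pointer; changing an already written block forces the whole
    # written part of the zone to be read, modified and rewritten.
    class SMRZone:
        def __init__(self, size):
            self.blocks = [None] * size   # physical blocks of the zone
            self.write_pointer = 0        # next position for sequential append
            self.rewrites = 0             # counts full-zone rewrites

        def append(self, data):
            # Sequential write at the write pointer: the cheap, intended path.
            if self.write_pointer >= len(self.blocks):
                raise IOError("zone full")
            self.blocks[self.write_pointer] = data
            self.write_pointer += 1

        def update(self, index, data):
            # Random write into already shingled tracks: neighbouring tracks
            # overlap, so the written part of the zone is rewritten as a whole.
            written = self.blocks[:self.write_pointer]
            written[index] = data
            self.blocks = written + [None] * (len(self.blocks) - len(written))
            self.rewrites += 1

    zone = SMRZone(size=256)
    for i in range(100):
        zone.append("block-%d" % i)   # fast sequential appends
    zone.update(5, "new data")        # one random write -> full zone rewrite
    print(zone.rewrites)              # 1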

With DM-SMR, zoning and data placement are handled by the disk controller, so to the operating system such a drive looks like a classic hard disk that requires no special handling. DM-SMR uses indirect logical block addressing (LBA, Logical Block Addressing), reminiscent of the logical addressing in SSDs. Random write operations trigger background garbage collection, which results in unpredictable performance fluctuations. The operating system may try to apply optimizations to such disks on the assumption that data will be written to the specified sector, but the addressing exposed by the controller defines only the logical layout, and the controller actually places data using its own algorithms that take previously written data into account. For this reason, before using DM-SMR disks in a ZFS pool it is recommended to zero them out, resetting them to their original state.
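The indirection can be pictured with another simplified Python sketch (again hypothetical, not the actual controller logic): every overwrite of a logical block address lands in a new physical location, and the stale copies must later be reclaimed by background garbage collection, which is where the unpredictable latency comes from:

    # Hypothetical sketch of DM-SMR-style indirection: logical block
    # addresses (LBAs) are remapped to new physical locations on every
    # overwrite, and stale copies accumulate until garbage collection runs.
    class IndirectLBAMap:
        def __init__(self):
            self.mapping = {}        # logical block address -> physical block
            self.data = {}           # physical block -> stored payload
            self.next_physical = 0   # next free physical block (append-only)
            self.stale = set()       # physical blocks holding obsolete copies

        def write(self, lba, payload):
            # Every write, even to an "existing" sector, lands in a new
            # physical location; the host only ever sees the logical address.
            if lba in self.mapping:
                self.stale.add(self.mapping[lba])
            self.mapping[lba] = self.next_physical
            self.data[self.next_physical] = payload
            self.next_physical += 1

        def garbage_collect(self):
            # Background cleanup of obsolete copies; on a real drive this
            # competes with host I/O and causes latency spikes.
            reclaimed = len(self.stale)
            for block in self.stale:
                del self.data[block]
            self.stale.clear()
            return reclaimed

    disk = IndirectLBAMap()
    for i in range(4):
        disk.write(lba=10, payload="version-%d" % i)   # same logical sector
    print(disk.garbage_collect())                      # 3 stale copies reclaimed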

Western Digital is involved in analyzing the conditions under which the problems arise and, together with iXsystems, is trying to find a solution and prepare a firmware update. Before any conclusions about a fix are published, drives with the new firmware are planned to be tested on heavily loaded storage systems running FreeNAS 11.3 and TrueNAS CORE 12.0. Because manufacturers implement SMR differently, some types of SMR drives reportedly have no problems with ZFS, but the testing undertaken by iXsystems covers only WD Red drives based on DM-SMR technology; SMR drives from other manufacturers require additional research.

So far, the ZFS issues have been confirmed and reproduced in tests at least for 4 TB WD Red WD40EFAX drives with firmware 82.00A82; they manifest as the drive entering a failure state under heavy write load, for example during a storage rebuild after adding a new drive to the array (resilvering). The problem is believed to occur on other WD Red models with the same firmware as well. When the problem occurs, the disk starts returning the IDNF (Sector ID Not Found) error code and becomes unusable, which ZFS treats as a disk failure and which can lead to the loss of data stored on that disk; if several disks fail, data in a vdev or the whole pool may be lost. Such failures are noted to be quite rare: out of roughly a thousand FreeNAS Mini systems sold with the problematic disks, the problem has surfaced in real-world use only once.

Source: opennet.ru