A little about SMART and monitoring utilities

There is plenty of information online about SMART and its attribute values. However, I have not seen several important points mentioned that I learned from people who study storage media.

Once again I was explaining to a friend why SMART readings should not be trusted unconditionally, and why it is better not to run classic "SMART monitors" all the time. That gave me the idea to write these points down as a set of theses with explanations, so I could share a link instead of retelling them every time. The text is meant for a wide audience.

1) Programs that automatically monitor SMART attributes should be used with great care.

What you see as SMART attributes are not stored on the drive as ready-made values. They are generated at the moment you request them, calculated from internal statistics that the drive's firmware accumulates and uses during operation.

Some of this data is not needed for the device's core functionality, so it is not stored but computed each time it is required. As a result, when a SMART attribute request arrives, the firmware starts a large number of processes to obtain the missing data.

These processes do not coexist well with the work the drive performs while it is busy with read and write operations.

In an ideal world, this would not cause any problems. In reality, hard drive firmware is written by ordinary people, who can and do make mistakes. So if you request SMART attributes while the device is actively reading or writing, the likelihood that something will go wrong increases dramatically. For example, data in the user's read or write buffer may be corrupted.
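One way to act on this is to request SMART data only when the drive looks idle. The sketch below is an illustration, not a guarantee: it assumes Linux, where the kernel exposes per-device I/O counters in `/proc/diskstats`, and treats a device as idle if no reads or writes completed between two snapshots and none are in flight. The function names are hypothetical. A new request can still arrive right after the check, so this only lowers the chance of overlap.

```python
import time


def read_counters(diskstats_text, device):
    """Return (reads_completed, writes_completed, in_flight) for `device`
    from text in the /proc/diskstats format, or None if it is absent."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the statistics columns.
        if len(fields) >= 12 and fields[2] == device:
            reads = int(fields[3])       # reads completed
            writes = int(fields[7])      # writes completed
            in_flight = int(fields[11])  # I/Os currently in progress
            return reads, writes, in_flight
    return None


def looks_idle(before, after):
    """Idle if no I/O completed between the snapshots and none is in flight."""
    if before is None or after is None:
        return False
    return before[0] == after[0] and before[1] == after[1] and after[2] == 0


def wait_until_idle(device, window=5.0, max_wait=600.0):
    """Hypothetical helper (Linux only): poll /proc/diskstats until `device`
    shows no activity for `window` seconds, or give up after `max_wait`."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        with open("/proc/diskstats") as f:
            before = read_counters(f.read(), device)
        time.sleep(window)
        with open("/proc/diskstats") as f:
            after = read_counters(f.read(), device)
        if looks_idle(before, after):
            return True
    return False
```

A monitor built this way would send its SMART request only after `wait_until_idle("sda")` returns `True`, and skip the check otherwise.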

The claim of increased risk is not a theoretical conclusion but a practical observation. For example, there was a known bug in the Samsung 103UI HDD firmware in which user data was damaged while a SMART attribute request was being executed.

Therefore, do not configure automatic SMART attribute checks unless you know for sure that a cache flush command (Flush Cache) is issued before each request. If you cannot do without automatic checks, configure them to run as rarely as possible. Many monitoring programs default to an interval on the order of 10 minutes between checks, which is too often. Such checks will not protect you from an unexpected disk failure anyway; only redundancy does that. In my opinion, once a day is enough.
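A minimal sketch of this policy, assuming a Linux host with smartmontools and hdparm installed and root privileges: a check runs at most once every 24 hours. Before reading attributes it flushes the OS buffers with `os.sync()`, then asks the drive to flush its on-drive write cache with `hdparm -F`, and only then reads the attribute table with `smartctl -A`. The device path and function names are hypothetical, and whether a flush makes a particular firmware safe is exactly the thing the text says you must know for sure.

```python
import os
import subprocess

CHECK_INTERVAL = 24 * 60 * 60  # once a day, in seconds


def check_is_due(last_check, now, interval=CHECK_INTERVAL):
    """True if no check has run yet or the interval has elapsed."""
    return last_check is None or now - last_check >= interval


def build_commands(device):
    """Command lines for one check: flush the drive cache, then read attributes."""
    return [
        ["hdparm", "-F", device],    # flush the on-drive write cache
        ["smartctl", "-A", device],  # print the SMART attribute table
    ]


def run_check(device):
    """Run one check (needs root). Returns smartctl's standard output."""
    os.sync()  # push OS-level dirty buffers to the drive first
    flush_cmd, smart_cmd = build_commands(device)
    # check=True raises if the flush fails, so the SMART read is skipped.
    subprocess.run(flush_cmd, check=True)
    # smartctl's exit status is a bit mask that can be nonzero even when the
    # table was printed, so it is not treated as an error here.
    return subprocess.run(smart_cmd, capture_output=True, text=True).stdout
```

A daily cron job or systemd timer could call `run_check("/dev/sda")` whenever `check_is_due` returns `True`. The flush and the SMART read are separate commands, so writes that arrive between them are not covered.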

A temperature request does not trigger attribute calculation and can be made frequently, because, when properly implemented, it goes through the SCT (SMART Command Transport) protocol. SCT returns only data that is already known, and that data is updated automatically in the background.
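For frequent temperature polling on ATA drives, smartmontools can read SCT status directly with `smartctl -l scttempsts /dev/sdX`. The parser below is a sketch that assumes the output contains a line of the form `Current Temperature:  35 Celsius`. The sample text is illustrative, and the exact layout may differ between smartctl versions and drives.

```python
import re
import subprocess


def parse_sct_temperature(text):
    """Extract the current temperature in degrees Celsius, or None if absent."""
    match = re.search(r"^Current Temperature:\s+(-?\d+)\s+Celsius", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_sct_temperature(device):
    """Query SCT temperature status (ATA only; needs smartmontools and root)."""
    out = subprocess.run(["smartctl", "-l", "scttempsts", device],
                         capture_output=True, text=True).stdout
    return parse_sct_temperature(out)


# Illustrative output, not captured from a real drive.
SAMPLE = """SCT Status Version:                  3
Device State:                        Active (0)
Current Temperature:                    35 Celsius
Power Cycle Min/Max Temperature:     25/39 Celsius
"""
```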

2) SMART attribute data is often unreliable.

The hard drive firmware shows you what it wants to show, not what is actually happening. The most obvious example is attribute 5, the count of reallocated sectors. Data recovery specialists know well that a drive can report zero reallocations in attribute 5 even though reallocations exist and keep occurring.

I asked a specialist who studies hard drives and examines their firmware what principle the firmware uses to decide when to hide the fact that sectors have been reallocated and when to report it through the SMART attributes.

He replied that there is no general rule that determines whether a device shows or hides the real picture, and that the logic of the programmers who write hard drive firmware can look very strange at times. Studying the firmware of different models, he often saw the decision to “hide or show” made on the basis of a set of parameters whose relationship to each other, and to the drive's remaining life, is generally unclear.

3) The interpretation of SMART values is vendor-specific.

For example, on Seagate drives you should not worry about "bad" raw values of attributes 1 and 7 as long as the other attributes are normal. On drives from this manufacturer, these raw values can grow during normal use.
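A monitoring script could encode rules like this as a small per-vendor table. The sketch below is hypothetical: it holds only the one rule stated above (on Seagate drives, ignore the raw values of attributes 1 and 7), and the table, the vendor key, and the function name are illustrations rather than a complete or authoritative rule set.

```python
# Attribute IDs whose raw values should not be read as a warning sign, per vendor.
# Only the Seagate entry comes from the text; any other entry would be an assumption.
IGNORED_RAW = {
    "seagate": {1, 7},  # 1 = Raw_Read_Error_Rate, 7 = Seek_Error_Rate
}


def filter_vendor_noise(vendor, raw_values):
    """Drop attributes whose raw values are known to be misleading for `vendor`.

    The remaining attributes are still evaluated as usual, so a real problem
    elsewhere (for example in attribute 5) is not hidden by this filter.
    """
    ignored = IGNORED_RAW.get(vendor.strip().lower(), set())
    return {attr_id: raw for attr_id, raw in raw_values.items() if attr_id not in ignored}
```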


To assess the condition and remaining life of a hard disk, pay attention first of all to attributes 5, 196, 197, and 198. Focus on the absolute (raw) values rather than the normalized ones, because normalization can be performed in non-obvious ways that differ between algorithms and firmware versions.

In general, when data storage specialists talk about the value of an attribute, they usually mean the absolute (raw) value.
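To work with raw values rather than normalized ones, the attribute table printed by `smartctl -A` can be parsed directly. This sketch assumes smartctl's default ATA attribute layout (ten columns, with RAW_VALUE last) and keeps only the leading integer of the raw field, since some raw values carry extra text in parentheses. The sample lines are made up for illustration; note how attribute 5 shows a normalized VALUE of 100 next to a nonzero raw count.

```python
import re

# Attributes the text recommends watching, with smartctl's usual names.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_raw_values(text):
    """Map attribute ID -> leading integer of RAW_VALUE from `smartctl -A` output."""
    raw_values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        match = re.match(r"\d+", fields[9])
        if match:
            raw_values[int(fields[0])] = int(match.group())
    return raw_values


def watched_raw_values(text):
    """Raw values of the watched attributes that are present in the output."""
    raw_values = parse_raw_values(text)
    return {attr_id: raw_values[attr_id] for attr_id in WATCHED if attr_id in raw_values}


# Illustrative smartctl -A excerpt, not from a real drive.
SAMPLE = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
```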

Source: habr.com
