SSD Health Monitoring in Qsan Arrays

Solid-state drives in data storage are no longer a surprise. SSDs are now an everyday part of IT equipment, from personal computers and laptops to servers and storage systems. Several generations of SSDs have come and gone, each improving on performance, reliability and maximum capacity. Yet monitoring SSD write endurance is still a relevant question.

Because of their physical structure, solid-state drives have a finite write endurance. On top of that, considerably more data is physically written to an SSD than the host sends to it (especially when the drive is part of a RAID group), which brings the drive to that limit even sooner. This makes some users wary of using SSDs at all.
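
The gap between what the host sends and what the flash actually absorbs is usually expressed as a write amplification factor. A minimal sketch of that ratio follows; the function name and the sample figures are illustrative assumptions, not values measured on any Qsan system.

```python
def write_amplification(host_written_tb: float, flash_written_tb: float) -> float:
    """Ratio of data physically written to flash to data sent by the host."""
    if host_written_tb <= 0:
        raise ValueError("host_written_tb must be positive")
    return flash_written_tb / host_written_tb


# Illustrative numbers only: the host sent 10 TB, the flash absorbed 25 TB.
waf = write_amplification(host_written_tb=10, flash_written_tb=25)
print(f"Write amplification factor: {waf}")
```

A factor above 1 means the drive approaches its endurance limit faster than host-side write counters alone would suggest.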

In practice, things are not so bad. The rated DWPD (drive writes per day) applies to the entire warranty period of the drive (usually 3-5 years), so the actual TBW (terabytes written) endurance is very large, and there is no need to fear “wearing out” an SSD in just a few months. In some cases, high TBW values even let you temporarily run drives harder than the manufacturer intended. None of this, however, removes the need to monitor the remaining write endurance of each individual SSD so that it can be replaced proactively when certain thresholds are reached.
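
To see why the TBW figure ends up so large, it helps to do the arithmetic. The sketch below uses the common convention TBW = DWPD × capacity × 365 × warranty years; the drive parameters are made-up example values, not the specification of any particular SSD.

```python
def tbw_from_dwpd(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes that may be written over the warranty period."""
    days = 365 * warranty_years  # assumption: 365 days per year, leap days ignored
    return dwpd * capacity_tb * days


# Hypothetical drive: 1 DWPD, 1.92 TB capacity, 5-year warranty.
tbw = tbw_from_dwpd(dwpd=1, capacity_tb=1.92, warranty_years=5)
print(f"Rated endurance: about {tbw:.0f} TB written")
```

Even a modest 1 DWPD rating therefore translates into thousands of terabytes over the warranty period.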

Each storage vendor implements this functionality in its own way, but more often than not it amounts to a simple good/failed drive status. Qsan, by contrast, gives its All Flash systems a full visualization of current SSD activity in a separate module called QSLife. The module is an integral part of the new XEVO operating system, which all Qsan storage systems will run in the future.

For each SSD in the system, the current "standard of living" is displayed in the most accessible form. It's no secret that all modern SSDs keep their own count of the blocks written to them. From these values, the system calculates the drive's wear relative to its rated specification. The result is shown as a percentage of a brand-new SSD. Note that wear is calculated not only for the time the drive has spent in the Qsan All Flash array, but for its entire lifetime, including operation in other systems (if any).
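
The source does not describe the exact formula QSLife uses, so the following is only a plausible sketch of such a calculation: remaining life as the unused share of the rated TBW, based on the drive's own lifetime write counter. The function name and figures are assumptions for illustration.

```python
def remaining_life_percent(lifetime_written_tb: float, rated_tbw: float) -> float:
    """Remaining endurance as a percentage of a brand-new drive (100 = new)."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    used_fraction = lifetime_written_tb / rated_tbw
    # Clamp to 0..100: a drive can outlive its rating, but the gauge cannot go negative.
    return max(0.0, min(100.0, 100.0 * (1.0 - used_fraction)))


# Hypothetical drive rated for 3504 TBW that has absorbed 876 TB in total,
# including any time spent in other systems.
print(remaining_life_percent(lifetime_written_tb=876, rated_tbw=3504))
```

Because the input is the drive's lifetime counter rather than the array's own statistics, writes made in previous systems are counted automatically.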

Beyond this summary, more detail is available, in particular the total amount of data written to the drive over its entire service life. For the time the drive has worked in a Qsan All Flash array, graphs of its read and write activity are also available. Statistics are collected in real time and can be viewed for any period going back up to one year.

Of course, the point of this functionality is not just to draw pretty graphs for the administrator's enjoyment, but to analyze the state of the drives proactively and prevent future wear-related problems. To that end, you can set multiple thresholds on an SSD's “standard of living”, each with a corresponding action tied to the exhaustion of its write endurance.
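
The idea of pairing thresholds with actions can be sketched in a few lines. This is not QSLife's configuration format or API; the `Threshold` class, the threshold values and the action names are all hypothetical and serve only to show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    remaining_pct: float  # fires when remaining life is at or below this value
    action: str           # hypothetical action label


def triggered_actions(remaining_pct: float, thresholds: list[Threshold]) -> list[str]:
    """Return the actions whose thresholds have been crossed, mildest first."""
    ordered = sorted(thresholds, key=lambda t: t.remaining_pct, reverse=True)
    return [t.action for t in ordered if remaining_pct <= t.remaining_pct]


# Illustrative policy; real thresholds depend on the workload and spare stock.
policy = [
    Threshold(20, "log warning"),
    Threshold(10, "email administrator"),
    Threshold(5, "schedule replacement"),
]
print(triggered_actions(8, policy))
```

Several thresholds give a graded response: a drive at 8% remaining life has already triggered the first two steps but not yet the replacement.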

Qsan's other storage models (general-purpose rather than dedicated All Flash) have no comparable visual report on drives. That is understandable: the flagship has to differ from the mainstream somehow. Still, this kind of monitoring is present in the regular product line as well. It comes without usage and performance statistics, but the core function of tracking write endurance is there.

Thanks to steady improvements in SSD manufacturing, concerns about drive reliability have eased somewhat. Even so, monitoring write endurance is still relevant. Properly configured monitoring lets the administrator predict SSD aging ahead of time based on actual current loads, and lets company management calculate TCO (total cost of ownership).
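
As a rough illustration of such a forecast, the remaining endurance can be divided by the observed daily write volume. This is a simple linear estimate under the assumption that the load stays constant; the function name and numbers are hypothetical.

```python
import math


def days_until_worn_out(rated_tbw: float, lifetime_written_tb: float,
                        daily_write_tb: float) -> float:
    """Linear estimate of days left before the rated TBW is exhausted."""
    remaining_tb = max(0.0, rated_tbw - lifetime_written_tb)
    if daily_write_tb <= 0:
        return math.inf  # an idle drive never reaches the limit under this model
    return remaining_tb / daily_write_tb


# Hypothetical drive: rated 3504 TBW, 876 TB already written, 2 TB written per day.
days = days_until_worn_out(rated_tbw=3504, lifetime_written_tb=876, daily_write_tb=2)
print(f"Estimated time to replacement: {days:.0f} days (~{days / 365:.1f} years)")
```

An estimate like this, recalculated as the load changes, is the kind of input that feeds both replacement planning and TCO calculations.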

Source: habr.com
