Huawei OceanStor Dorado 18000 V6: what is its high-end nature

We explain in detail what makes the OceanStor Dorado 18000 V6 a truly high-end storage system with a solid reserve for the years ahead. Along the way, we dispel common fears about All-Flash storage and show how Huawei squeezes the most out of flash: end-to-end NVMe, additional caching on SCM, and a whole range of other solutions.

New data landscape - new data storage

Data intensity is on the rise across all industries, and the banking sector illustrates this clearly. Over the past few years, the number of banking transactions has grown more than tenfold. According to a BCG study, in Russia alone the number of non-cash card transactions grew roughly thirtyfold between 2010 and 2018 - from 5.8 to 172 per person per year. This is above all the triumph of micropayments: most of us have taken to online banking, and the bank is now at our fingertips, on the phone.

The IT infrastructure of a credit institution must be ready for such a challenge, and it really is a challenge. Among other things, where a bank once needed to keep data available only during business hours, it now needs it 24/7. Until recently, 5 ms was considered acceptable latency. Now even 1 ms is too much: for a modern storage system, the target is 0.5 ms.

The same goes for reliability: in the 2010s, the rule of thumb was that “five nines” - 99.999% - was enough. That rule is now obsolete. In 2020, it is perfectly normal for a business to require 99.9999% for storage and 99.99999% for the architecture as a whole. This is not a whim but an urgent need: the time window for infrastructure maintenance is either tiny or nonexistent.
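
To see what these percentages mean in practice, an availability level can be converted into a yearly downtime budget. The snippet below is a minimal sketch; the function name is ours, and a 365-day year is assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime that still meets the given availability level."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for level in (99.999, 99.9999, 99.99999):
    print(f"{level}% -> {downtime_seconds_per_year(level):.1f} s of downtime per year")
```

Five nines leave a little over five minutes per year, six nines about half a minute, and seven nines about three seconds.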

For clarity, it helps to translate these indicators into money, and financial institutions make the easiest example. The chart of hourly earnings for the world's top 10 banks shows that for the Industrial and Commercial Bank of China alone the figure is no less than $5 million. That is what an hour of IT infrastructure downtime costs China's largest credit institution (and the calculation counts only lost profit!). Seen this way, reducing downtime and raising reliability, not just by a few percent but even by fractions of a percent, is entirely rational - not only to become more competitive, but simply to hold on to market position.
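
The same reasoning can be put into numbers. This is a rough sketch, assuming the $5 million per hour figure quoted above and a 365-day year; the function name is illustrative, and like the figure itself, the estimate counts only lost profit.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def yearly_downtime_cost(availability_percent: float, profit_per_hour: float) -> float:
    """Profit at risk per year if the whole downtime budget is used up."""
    downtime_hours = (1 - availability_percent / 100) * HOURS_PER_YEAR
    return downtime_hours * profit_per_hour


PROFIT_PER_HOUR = 5_000_000  # USD, the ICBC figure from the text

for level in (99.999, 99.9999):
    cost = yearly_downtime_cost(level, PROFIT_PER_HOUR)
    print(f"{level}% availability -> about ${cost:,.0f} at risk per year")
```

Under these assumptions, moving from five nines to six reduces the exposure from roughly $438,000 to roughly $43,800 a year.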

Comparable changes are under way in other industries. Take air transportation: before the pandemic, air travel was growing year after year, and many people had begun to use it almost like a taxi. As for consumer habits, society has come to expect total availability of services: on arriving at the airport, we want Wi-Fi, payment services, a map of the area, and so on. As a result, the load on infrastructure and services in public spaces has grown many times over, and approaches to building that infrastructure that seemed acceptable even a year ago are rapidly becoming obsolete.

Is it too early to switch to All-Flash?

In terms of performance, AFAs - all-flash arrays, that is, arrays built entirely on flash - are the best fit for the problems described above. Until recently, though, there were doubts about whether they match the reliability of HDD-based and hybrid arrays. After all, solid-state flash memory has a metric called mean time between failures (MTBF), and cell degradation caused by I/O operations is, alas, a given.

So the prospects for All-Flash were clouded by the question of how to prevent data loss if an SSD gives up the ghost. Backup is the familiar option, but recovery time would be unacceptably long by modern standards. Another way out is a second storage tier on spindle drives, but that scheme gives up some of the advantages of a "strictly flash" system.

However, the numbers say otherwise: statistics from the giants of the digital economy, including Google, show that in recent years flash has been several times more reliable than hard drives, over both short and long periods. On average, flash drives last four to six years before failing. In terms of data storage reliability, they are in no way inferior to spindle drives and may even surpass them.

Another traditional argument in favor of spindle drives is their affordability. No doubt, the cost of storing a terabyte on a hard drive is still relatively low, and if you count only the cost of equipment, it is cheaper to keep a terabyte on a spindle drive than on an SSD. For financial planning, however, what matters is not only the purchase price of a device but also its total cost of ownership over a long period - from three to seven years.

From this angle, the picture is completely different. Even if we ignore deduplication and compression, which are typically used on flash arrays and make them more economical to run, there remain rack space occupied by media, heat dissipation, and power consumption. On all of these, flash outperforms its predecessors. As a result, the TCO of flash storage systems, with all parameters taken into account, is often almost half that of arrays on spindle drives or hybrids.
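
The structure of such a comparison can be sketched in a few lines. Everything here is an assumption made for illustration: the class, the function, and above all the input numbers, which are invented placeholders and not measurements of any real array. Data reduction is deliberately left out, as in the paragraph above.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class StorageArray:
    purchase_cost: float  # up-front price of the equipment
    power_kw: float       # average power draw, cooling included
    rack_units: int       # rack space occupied


def total_cost_of_ownership(array: StorageArray, years: int,
                            price_per_kwh: float,
                            price_per_rack_unit_year: float) -> float:
    """Purchase price plus energy and rack space over the planning period."""
    energy = array.power_kw * HOURS_PER_YEAR * years * price_per_kwh
    space = array.rack_units * price_per_rack_unit_year * years
    return array.purchase_cost + energy + space


# Invented inputs, used only to exercise the formula.
spindle = StorageArray(purchase_cost=100_000, power_kw=6.0, rack_units=24)
flash = StorageArray(purchase_cost=150_000, power_kw=1.5, rack_units=4)

for name, array in (("spindle", spindle), ("flash", flash)):
    print(name, round(total_cost_of_ownership(array, 5, 0.15, 1_000)))
```

The point of the sketch is the shape of the formula: the longer the planning period, the more the recurring terms outweigh the purchase price.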

According to ESG reports, Dorado V6 All-Flash storage systems can cut the cost of ownership by up to 78% over five years, thanks to efficient deduplication and compression as well as low power consumption and heat dissipation. The analyst firm DCIG also recommends them as the best option in terms of TCO available today.

Solid-state drives save usable space, reduce the number of failures, shorten maintenance time, and lower the power consumption and heat dissipation of storage systems. So AFA turns out to be at least economically comparable to traditional arrays on spindle drives, and often better.

Huawei Royal Flush

Among our All-Flash storage systems, the top place belongs to the high-end OceanStor Dorado 18000 V6. And not only among ours: it holds the industry speed record - up to 20 million IOPS in the maximum configuration. It is also extremely reliable: even if two controllers fail at once, or up to seven controllers one after another, or an entire engine at once, the data will survive. The AI built into the 18000 gives it considerable advantages, including flexibility in managing internal processes. Let's see how this is achieved.

In large part, Huawei has a head start because it is the only manufacturer on the market that builds its storage systems entirely in-house. We have our own circuitry, our own microcode, our own service.

The controller in OceanStor Dorado systems is built on a processor of Huawei's own design and production - the Kunpeng 920. It uses the Intelligent Baseboard Management Controller (iBMC) management module, also ours. The AI chips, namely the Ascend 310, which optimize failure prediction and recommend settings, are Huawei's too, as are the I/O boards - the Smart I/O modules. Finally, the controllers in the SSDs are designed and manufactured by us. All this made it possible to build an integrally balanced, high-performance solution.

Over the past year, we deployed this system, our top-of-the-line storage, at one of the largest Russian banks. As a result, more than 40 OceanStor Dorado 18000 V6 units in a metro cluster show stable performance: each system delivers more than a million IOPS, and that includes the delays caused by distance.

End-to-End NVMe

Huawei's latest storage systems support end-to-end NVMe, and we emphasize this for a reason. The traditional protocols for accessing drives were developed in the distant past of IT: they are based on SCSI commands (hello, 1980s!), which carry a lot of functions for the sake of backward compatibility. Whatever access method you choose, the protocol overhead is colossal. As a result, for storage systems that use SCSI-based protocols, I/O latency cannot drop below 0.4–0.5 ms. NVMe (Non-Volatile Memory Express), by contrast, was designed for flash memory and is free of the crutches of backward compatibility. It brings latency down to 0.1 ms, and not just within the storage system but across the entire stack, from host to drives. Not surprisingly, NVMe is in line with storage development trends for the foreseeable future. We too are betting on NVMe and gradually moving away from SCSI. All Huawei storage systems produced today, including the Dorado line, support NVMe (although end-to-end NVMe is implemented only on the advanced models of the Dorado V6 series).
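
Latency translates directly into throughput. By Little's law, a host that keeps a fixed number of requests in flight cannot complete more than queue depth divided by latency operations per second. The sketch below applies this to the latency figures above; the function name and the queue depth of 32 are illustrative assumptions.

```python
def max_iops(latency_ms: float, queue_depth: int = 1) -> float:
    """Upper bound on IOPS for a given per-request latency (Little's law)."""
    return queue_depth * 1000 / latency_ms


for label, latency_ms in (("SCSI-based, 0.5 ms", 0.5),
                          ("end-to-end NVMe, 0.1 ms", 0.1)):
    print(f"{label}: {max_iops(latency_ms):,.0f} IOPS per outstanding request, "
          f"{max_iops(latency_ms, 32):,.0f} at queue depth 32")
```

With one request in flight, 0.5 ms caps a host at about 2,000 IOPS, while 0.1 ms raises the cap to about 10,000.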

FlashLink: A Fistful of Technologies

The cornerstone technology of the entire OceanStor Dorado line is FlashLink. More precisely, the term covers an integrated set of technologies that ensure high performance and reliability. It includes deduplication and compression, the RAID 2.0+ data distribution system, the separation of "cold" and "hot" data, and full-stripe sequential writes (random writes of new and changed data are aggregated into a large batch and written sequentially, which increases read and write speed).
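
The idea of full-stripe sequential writes can be illustrated with a toy model. This is a generic log-structured sketch, not Huawei's implementation: the class, its fields, and the stripe size are all hypothetical. Random writes are buffered until a full stripe is collected, the stripe is appended in one sequential operation, and a mapping table remembers where each logical block now lives.

```python
class StripeWriter:
    """Toy model: aggregate random writes into full stripes, append sequentially."""

    def __init__(self, stripe_blocks: int):
        self.stripe_blocks = stripe_blocks
        self.buffer = []    # pending (logical_block, data) pairs
        self.log = []       # stripes already written, in order
        self.mapping = {}   # logical_block -> (stripe_index, offset)

    def write(self, logical_block: int, data: bytes) -> None:
        self.buffer.append((logical_block, data))
        if len(self.buffer) == self.stripe_blocks:
            self._flush()

    def _flush(self) -> None:
        stripe_index = len(self.log)
        self.log.append([data for _, data in self.buffer])  # one sequential write
        for offset, (logical_block, _) in enumerate(self.buffer):
            # A later entry for the same block overrides the earlier one;
            # the older copy becomes garbage to be collected afterwards.
            self.mapping[logical_block] = (stripe_index, offset)
        self.buffer.clear()

    def read(self, logical_block: int) -> bytes:
        for buffered_block, data in reversed(self.buffer):  # newest data first
            if buffered_block == logical_block:
                return data
        stripe_index, offset = self.mapping[logical_block]
        return self.log[stripe_index][offset]


writer = StripeWriter(stripe_blocks=4)
for block, data in ((70, b"a"), (3, b"b"), (70, b"c"), (12, b"d")):
    writer.write(block, data)
print(len(writer.log), writer.read(70))
```

Overwritten blocks are never updated in place, which is why a scheme like this goes hand in hand with garbage collection.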

Among other things, FlashLink includes two important components - Wear Leveling and Global Garbage Collection. They deserve a separate discussion.

In effect, any solid-state drive is a storage system in miniature, with a large number of blocks and a controller that ensures data availability. It does so, among other things, by moving data from worn-out cells to healthy ones so that the data can still be read. There are various algorithms for such moves; in general, the controller tries to balance wear across all storage cells. This approach has a downside: while data is being moved inside the SSD, the number of I/O operations it can perform drops dramatically. For now, it is a necessary evil.

Thus, if there are many SSDs in the system, a sawtooth appears on the performance graph, with sharp rises and falls. The trouble is that any drive in the pool can start moving data at any time, while overall performance is drawn from all SSDs in the array at once. But Huawei engineers figured out how to avoid the sawtooth.

Since the controllers in the drives, the storage controller, and the firmware are all Huawei's own, these processes in the OceanStor Dorado 18000 V6 are launched centrally and synchronously on all drives in the array - at the command of the storage controller, and precisely when there is no heavy I/O load.

The artificial intelligence chip also helps choose the right moment to move data: based on access statistics for the previous few months, it predicts with high probability whether active I/O should be expected in the near future. If the answer is no and the current load on the system is low, the controller commands all drives that need Wear Leveling to perform it at once and synchronously.
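
The decision logic described here can be sketched as follows. This is a deliberately naive illustration, not the algorithm running on the Ascend chip: the forecast is a plain average of past load at the same hour of the week, loads are normalized to the range 0 to 1, and the function names and the idle threshold are hypothetical.

```python
from statistics import mean


def predict_load(history, hour_of_week):
    """Naive forecast: average past load observed at the same hour of the week."""
    samples = [load for hour, load in history if hour == hour_of_week]
    return mean(samples) if samples else 1.0  # no data: assume busy, stay safe


def drives_to_level(drives, current_load, predicted_load, idle_threshold=0.2):
    """Return every drive that should start wear leveling now: all or none."""
    if current_load < idle_threshold and predicted_load < idle_threshold:
        return [name for name, needs_leveling in drives.items() if needs_leveling]
    return []


history = [(3, 0.05), (3, 0.10), (14, 0.90), (14, 0.80)]  # (hour_of_week, load)
drives = {"ssd0": True, "ssd1": False, "ssd2": True}

print(drives_to_level(drives, current_load=0.1, predicted_load=predict_load(history, 3)))
print(drives_to_level(drives, current_load=0.1, predicted_load=predict_load(history, 14)))
```

In the quiet hour all drives that need leveling are started together; when load is expected soon, none are, even though the system is idle at the moment.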

In addition, the system controller sees what is happening in each cell of a drive. Competing manufacturers' storage systems cannot: they have to purchase solid-state media from third-party vendors, so cell-level detail is not available to their controllers.

As a result, the OceanStor Dorado 18000 V6 spends very little time in degraded performance during Wear Leveling, and the operation runs mainly when it does not interfere with other processes. This gives consistently high, stable performance.

What Makes OceanStor Dorado 18000 V6 Reliable

There are four levels of reliability in modern data storage systems:

  • hardware, at the drive level;
  • architectural, at the equipment level;
  • architectural together with the software part;
  • cumulative, relating to the solution as a whole.

Since our company, as noted, designs and manufactures all the components of the storage system itself, we provide reliability at each of the four levels and can closely monitor what is happening at each of them at any moment.

The reliability of drives is ensured primarily by the Wear Leveling and Global Garbage Collection described earlier. When an SSD is a black box to the system, the system has no idea how exactly its cells wear out. For the OceanStor Dorado 18000 V6, the drives are transparent, which makes it possible to balance wear evenly across all drives in the array. This significantly extends SSD life and secures a high level of reliability.

Drive reliability also benefits from additional redundant cells. Along with a simple reserve, the storage system uses so-called DIF cells, which contain checksums, as well as additional codes that protect each block against a single error, on top of protection at the RAID level.
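
The principle of per-block checksums can be shown with a short sketch. It is only an illustration: if DIF here means the standard T10 Data Integrity Field, the real format stores a 16-bit guard CRC plus application and reference tags per sector, whereas this example substitutes CRC-32 from Python's standard library for brevity. The function names are ours.

```python
import zlib

TAG_BYTES = 4  # size of the CRC-32 tag appended to every block


def protect(block: bytes) -> bytes:
    """Append a checksum tag to a data block before it is stored."""
    return block + zlib.crc32(block).to_bytes(TAG_BYTES, "big")


def verify(stored: bytes) -> bytes:
    """Return the data block if its tag matches, otherwise raise ValueError."""
    block, tag = stored[:-TAG_BYTES], stored[-TAG_BYTES:]
    if zlib.crc32(block).to_bytes(TAG_BYTES, "big") != tag:
        raise ValueError("checksum mismatch: block is corrupted")
    return block


stored = protect(b"\x00" * 512)
assert verify(stored) == b"\x00" * 512

corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]  # flip a single bit
try:
    verify(corrupted)
except ValueError as error:
    print(error)
```

A mismatch tells the system that a block is damaged, so a good copy can be rebuilt from RAID redundancy instead of returning bad data to the host.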

The key to architectural reliability is the SmartMatrix solution. In short, four controllers sit on a passive backplane as part of one engine. Two such engines - eight controllers in total - are connected to shared drive shelves. Thanks to SmartMatrix, even if seven out of eight controllers stop working, access to all data remains, for both reading and writing. And with six out of eight controllers lost, even caching operations can continue.

I/O boards on the same passive backplane are available to all controllers, on both the frontend and the backend. With this full-mesh connection scheme, access to the drives is preserved no matter what fails.

The reliability of an architecture is best discussed in terms of the failure modes the storage system can protect against.

The storage system will survive without loss if two controllers drop out, even simultaneously. This resilience comes from the fact that every cache block has two more copies on different controllers, so it exists in three copies in total, and at least one of them is on a different engine. Thus, even if an entire engine stops working - with all four of its controllers - all the information that was in cache memory is guaranteed to be preserved, because the cache is duplicated on at least one controller in the remaining engine. Finally, with sequential failures, you can lose up to seven controllers, even if they drop out two at a time, and again all I/O and all cached data will be preserved.
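
The three-copy rule is easy to check by brute force. The sketch below uses a hypothetical placement (owner, its neighbor on the same engine, and the controller in the same slot on the other engine); the actual placement policy is not described in the text. It then verifies that every cache block keeps at least one copy after any two simultaneous controller failures and after the loss of either engine. Sequential failures, which rely on copies being re-created between failures, are not modeled.

```python
from itertools import combinations

ENGINES = 2
CONTROLLERS_PER_ENGINE = 4
CONTROLLERS = [(engine, slot)
               for engine in range(ENGINES)
               for slot in range(CONTROLLERS_PER_ENGINE)]


def copy_locations(owner):
    """Three copies on distinct controllers, at least one on the other engine."""
    engine, slot = owner
    same_engine_peer = (engine, (slot + 1) % CONTROLLERS_PER_ENGINE)
    other_engine_peer = ((engine + 1) % ENGINES, slot)
    return {owner, same_engine_peer, other_engine_peer}


def survives(failed):
    """True if every cache block still has a copy on a working controller."""
    return all(copy_locations(owner) - set(failed) for owner in CONTROLLERS)


two_at_once = all(survives(pair) for pair in combinations(CONTROLLERS, 2))
whole_engine = all(
    survives([c for c in CONTROLLERS if c[0] == engine]) for engine in range(ENGINES)
)
print(two_at_once, whole_engine)
```

Both checks come out true for this placement, which is exactly what the two conditions in the paragraph above guarantee.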

Compared with high-end storage from other manufacturers, only Huawei provides full data protection and full availability even after the loss of two controllers or an entire engine. Most vendors use a scheme with so-called controller pairs to which drives are connected. Unfortunately, in that configuration, if two controllers fail, there is a risk of losing I/O access to the drives.

Alas, the failure of a single component cannot be ruled out. In that case, performance drops for a while: paths have to be rebuilt, and I/O access has to resume for the blocks that had arrived for writing but were not yet written, or had been requested for reading. The OceanStor Dorado 18000 V6 has an average path rebuild time of about one second, significantly less than its closest counterpart in the industry (4 s). This is achieved thanks to the same passive backplane: when a controller fails, the others immediately see its I/O, and in particular which data block has not been written; the nearest controller then picks up the process. Hence the ability to restore performance in just a second. The interval is also stable: a second for one controller, a second for the next, and so on.

On the OceanStor Dorado 18000 V6 passive backplane, all boards are available to all controllers without any additional addressing. This means that any controller can pick up I/O on any port: whatever frontend port the I/O arrives on, a controller will be ready to process it. The result is a minimum of internal transfers and noticeably simpler balancing.

Frontend balancing is performed by the multipathing driver, and additional balancing is carried out within the system itself, since all controllers see all I/O ports.

Traditionally, all Huawei arrays are designed to have no single point of failure. All components can be hot-swapped without rebooting the system: controllers, power modules, cooling modules, I/O boards, and so on.

Technologies such as RAID-TP also raise the reliability of the system as a whole. RAID-TP is a RAID group type that protects against the simultaneous failure of up to three drives. A 1 TB rebuild consistently takes less than 30 minutes, and the best recorded result is eight times faster than for the same amount of data on a spindle drive. This makes it possible to use extremely capacious drives, say 7.68 or even 15 TB, without worrying about the reliability of the system.

Importantly, the rebuild goes not to a spare drive but to spare space - reserve capacity. Each drive has dedicated space used for data recovery after a failure. Recovery therefore follows not a “many-to-one” scheme but a “many-to-many” one, which significantly speeds up the process. And as long as there is free capacity, recovery can continue.
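
A little arithmetic shows what these figures imply. The first two functions use only the "under 30 minutes per terabyte" number from the text. The third is a simplified model of our own, with a hypothetical per-drive write speed, that shows why spreading rebuild writes over many drives helps; it assumes that reads and controller CPU are not the bottleneck.

```python
def rebuild_minutes(capacity_tb: float, minutes_per_tb: float = 30.0) -> float:
    """Upper bound on rebuild time, from the 'under 30 minutes per TB' figure."""
    return capacity_tb * minutes_per_tb


def implied_rate_mb_s(minutes_per_tb: float = 30.0) -> float:
    """Aggregate rebuild rate implied by that figure (1 TB = 10**6 MB)."""
    return 10**6 / (minutes_per_tb * 60)


def model_rebuild_minutes(data_tb: float, write_mb_s_per_drive: float,
                          target_drives: int) -> float:
    """Simplified model: rebuild writes are spread evenly over target_drives."""
    return data_tb * 10**6 / (write_mb_s_per_drive * target_drives) / 60


print(f"7.68 TB drive: under {rebuild_minutes(7.68):.0f} minutes")
print(f"implied aggregate rate: at least {implied_rate_mb_s():.0f} MB/s")

# Hypothetical 200 MB/s of rebuild writes per drive.
print(f"many-to-one:  {model_rebuild_minutes(1, 200, 1):.1f} min per TB")
print(f"many-to-many: {model_rebuild_minutes(1, 200, 10):.1f} min per TB over 10 drives")
```

A failed 7.68 TB drive is thus rebuilt in under four hours, and in the model the rebuild time falls in proportion to the number of drives that share the writes.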

We should also mention the reliability of a solution built from several storage systems - a metro cluster, or, in Huawei's terminology, HyperMetro. Such schemes are supported across our entire range of storage systems and allow both file and block access. For block access, it works over both Fibre Channel and Ethernet (including iSCSI).

In essence, this is bidirectional replication from one storage system to another, in which the replicated LUN gets the same LUN ID as the primary one. The technology relies primarily on keeping the caches of the two systems consistent. For the host, therefore, it does not matter which side it is on: it sees the same logical drive on both. As a result, nothing prevents you from deploying a failover cluster spanning two sites.

For quorum, a physical or virtual Linux machine is used. It can be located at a third site, and its resource requirements are small. A common scenario is to rent a virtual site solely to host the quorum VM.
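
Why a third-site quorum is needed can be shown with a generic arbitration sketch. It illustrates the common pattern for two-site clusters and is not a description of HyperMetro internals; the function, its parameters, and the notion of a preferred site are assumptions. When the inter-site link breaks, at most one site may keep serving I/O, otherwise the two copies of a LUN would diverge.

```python
def arbitrate(link_up: bool, a_sees_quorum: bool, b_sees_quorum: bool,
              preferred: str = "A") -> set:
    """Return the set of sites allowed to keep serving I/O."""
    if link_up:
        return {"A", "B"}  # replication works, both sites stay active
    reachable = [site for site, ok in (("A", a_sees_quorum), ("B", b_sees_quorum)) if ok]
    if not reachable:
        return set()       # nobody can prove it is safe to continue
    if preferred in reachable:
        return {preferred}
    return {reachable[0]}


print(arbitrate(link_up=True, a_sees_quorum=True, b_sees_quorum=True))
print(arbitrate(link_up=False, a_sees_quorum=True, b_sees_quorum=True))
print(arbitrate(link_up=False, a_sees_quorum=False, b_sees_quorum=True))
```

Placing the quorum machine at a third site means that the loss of either main site never takes the arbiter down with it.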

The technology also allows expansion: two storage systems in a metro cluster, plus an additional site with asynchronous replication.

Historically, many customers have ended up with a "storage zoo": a collection of storage systems from different manufacturers, of different models and generations, with different functionality. Meanwhile, the number of hosts can be impressive, and they are often virtualized. In such circumstances, one of the priorities of administration is to provide logical disks to hosts quickly, uniformly, and conveniently, preferably without having to care where these disks are physically located. That is what our OceanStor DJ software is designed for: it manages various storage systems in a unified way and provides services from them without being tied to a specific storage model.

That same AI

As already mentioned, the OceanStor Dorado 18000 V6 has built-in processors running artificial intelligence algorithms - Ascend. They are used, first, to predict failures and, second, to generate tuning recommendations, which also improve the performance and reliability of the storage system.

The prediction horizon is two months: the AI estimates what is highly likely to happen during this time - whether it is time to expand capacity, change access policies, and so on. Recommendations are issued in advance, so maintenance windows can be planned ahead of time.

The next stage in Huawei's AI development is to take it global. In the course of service work - troubleshooting failures or issuing recommendations - Huawei aggregates information from the logging systems of all our customers' storage systems. Based on this, actual and potential failures are analyzed and global recommendations are made - drawing not on the behavior of one specific storage system or even a dozen, but on what is happening and has happened across thousands of such devices. The sample is huge, so the AI algorithms learn extremely quickly, and the accuracy of predictions increases significantly.

Compatibility

In 2019-2020, there was a lot of speculation about how our equipment interacts with VMware products. To put an end to it, we state for the record: VMware is a Huawei partner. Every conceivable test of our hardware's compatibility with its software has been carried out, and as a result the hardware compatibility list on the VMware website includes our currently available storage systems without any reservations. In other words, you can use Huawei storage, including Dorado V6, in a VMware environment with full support.

The same goes for our collaboration with Brocade. We continue to interact and test our products for compatibility and can confidently state that our storage systems are fully compatible with the latest Brocade FC switches.

What's next?

We continue to develop and improve our processors: they are becoming faster and more reliable. We are also improving the AI chips - modules based on them are also produced to speed up deduplication and compression. Those who have access to our configurator may have noticed that these cards are already available for order in Dorado V6 models.

We are also moving towards additional caching on Storage Class Memory (SCM) - non-volatile memory with especially low latency, about ten microseconds per read. Among other things, SCM boosts performance, primarily for big data workloads and OLTP tasks. After the next update, SCM cards should become available for order.
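
The effect of such a cache on read latency can be estimated with a weighted average. This is a back-of-the-envelope sketch: the 10 microsecond figure comes from the paragraph above, while the 100 microsecond miss latency (borrowed from the 0.1 ms NVMe figure earlier in the article) and the hit ratios are assumptions chosen for illustration.

```python
def effective_read_latency_us(hit_ratio: float,
                              cache_latency_us: float = 10.0,
                              backend_latency_us: float = 100.0) -> float:
    """Average read latency when a share of reads is served from the SCM cache."""
    return hit_ratio * cache_latency_us + (1 - hit_ratio) * backend_latency_us


for hit_ratio in (0.0, 0.5, 0.8):
    print(f"hit ratio {hit_ratio:.0%}: "
          f"{effective_read_latency_us(hit_ratio):.0f} us on average")
```

The gain depends entirely on the hit ratio, which is why workloads with a hot working set, such as OLTP, benefit the most.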

And of course, the file access functionality will be expanded across the entire range of Huawei data storage - stay tuned for our updates.

Source: habr.com