Reduce downtime risk with Shared Nothing architecture

Fault tolerance in data storage systems is a perennially relevant topic: in an age of widespread virtualization and consolidation of storage resources, storage is the link whose failure leads not to an ordinary incident but to prolonged service downtime. That is why modern storage systems duplicate many components, up to and including the controllers. But is such protection sufficient?

Every vendor, when listing the characteristics of its storage systems, invariably mentions high fault tolerance and adds the phrase "no single point of failure." Let's take a closer look at a typical storage system. To avoid downtime during maintenance, the power supplies, cooling modules, I/O ports, drives (meaning RAID) and, of course, controllers are all duplicated. Yet if you examine this architecture closely, you can see at least two potential points of failure that are usually passed over in silence:

  1. A single backplane
  2. A single copy of the data

The backplane is a technically complex device that undergoes serious testing during production, so it is extremely rare for it to fail completely. However, even a partial failure, such as a non-functioning drive slot, requires replacing it, and that means shutting down the entire storage system.

Keeping multiple copies of the data also looks straightforward at first glance. For example, Clone functionality, which refreshes a complete copy of the data at some interval, is quite common in storage systems. However, if the same backplane has problems, that copy will be just as inaccessible as the original.

An obvious way to overcome these shortcomings is replication to another storage system. If we close our eyes to the expected doubling of hardware cost (assuming that anyone choosing such a solution is thinking clearly and accepts this fact in advance), there are still the costs of organizing the replication itself: licenses, additional software and hardware. And, most importantly, you need to somehow ensure the consistency of the replicated data, i.e. build a storage virtualizer / vSAN / etc., which again takes money and time.

When creating its High Availability systems, AccelStor set out to get rid of the shortcomings mentioned above. The result is the company's interpretation of the Shared Nothing approach, which, put simply, means operating without any shared components.

The concept of the Shared Nothing architecture is to use two independent nodes (controllers), each of which holds its own copy of the data. Synchronous replication between the nodes is performed over a 56G InfiniBand interface and is completely transparent to the software running on top of the storage. As a result, no storage virtualizers, software agents, or the like are required.
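To make the write path concrete, here is a minimal sketch of the synchronous-replication idea: a write is acknowledged to the host only after both independent nodes have stored their own copy. This is not AccelStor code; the class and method names are purely illustrative.

```python
# Minimal sketch of synchronous replication between two independent nodes.
# Illustration of the general idea only; all names (Node, persist, etc.)
# are hypothetical, not AccelStor's implementation.

class Node:
    def __init__(self, name, peer=None):
        self.name = name
        self.peer = peer          # the second, independent node
        self.storage = {}         # each node keeps its OWN copy of the data

    def write(self, key, value):
        """Handle a host write: persist locally AND on the peer before acking."""
        self.persist(key, value)              # local durable write
        if self.peer is not None:
            self.peer.persist(key, value)     # synchronous replication over the inter-node link
        return "ACK"                          # host sees the ack only after both copies exist

    def persist(self, key, value):
        self.storage[key] = value             # stands in for a durable write to SSD


node_a = Node("A")
node_b = Node("B", peer=node_a)
node_a.peer = node_b

print(node_a.write("lba-42", b"payload"))     # ACK only after both nodes hold the block
```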

Physically, the two-node solution from AccelStor can be implemented in two models:

  • H510 - based on Twin servers in a 2U chassis, for moderate performance and capacity up to 22 TB;
  • H710 - based on separate 2U servers, for high performance and large capacity (up to 57 TB).

Model H510 Twin Server

Model H710 based on separate servers

The different form factors are dictated by the different number of SSDs needed to reach a given capacity and performance level. In addition, the Twin platform is cheaper and allows more affordable configurations, albeit with the conditional "disadvantage" of a single backplane. In everything else, including the principles of operation, the two models are completely identical.

Each node's data set consists of two FlexiRemap groups plus two hot spares. Each group can withstand the failure of one SSD. In accordance with the FlexiRemap ideology, a node rearranges all incoming write requests into sequential chains of 4KB blocks, which are then written to the SSDs in the mode most comfortable for them (sequential write). Moreover, the write acknowledgment is sent to the host only after the data has been physically placed on the SSDs, i.e. without caching in RAM. The result is very impressive performance: up to 600K write IOPS and 1M+ read IOPS (model H710).
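FlexiRemap itself is proprietary, so the following is only a rough, hypothetical sketch of the general write-coalescing idea described above: scattered host writes are packed into fixed 4KB blocks and appended sequentially, and the acknowledgment is returned only once the block has been placed.

```python
# Rough sketch of write coalescing: random host writes are packed into 4 KB
# blocks and appended sequentially to flash. Simplified illustration only;
# the real FlexiRemap logic is proprietary.

BLOCK_SIZE = 4096

class SequentialWriter:
    def __init__(self):
        self.log = []        # models the sequentially written area on SSD
        self.mapping = {}    # logical address -> position in the sequential log

    def write(self, logical_addr, data: bytes):
        # Pad/trim the payload to a full 4 KB block.
        block = data[:BLOCK_SIZE].ljust(BLOCK_SIZE, b"\x00")
        self.log.append(block)                   # always an append: sequential for the SSD
        self.mapping[logical_addr] = len(self.log) - 1
        # Acknowledge only after the block is "physically placed" (here: appended),
        # mirroring the no-RAM-cache behaviour described above.
        return "ACK"

    def read(self, logical_addr) -> bytes:
        return self.log[self.mapping[logical_addr]]


w = SequentialWriter()
w.write(1000, b"random write A")
w.write(7,    b"random write B")   # far-apart logical addresses...
print(w.read(1000)[:14])           # ...still land as back-to-back sequential blocks
```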

As mentioned earlier, the data sets are synchronized in real time over the 56G InfiniBand interface, whose high throughput and low latency allow the inter-node channel to be used efficiently even when transmitting small packets. Because there is only one such communication channel, a dedicated 1GbE link is used for an additional heartbeat check. Only heartbeat traffic passes over it, so there are no demanding requirements for its speed.
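As an illustration of how little such a heartbeat link actually needs, here is a minimal sketch of a heartbeat exchange over a dedicated low-bandwidth link; the port, interval and timeout values are arbitrary assumptions, not AccelStor parameters.

```python
# Minimal heartbeat sketch: tiny periodic messages over a dedicated link so each
# node can tell whether its peer is alive. Port, interval and timeout are
# arbitrary assumptions.

import socket
import time

HEARTBEAT_PORT = 7070        # hypothetical port on the dedicated 1GbE link
INTERVAL_S = 1.0             # send a heartbeat every second
TIMEOUT_S = 3.0              # declare the peer dead after 3 missed intervals

def send_heartbeats(peer_ip: str):
    """Run on each node: periodically send a tiny UDP datagram to the peer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"HB", (peer_ip, HEARTBEAT_PORT))   # a few bytes is enough
        time.sleep(INTERVAL_S)

def monitor_peer():
    """Run on each node: if no heartbeat arrives within TIMEOUT_S, raise an alarm."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", HEARTBEAT_PORT))
    sock.settimeout(TIMEOUT_S)
    while True:
        try:
            sock.recvfrom(16)                # any datagram counts as "peer is alive"
        except socket.timeout:
            print("peer missed heartbeats -- treat it as failed")
            break
```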

When system capacity is expanded (up to 400+ TB) with expansion shelves, those shelves are also connected in pairs to preserve the "no single point of failure" concept.

For additional data protection (on top of the fact that AccelStor already stores two copies), a special behavior algorithm is used when an SSD fails. If an SSD fails, the node starts rebuilding the data onto one of the hot spare drives. The FlexiRemap group that is in the degraded state switches to read-only mode. This eliminates contention between write and rebuild operations on the spare drive, which ultimately speeds up recovery and reduces the time during which the system is potentially vulnerable. When the rebuild completes, the node returns to normal read-write mode.
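The failure-handling behavior described above can be pictured as a small state machine for a data group; the sketch below is a simplified, hypothetical model, not the array's actual firmware logic.

```python
# Sketch of the failure handling described above, modelled as a tiny state
# machine for one data group. "FlexiRemapGroup" is an illustrative name only.

class GroupState:
    NORMAL = "normal (read-write)"
    DEGRADED_REBUILDING = "degraded (read-only, rebuilding to hot spare)"

class FlexiRemapGroup:
    def __init__(self, ssds, hot_spares):
        self.ssds = set(ssds)
        self.hot_spares = list(hot_spares)
        self.state = GroupState.NORMAL

    def on_ssd_failure(self, failed_ssd):
        self.ssds.discard(failed_ssd)
        self.state = GroupState.DEGRADED_REBUILDING   # host writes refused while rebuilding
        spare = self.hot_spares.pop(0)                # rebuild target
        self.rebuild(spare)
        self.ssds.add(spare)
        self.state = GroupState.NORMAL                # back to read-write after the rebuild

    def rebuild(self, spare):
        pass  # data reconstruction onto the spare would happen here

    def write_allowed(self) -> bool:
        # No host writes compete with the rebuild, shortening the vulnerable window.
        return self.state == GroupState.NORMAL


group = FlexiRemapGroup(ssds=["ssd0", "ssd1", "ssd2"], hot_spares=["spare0", "spare1"])
group.on_ssd_failure("ssd1")
print(group.state, group.write_allowed())   # back to normal, writes allowed again
```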

Of course, as in other systems, overall performance drops during the rebuild (after all, one of the FlexiRemap groups does not accept writes). But the recovery process itself is as fast as possible, which distinguishes AccelStor systems from other vendors' solutions.

Another useful property of the Shared Nothing architecture is that the nodes operate in so-called true active-active mode. Unlike the "classic" architecture, where only one controller owns a specific volume/pool and the second merely passes I/O through to it, in AccelStor systems each node works with its own data set and does not forward requests to its "neighbor." As a result, overall system performance improves thanks to parallel processing of I/O requests by the nodes and parallel access to the drives. Also, there is effectively no such thing as failover, since there is simply no need to transfer ownership of volumes to the other node when a failure occurs.
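To contrast this with an active-passive pair, here is a toy sketch of request routing in an active-active layout, where each volume is served entirely by the node that owns its data set; the volume names and the placement scheme are assumptions made only for illustration.

```python
# Toy sketch of "true active-active" routing: each volume is served by the node
# that owns its data set, so both nodes process I/O in parallel and no
# failover/ownership transfer is needed. Placement scheme is illustrative only.

NODES = ["node-A", "node-B"]

def owning_node(volume: str) -> str:
    """Pick the node whose local data set holds this volume (illustrative placement)."""
    return NODES[sum(volume.encode()) % len(NODES)]

def handle_io(volume: str, op: str) -> str:
    node = owning_node(volume)
    # The request is processed entirely on the owning node; it never has to be
    # forwarded to the "neighbor", unlike an active-passive controller pair.
    return f"{op} on {volume} served by {node}"

print(handle_io("vol-finance", "read"))
print(handle_io("vol-archive", "write"))
```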

If we compare the Shared Nothing architecture with full duplication of storage systems, it will at first glance seem somewhat less flexible than a full-fledged disaster recovery implementation. This is especially true of the communication line between the storage systems: in the H710 model, the nodes can be separated by up to 100 m, using rather expensive InfiniBand active optical cables. But even compared with the usual implementation of synchronous replication from other vendors over the more readily available Fibre Channel, which admittedly covers longer distances, the AccelStor solution is cheaper and easier to install and operate, because there is no need to deploy storage virtualizers and/or integrate with additional software (which is not always possible in principle). And do not forget that AccelStor solutions are All Flash arrays, with higher performance than "classic" storage systems that merely use SSDs.

With the Shared Nothing architecture, AccelStor can achieve 99.9999% storage availability at a very reasonable cost. Together with the high reliability of the solution, including through the use of two copies of the data, and the impressive performance delivered by the proprietary FlexiRemap algorithms, this makes AccelStor solutions excellent candidates for key roles in building a modern data center.
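As a quick sanity check on what the 99.9999% ("six nines") figure means in practice:

```python
# Quick arithmetic on what "99.9999% availability" (six nines) allows per year.
availability = 0.999999
seconds_per_year = 365.25 * 24 * 3600
max_downtime = (1 - availability) * seconds_per_year
print(f"{max_downtime:.1f} seconds of downtime per year")   # ~31.6 seconds
```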

Source: habr.com
