How to choose storage without shooting yourself in the foot

Introduction

It's time to buy storage. Which one should you take, and whom should you listen to? Vendor A bad-mouths vendor B, and then there is integrator C, who says the opposite and recommends vendor D. In a situation like this, even an experienced storage architect's head will spin, especially with all the new vendors and the SDS and hyperconvergence that are fashionable today.

So how do you make sense of it all without being fooled? We (Anton Zhbankov, aka AntonVirtual, and Evgeny Elizarov, aka raven) will try to explain it in plain language.
The article has much in common with, and is in fact an extension of, "Virtualized Data Center Design" in the area of choosing storage systems and reviewing storage technologies. We will briefly cover the general theory, but we recommend reading the referenced article as well.

Why

You can often observe the situation when a newcomer arrives on a forum or in a specialized chat, such as Storage Discussions, and asks: "I am offered two storage options - ABC SuperStorage S600 and XYZ HyperOcean 666v4; which would you advise?"

And a measuring contest begins over who has which implementation features and which terrifying, incomprehensible bells and whistles, all of which are complete gibberish to an unprepared person.

So, the key and very first question that you need to ask yourself long before comparing specifications in commercial offers is WHY? Why is this storage needed?


The answer will be unexpected and very much in the style of Tony Robbins: to store data. Thanks, Captain Obvious! And yet, sometimes we go so deep into comparing details that we forget why we are doing it at all.

So, the task of a data storage system is to store and provide access to DATA with a given performance. We'll start with the data.

Data

Data type

What data are we planning to store? A very important question that can strike many storage systems off the list right away. For example, suppose we plan to store video and photos. We can immediately cross out systems designed for random access in small blocks, as well as systems whose strength lies in proprietary compression / deduplication features. These may be simply excellent systems; we don't mean anything bad. But in this case their strengths will either turn into weaknesses (video and photos do not compress) or simply significantly increase the cost of the system.

Conversely, if the intended use is a heavy transactional DBMS, then excellent multimedia streaming systems capable of delivering gigabytes per second would be a poor choice.

Data volume

How much data do we plan to store? Quantity turns into quality, and this should never be forgotten, especially in this age of exponential data growth. Petabyte-class systems are no longer a rarity, but the more petabytes, the more specialized the system becomes, and the less of the familiar functionality of small and medium random-access systems remains available. Simply because the block access statistics tables alone become larger than the available RAM on the controllers. Not to mention compression / deduplication. Suppose we want to switch the compression algorithm to a stronger one and recompress 20 petabytes of data. How long will it take: six months, a year?
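To put a number on that, here is a rough back-of-envelope estimate (a sketch; the 2 GB/s sustained end-to-end throughput is an assumed figure for illustration):

```python
def recompress_days(total_pb: float, gb_per_sec: float) -> float:
    """Days needed to read, recompress and rewrite a dataset at a
    given sustained end-to-end throughput (assumed figure)."""
    total_gb = total_pb * 1024 * 1024       # PB -> GB
    return total_gb / gb_per_sec / 86_400   # 86 400 seconds per day

# 20 PB at a sustained 2 GB/s is roughly 121 days of pure I/O,
# before accounting for serving production load in parallel.
print(round(recompress_days(20, 2.0)))
```

Even doubling the throughput only halves the figure, which is why the answer is measured in months.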

On the other hand, why overcomplicate things if you only need to store and process 500 GB of data? Just 500. Consumer SSDs (with low DWPD) of this size cost next to nothing. Why build a Fibre Channel fabric for this and buy a high-end external storage system that costs a fortune?

What percentage of the total is hot data? How uneven is the load across the data? This is where tiered storage or flash cache can really help, if the volume of hot data is small compared to the total. Conversely, with a uniform load across the whole volume, often found in streaming systems (video surveillance, some analytics systems), such technologies give nothing and only increase the cost and complexity of the system.

IS

The flip side of data is the information system (IS) that uses that data. The IS has a set of requirements that the data inherits. For more information about ISs, see "Virtualized Data Center Design".

Resiliency / availability requirements

Requirements for fault tolerance and data availability are inherited from the IS using the data and are expressed in three numbers: RPO, RTO, and availability.

Availability is the share of a given period of time during which the data is available for work. It is usually expressed as a number of nines. For example, two nines per year means 99% availability, i.e. about 87.6 hours of unavailability per year are allowed. Three nines: about 8.8 hours a year.
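As a quick sanity check, the allowed downtime can be computed directly from the number of nines (a minimal sketch, assuming a 365-day year):

```python
def downtime_hours_per_year(nines: int) -> float:
    """Allowed unavailability per year for an N-nines availability target."""
    unavailability = 10 ** (-nines)      # e.g. two nines -> 1% downtime
    return unavailability * 365 * 24     # hours in a (non-leap) year

print(round(downtime_hours_per_year(2), 1))  # two nines (99%):     87.6 h/year
print(round(downtime_hours_per_year(3), 1))  # three nines (99.9%):  8.8 h/year
```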

Unlike availability, RPO / RTO are not aggregate indicators but apply to each individual incident (accident).

RPO is the amount of data lost in an accident (expressed in hours). For example, if there is a backup once a day, then RPO = 24 hours. That is, in the event of an accident and a complete loss of the storage system, up to 24 hours of data (since the last backup) can be lost. Based on the RPO specified for the IS, a backup schedule is drawn up, for example. RPO also tells you whether synchronous or asynchronous data replication is needed.

RTO is the service (data access) recovery time after an accident. Based on the given RTO value, we can understand whether a metrocluster is needed or whether one-way replication is enough. The same goes for whether a multi-controller hi-end storage system is required.


Performance requirements

While this seems an obvious question, it is where most of the difficulty arises. The way you collect the necessary statistics depends on whether you already have some kind of infrastructure or not.

Suppose you already have storage and are looking for a replacement, or want to purchase another system for expansion. Everything is simple here. You understand what services you already run and what you plan to implement in the near future. Based on the current services, you can collect performance statistics. Determine the current number of IOPS and current latencies: what are these figures, and are they sufficient for your tasks? This can be measured both on the storage system itself and on the hosts connected to it.

And you need to look not just at the current load, but over some period (preferably a month). Look at the maximum daytime peaks, the load that backup creates, and so on. If your storage system or its software does not give you the complete picture, you can use the free RRDtool, which can work with most popular storage systems and switches and provide detailed performance statistics. It is also worth looking at the load from the hosts that work with this storage system: down to specific virtual machines, or whatever exactly runs on a given host.


It is worth noting separately: if the latencies on the volume and on the datastore residing on that volume differ significantly, you should take a close look at your SAN network. It is highly likely there are problems with it, and before purchasing a new system you should sort this out, because the chances of improving the performance of the current system are very high.

Now suppose you are building an infrastructure from scratch, or purchasing a system for a new service whose load you do not know. There are several options here: talk with colleagues on specialized resources to try to estimate and predict the load; contact an integrator who has experience implementing such services and can calculate the load for you. And the third option (usually the hardest, especially with self-written or rare applications): try to find out the performance requirements from the system's developers.

And, mind you, the most reliable option in practice is a pilot on existing equipment, or on equipment provided for testing by the vendor or integrator.

Special requirements

Special requirements are everything that does not fall under performance, fault tolerance, or the functionality for directly processing and serving data.

One of the simplest special requirements for a storage system is "removable storage media". And it immediately becomes clear that the solution must include a tape library, or simply a tape drive to which the backup is written. After that, a specially trained person labels the tape and proudly carries it to a special safe.
Another example of a special requirement is a protected shockproof design.

Where

The second key input in choosing a storage system is information about WHERE it will be located: everything from geography and climatic conditions to personnel.

Customer

For whom is this storage system intended? The question breaks down as follows:

State customer vs. commercial.
A commercial customer has no restrictions and is not even obliged to hold tenders, except under its own internal regulations.

A government customer is a different matter: Federal Law 44-FZ and other delights, with tenders and technical specifications that can be challenged.

Customer under sanctions.
Here the question is very simple: the choice is limited to the offers available to this customer.

Internal regulations / approved vendors and models.
Also an extremely simple point, but one that must be kept in mind.

Where physically

This part covers all the questions of geography, communication channels, and the microclimate at the installation site.

Staff

Who will work with this storage system? This matters no less than what the storage system itself can do.
No matter how promising, cool and wonderful vendor A's system is, there is probably little point in installing it if the staff only knows how to work with vendor B and no further purchases or ongoing cooperation with A are planned.

And, of course, the flip side of the question: how available is trained personnel in a given geographic location, both inside the company and on the local labor market? For remote regions it can make a lot of sense to choose storage systems with simple interfaces or remote centralized management. Otherwise, at some point it may become excruciatingly painful. The Internet is full of stories about how a new employee, yesterday's student, configured things in such a way that the whole office went down.


Environment

And of course, an important question is the environment in which this storage system will operate.

  • What about power and cooling?
  • What connectivity is available?
  • Where will it be mounted?
  • Etc.

Often these questions are taken for granted and are not particularly considered, but sometimes they can turn everything upside down.

What

Vendor

Today (mid-2019), the Russian storage market can be divided into five rough categories:

  1. Top division - established companies with a broad range from the simplest disk shelves to hi-end (HPE, DellEMC, Hitachi, NetApp, IBM / Lenovo)
  2. Second division - companies with a limited line-up, niche players, serious SDS vendors, or promising newcomers (Fujitsu, Datacore, Infinidat, Huawei, Pure, etc.)
  3. Third division - low-end niche solutions, cheap SDS, hardened products built on Ceph and other open-source projects (Infortrend, Starwind, etc.)
  4. SOHO segment - small and ultra-small storage systems for the home / small office (Synology, QNAP, etc.)
  5. Import-substituted storage systems - this includes first-division hardware with relabeled badges as well as rare representatives of the second division (RAIDIX, whom we will grant second-division status in advance), but mostly the third division (Aerodisk, Baum, Depo, etc.)

The division is rather arbitrary and does not at all mean that the third division or the SOHO segment is bad and unusable. In specific projects with a well-defined data set and load profile, they can work very well, far surpassing the first division in price/quality. It is important to first determine the tasks, growth prospects and required functionality; then even a Synology will serve you faithfully, and your hair will be soft and silky.

One of the important factors when choosing a vendor is your current environment: how many storage systems of which kinds you already have, and which systems your engineers know how to work with. Do you need another vendor and another point of contact, or will you gradually migrate the entire load from vendor A to vendor B?

Entities should not be multiplied beyond what is necessary.

iSCSI/FC/File

On the issue of access protocols there is no consensus among engineers, and the debates resemble theological disputes more than engineering ones. But in general, the following points can be noted:

FCoE is more dead than alive.

FC vs iSCSI. One of the key advantages of FC over IP storage in 2019, a dedicated fabric for data access, is offset by building a dedicated IP network. FC has no fundamental advantages over IP networks, and IP can be used to build storage of any load level, up to systems for the heavy DBMS behind a large bank's core banking system. On the other hand, the death of FC has been predicted for years, but something keeps getting in the way. Today, for example, some storage market players are actively developing the NVMe-oF standard. Whether it will share the fate of FCoE, time will tell.

File access is also not something unworthy of attention. NFS / CIFS perform well in production environments and, when properly designed, draw no more complaints than block protocols.

Hybrid / All Flash Array

Classic storage systems come in two types:

  1. AFA (All Flash Array) - systems optimized for SSD use.
  2. Hybrid - able to use HDDs, SSDs, or a combination of the two.

Their main difference lies in the supported storage efficiency technologies and the maximum achievable performance (high IOPS and low latency). Both kinds of systems (in most models, not counting the low-end segment) can serve both block and file access. The supported functionality also depends on the tier of the system; in the junior models it is most often cut down to a minimum. Pay attention to this when you study the characteristics of a specific model rather than the capabilities of the line as a whole. Technical characteristics such as CPU, memory and cache sizes, and the number and types of ports also depend on the system tier.

From the management point of view, AFA systems differ from hybrid (disk) systems only in the mechanisms for working with SSDs; even if you put SSDs into a hybrid system, that does not mean you will get AFA-level performance. Also, in most cases, inline efficiency mechanisms are disabled on hybrid systems, and enabling them costs performance.

Special storage systems

In addition to general-purpose storage systems, oriented primarily to online data processing, there are special-purpose systems whose key principles differ fundamentally from the usual ones (low latency, high IOPS):

Media.

These systems are designed for storing and processing large media files. Accordingly, latency becomes practically unimportant, and the ability to deliver and ingest data at high bandwidth across many parallel streams comes to the fore.

Deduplicating storage for backups.

Since, under normal conditions, backups are remarkably similar to one another (a typical daily backup differs from yesterday's by 1-2%), this class of systems packs the recorded data extremely densely onto a fairly small number of physical media. In some cases, data reduction ratios can reach 200:1.

Object storage.

These storage systems have none of the familiar block volumes or file shares, and most of all resemble a huge database. An object stored in such a system is accessed by a unique identifier or by metadata (for example, all JPEG objects with a creation date between XX-XX-XXXX and YY-YY-YYYY).
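The access model can be sketched in a few lines of Python; the class and method names here are purely illustrative and do not correspond to any real vendor API:

```python
import uuid

class ObjectStore:
    """Toy object store: no volumes, no paths, only IDs and metadata."""
    def __init__(self):
        self._objects = {}  # object_id -> (metadata, payload)

    def put(self, payload: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())        # unique identifier
        self._objects[object_id] = (metadata, payload)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][1]   # access by ID

    def find(self, **query):
        """Return IDs of objects whose metadata matches every query field."""
        return [oid for oid, (md, _) in self._objects.items()
                if all(md.get(k) == v for k, v in query.items())]

store = ObjectStore()
oid = store.put(b"...jpeg bytes...", content_type="image/jpeg", year=2019)
store.put(b"...log bytes...", content_type="text/plain", year=2018)
assert store.get(oid) == b"...jpeg bytes..."
assert store.find(content_type="image/jpeg") == [oid]
```

Real object stores (e.g. S3-compatible ones) expose essentially this interface over HTTP.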

Compliance system.

Not so common in Russia today, but worth mentioning. The purpose of such storage systems is guaranteed data retention to comply with security policies or regulatory requirements. In some systems (for example, EMC Centera), a deletion-prohibition mode was implemented: once the key is turned and the system has entered this mode, neither the administrator nor anyone else can physically delete the data already recorded.

Proprietary technologies

Flash cache

Flash cache is a common name for all proprietary technologies that use flash memory as a second-level cache. When a flash cache is used, the storage system is usually sized so that the magnetic disks handle the steady-state load, while the peaks are served by the cache.

At the same time, you need to understand the load profile and the degree of locality of accesses to the blocks of the storage volumes. Flash cache is a technology for workloads with highly localized requests, and is practically useless for uniformly loaded volumes (such as those of analytics systems).

Two flash cache implementations are available on the market:

  • Read-only. Only read data is cached; writes go directly to the disks. Some manufacturers, such as NetApp, believe that writes on their storage systems already take an optimal path and a cache would not help.
  • Read/write. Both reads and writes are cached, which allows buffering the write stream, reducing the RAID penalty and, as a result, raising overall performance on storage systems with a less-than-optimal write path.
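The difference between the two policies can be shown as a toy model (illustrative only; real arrays destage the write cache to disk in the background, which is not modeled here):

```python
from collections import OrderedDict

class FlashCache:
    """Toy LRU flash cache in front of spinning disks."""
    def __init__(self, capacity: int, write_back: bool):
        self.capacity = capacity
        self.write_back = write_back      # True = read/write cache
        self.cache = OrderedDict()        # block -> data, in LRU order
        self.disk_reads = 0
        self.disk_writes = 0

    def _touch(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # cache hit: no disk I/O
            return self.cache[block]
        self.disk_reads += 1                # miss: go to the disks
        data = f"data-{block}"
        self._touch(block, data)
        return data

    def write(self, block, data):
        if self.write_back:
            self._touch(block, data)        # buffered; destaged later
        else:
            self.disk_writes += 1           # read-only cache: straight to disk

ro = FlashCache(capacity=8, write_back=False)
rw = FlashCache(capacity=8, write_back=True)
for c in (ro, rw):
    c.write(1, "x")
assert ro.disk_writes == 1 and rw.disk_writes == 0
```

With highly localized reads, repeated accesses hit the cache and never touch the disks, which is exactly the workload this technology targets.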

Tiering

Tiered storage (tiering) is a technology for combining tiers of different performance, such as SSD and HDD, into a single disk pool. With pronouncedly uneven access to data blocks, the system can automatically rebalance the blocks, moving the loaded ones to the high-performance tier and the cold ones, conversely, to the slower one.

Lower- and mid-range hybrid systems use tiered storage with data moved between tiers on a schedule, with a tiering block size of 256 MB even in the best models. These properties do not allow tiered storage to be considered a performance technology, as many people mistakenly believe. In lower- and mid-range systems, tiered storage is a technology for optimizing storage cost for workloads with pronounced load unevenness.
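The scheduled rebalancing described above can be sketched as follows (a toy model; the block names and the simple access-count ranking are illustrative):

```python
def rebalance(access_counts: dict, ssd_capacity_blocks: int):
    """One scheduled tiering pass: promote the most-accessed blocks to SSD,
    demote the rest to HDD. Returns (ssd_blocks, hdd_blocks)."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    hot = set(ranked[:ssd_capacity_blocks])   # promoted to the fast tier
    cold = set(ranked[ssd_capacity_blocks:])  # demoted to the slow tier
    return hot, cold

# 256 MB-granularity blocks with very uneven access
counts = {"blk0": 900, "blk1": 5, "blk2": 750, "blk3": 2}
ssd, hdd = rebalance(counts, ssd_capacity_blocks=2)
assert ssd == {"blk0", "blk2"} and hdd == {"blk1", "blk3"}
```

Since the pass runs on a schedule rather than per-I/O, a sudden load spike is served from the slow tier until the next rebalance, which is why this is a cost optimization, not a performance guarantee.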

Snapshot

However much we talk about the reliability of storage systems, there are plenty of ways to lose data that have nothing to do with hardware failures: viruses, hackers, or any other unintentional deletion or corruption of data. For this reason, backing up production data is an essential part of an engineer's job.

A snapshot is a point-in-time image of a volume. When running most systems (virtualization, databases, etc.), we need to take such a snapshot and copy data from it to the backup, while our ISs can safely continue working with the volume. But remember: not all snapshots are equally useful. Different vendors take different approaches to snapshots, stemming from their architecture.

CoW (Copy-on-Write). When a data block is about to be overwritten, its original contents are first copied to a special area, after which the write proceeds normally. This preserves the data inside the snapshot. Naturally, all this "parasitic" data shuffling puts extra load on the storage system, which is why vendors with this implementation recommend keeping no more than a dozen snapshots, and none at all on heavily loaded volumes.

RoW (Redirect-on-Write). Here the original volume is effectively frozen, and when a data block is written, the storage system writes the data to a special area in free space and updates the block's location in the metadata table. This reduces the number of rewrite operations, which ultimately eliminates the performance drop and removes the restrictions on snapshots and their number.
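The contrast between the two mechanisms can be shown as a toy model (illustrative; real systems track block locations in metadata tables, not Python dicts):

```python
class CowVolume:
    """Copy-on-Write: copy the old block aside before every overwrite."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshot_area = {}
        self.extra_copies = 0             # the "parasitic" I/O CoW pays

    def write(self, block, data):
        if block not in self.snapshot_area:
            self.snapshot_area[block] = self.blocks[block]  # copy old data
            self.extra_copies += 1
        self.blocks[block] = data

class RowVolume:
    """Redirect-on-Write: freeze the original, redirect new writes."""
    def __init__(self, blocks):
        self.frozen = dict(blocks)        # the snapshot = frozen original
        self.redirects = {}               # block -> redirected new data

    def write(self, block, data):
        self.redirects[block] = data      # no extra copy needed

    def read(self, block):
        return self.redirects.get(block, self.frozen[block])

cow = CowVolume({"b1": "old"})
cow.write("b1", "new")
row = RowVolume({"b1": "old"})
row.write("b1", "new")
assert cow.extra_copies == 1 and cow.snapshot_area["b1"] == "old"
assert row.read("b1") == "new" and row.frozen["b1"] == "old"
```

The `extra_copies` counter is precisely the overhead that makes CoW snapshots expensive on write-heavy volumes, while the RoW path adds no per-write copy at all.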

With respect to applications, snapshots also come in two types:

Application-consistent. At the moment the snapshot is taken, the storage system calls an agent in the consumer's operating system, which forcibly flushes disk caches from memory to disk and makes the application quiesce. In this case, when restoring from the snapshot, the data will be consistent.

Crash-consistent. Here nothing of the kind happens; the snapshot is taken as-is. Restoring from such a snapshot looks exactly as if the power had suddenly been cut: some data that was sitting in caches and never reached disk may be lost. Such snapshots are easier to implement and cause no performance dips in applications, but they are less reliable.

Why are snapshots needed on data storage systems?

  • Agentless backup directly from storage
  • Creating test environments based on real data
  • For file storage, building VDI environments using storage snapshots instead of hypervisor snapshots
  • Achieving low RPOs by taking scheduled snapshots much more frequently than backups run

Cloning

Volume cloning works on the same principle as snapshots, but serves not merely to read data but to work with them fully. We can get an exact copy of a volume, with all its data, without making a physical copy, which saves space. Typically, volume cloning is used either for Test & Dev or to check the effect of some updates on your IS. Cloning lets you do this as quickly and cheaply as possible in terms of disk resources, because only changed data blocks will be written.

Replication / logging

Replication is a mechanism for creating a copy of data on another physical storage system. Each vendor usually has a proprietary technology that works only within its own product line. But there are also third-party solutions, including ones working at the hypervisor level, such as VMware vSphere Replication.

Proprietary technologies usually far surpass universal ones in functionality and ease of use, but they are inapplicable when, for example, you need to replicate from a NetApp to an HP MSA.

Replication is divided into two kinds:

Synchronous. The write operation is forwarded to the second storage system immediately, and execution is not acknowledged until the remote system confirms it. This increases access latency, but gives us an exact mirror copy of the data. That is, RPO = 0 if the primary storage is lost.

Asynchronous. Write operations are executed and acknowledged only on the primary storage system, while being accumulated in a buffer for periodic transfer (a burst) to the remote system. This type of replication is relevant for less valuable data, or for channels with low bandwidth or high latency (typical for distances over 100 km). RPO here equals the burst interval.
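The RPO difference between the two modes can be sketched as a toy model (illustrative class names; "flush" stands for the periodic burst to the remote system):

```python
class ReplicatedVolume:
    """Toy volume replicated to a remote storage system."""
    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary, self.replica, self.pending = [], [], []

    def write(self, data):
        self.primary.append(data)
        if self.synchronous:
            self.replica.append(data)   # remote ack before we acknowledge
        else:
            self.pending.append(data)   # buffered for the next burst

    def flush(self):
        """Async mode: the periodic burst to the remote system."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def writes_lost_if_primary_dies(self):
        return len(self.primary) - len(self.replica)

sync_vol, async_vol = ReplicatedVolume(True), ReplicatedVolume(False)
for v in (sync_vol, async_vol):
    v.write("tx1")
    v.write("tx2")
assert sync_vol.writes_lost_if_primary_dies() == 0   # RPO = 0
assert async_vol.writes_lost_if_primary_dies() == 2  # at risk until next burst
```

The price of the synchronous mode's RPO = 0 is that every write waits for the remote acknowledgment, which is why it is only practical over short, low-latency links.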

Along with replication there is often a mechanism for journaling disk operations. A special area is allocated for the journal, and write operations are retained to a certain depth in time, or up to the journal's size limit. Some proprietary technologies, such as EMC RecoverPoint, integrate with system software, allowing specific bookmarks to be tied to specific journal entries. Thanks to this, the volume state can be rolled back (or a clone created) not merely to 11:59:13 on April 23, but to the moment just before "DROP ALL TABLES; COMMIT".

Metro cluster

A metro cluster is a technology for bidirectional synchronous replication between two storage systems such that, from the outside, the pair looks like a single storage system. It is used to build clusters whose legs are geographically separated by metro distances (under 100 km).

Taking a virtualization environment as an example, a metrocluster lets you create a datastore with virtual machines that is writable from two data centers at once. In this case, a hypervisor-level cluster is created, consisting of hosts in the different physical data centers connected to this datastore. This makes the following possible:

  • Fully automatic recovery after the loss of one of the data centers. Without any additional tools, all VMs that were running in the dead data center are automatically restarted in the surviving one. RTO = the HA cluster timeout (15 seconds for VMware) plus the operating system boot and service start time.
  • Disaster avoidance. If electrical work is planned in data center 1, we can migrate the entire important load to data center 2 without downtime, in advance, before the work begins.

Virtualization

Storage virtualization is, technically, using volumes from another storage system as one's own disks. A storage virtualizer can simply pass someone else's volume through to the consumer as its own, while mirroring it to another storage system, or even build a RAID out of external volumes.
The classic representatives of the storage virtualization class are EMC VPLEX and IBM SVC, plus storage systems with a virtualization function: NetApp, Hitachi, IBM / Lenovo Storwize.

Why might it be needed?

  • Redundancy at the storage system level. A mirror is created between two volumes, one half of which may live on an HP 3Par and the other on a NetApp, with, say, an EMC virtualizer in front.
  • Moving data between storage systems from different manufacturers with minimal downtime. Suppose data must be migrated from an old 3Par being decommissioned to a new Dell. Consumers are disconnected from the 3Par, its volumes are put behind the VPLEX and presented to the consumers again. Since not a bit on the volume has changed, work continues. Mirroring the volume to the new Dell starts in the background and, when complete, the mirror is broken and the 3Par is switched off.
  • Organization of metro clusters.

Compression / deduplication

Compression and deduplication are technologies for saving disk space on the storage system. It should be said right away that not all data can be compressed and/or deduplicated at all; some data types compress and deduplicate well, and some do not.

Compression and deduplication come in two flavors:

Inline - data blocks are compressed and deduplicated before being written to disk. The system calculates the hash of each block and compares it against the table of existing ones. Firstly, this is faster than just writing to disk; secondly, we do not consume extra disk space.

Post-process - these operations are carried out on data already written to the disks. The data is first written to disk, and only then are the hashes calculated, redundant blocks removed, and disk resources freed.

It is worth saying that most vendors use both types, which lets them optimize these processes and increase their efficiency. Most storage vendors offer utilities for analyzing your data sets. These utilities follow the same logic implemented in the storage system, so the estimated efficiency level will match. Also keep in mind that many vendors have efficiency guarantee programs that promise at least a given data reduction level for certain (or all) data types. Do not neglect these programs: by sizing the system for your tasks with a particular system's efficiency ratio in mind, you can save on capacity. Bear in mind that these programs are designed for AFA systems; but thanks to buying a smaller amount of SSD than you would HDD in classic systems, this brings their cost down, if not equal to that of a disk system, then fairly close to it.
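The inline path described above boils down to a hash-table lookup before each write, which can be sketched as follows (a toy model; real arrays also verify block contents on a hash match and track reference counts so blocks can be deleted):

```python
import hashlib

class DedupStore:
    """Toy inline block deduplication: store each unique block once."""
    def __init__(self):
        self.blocks = {}      # hash -> block data (what actually hits disk)
        self.volume = []      # logical volume: a list of block hashes

    def write(self, block: bytes):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = block   # new unique block goes to disk
        self.volume.append(digest)        # a duplicate costs only a reference

store = DedupStore()
for block in (b"A" * 4096, b"B" * 4096, b"A" * 4096):
    store.write(block)
assert len(store.volume) == 3    # 3 logical blocks written
assert len(store.blocks) == 2    # only 2 unique blocks stored
```

The ratio of logical blocks to unique blocks is exactly the deduplication ratio the vendors' analysis utilities estimate for a data set.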

Model

And here we come to the right question.

“Here I am offered two storage options - ABC SuperStorage S600 and XYZ HyperOcean 666v4, what do you recommend”

It turns into: “Here I am offered two storage options - ABC SuperStorage S600 and XYZ HyperOcean 666v4, what do you recommend?

Target load: mixed VMware virtual machines with production/test/development environments. Test = production. 150 TB each, with a peak performance of 80 000 IOPS in an 8 KB block, 50% random access, 80/20 read/write. 300 TB for development, where 50 000 IOPS is enough, 50% random, 80/20 read/write.

Production presumably goes to a metrocluster, RPO = 15 minutes, RTO = 1 hour; development to asynchronous replication, RPO = 3 hours; test stays on one site.

There will be 50 TB of DBMS; journaling would be nice for it.

We have Dell servers everywhere and old Hitachi storage systems that can barely cope; we plan a 50% load increase in terms of both volume and performance.”

As the saying goes, a well-formulated question contains 80% of the answer.

Additional Information

What is worth reading additionally according to the authors

Books

  • Olifer and Olifer "Computer networks". The book will help to systematize and possibly better understand how the data transmission medium for IP / Ethernet storage systems works.
  • "EMC Information Storage and Management". An excellent book on the basics of storage, why, how and why.

Forums and chats

General recommendations

Prices

Now, as for prices. Storage prices, when published at all, are usually list prices, from which each customer receives an individual discount. The size of the discount depends on a large number of parameters, so it is simply impossible to predict the final price your company will get without asking a distributor. At the same time, low-end models have recently begun appearing in ordinary computer stores, such as nix.ru or xcom-shop.ru, where you can buy the system you are interested in at a fixed price, like any computer component.

But note right away that a direct $/TB comparison is not correct. Approached that way, the cheapest solution would be a simple JBOD + server, which provides neither the flexibility nor the reliability of a full-fledged dual-controller storage system. That does not mean JBOD is worthless; you just need to understand very clearly how and for what purposes you will use such a solution. You can often hear that there is nothing to break in a JBOD, since there is only one backplane. However, backplanes do sometimes fail. Everything breaks sooner or later.

Total

Compare systems with each other not only by price or only by performance, but by the totality of all the factors.

Buy an HDD system only if you are sure you need HDDs: for low loads and incompressible data types. Otherwise, look at the SSD storage-efficiency guarantee programs that most vendors now offer (and they really work, even in Russia); though it all depends on the applications and data that will live on the storage system.

Don't chase cheapness. It sometimes hides many unpleasant surprises, one of which Evgeny Elizarov described in his articles about Infortrend. In the end, that cheapness can come back to bite you. Remember: "the miser pays twice."

Source: www.habr.com
