Virtualized Data Center Design


Introduction

From the user's point of view, an information system is well defined in GOST RV 51987 as "an automated system whose result of operation is the presentation of output information for subsequent use." If we consider the internal structure, then in essence any IS is a system of interconnected algorithms implemented in code. In the broad sense of the Church-Turing thesis, an algorithm (and consequently an IS) transforms a set of input data into a set of output data.
It can even be said that transforming input data is the reason an information system exists. Accordingly, the value of an IS, and of the entire IS landscape, is determined by the value of its input and output data.
Design must therefore start from the data, adjusting the architecture and methods to the structure and significance of that data.

Stored data
The key stage of preparation for design is obtaining the characteristics of all data sets planned for processing and storage. These characteristics include:
— Data volume;
— Information about the data life cycle (growth of new data, lifetime, processing of obsolete data);
— Classification of data in terms of its impact on the company's core business (the confidentiality/integrity/availability triad) together with financial indicators (e.g. the cost of losing the last hour of data);
— Geography of data processing (physical location of the processing systems);
— Regulatory requirements for each data class (e.g. FZ-152, PCI DSS).

Information Systems

Data is not only stored but also processed (transformed) by information systems. The next step after obtaining the data characteristics is the most complete possible inventory of the information systems, their architectural features, interdependencies, and infrastructure requirements, expressed in conventional units for four types of resources:
— Processor computing power;
— Amount of RAM;
— Storage capacity and performance requirements;
— Data network requirements (external channels, channels between IS components).
The requirements should be stated for each service/microservice within the IS.
Separately, and mandatory for correct design, data on the impact of each IS on the company's core business must be available in the form of the cost of IS downtime (rubles per hour).

Threat Model

It is mandatory to have a formal model of the threats against which data and services are to be protected. The threat model covers not only confidentiality aspects but also integrity and availability. For example:
— Failure of a physical server;
— Failure of a top-of-rack switch;
— Break of the optical communication channel between data centers;
— Failure of the entire operational storage system.
In some cases, threat models are written not only for infrastructure components but also for specific IS or their components, such as a DBMS failure with logical destruction of the data structure.
Any decision within the project that protects against a threat not described in the model is redundant.

Regulatory requirements

If the processed data is subject to special rules established by regulators, information about data sets and processing/storage rules is mandatory.

RPO/RTO Targets

Designing any type of protection requires target recovery point (RPO) and recovery time (RTO) objectives for each of the described threats.
Ideally, each RPO and RTO should have an associated cost of data loss and of downtime per unit of time.


Resource pooling

After collecting all the primary input information, the first step is to group the data sets and IS into pools based on the threat models and regulatory requirements. The type of separation between pools is also determined: programmatic, at the level of system software, or physical.
Examples:
— The circuit processing personal data is completely physically separated from other systems;
— Backups are stored on a separate storage system.

Pools may be only partially independent: for example, two pools of computing resources (processor power + RAM) may be defined that share a single data storage pool and a single data network pool.

Processor power


The abstract processing power requirement of a virtualized data center is measured in the number of virtual processors (vCPUs) and their consolidation ratio onto physical processors (pCPUs). Here, 1 pCPU = 1 physical processor core (Hyper-Threading not counted). The number of vCPUs is summed across all defined resource pools (each of which can have its own consolidation ratio).
The consolidation ratio for loaded systems is obtained empirically, from the existing infrastructure or from a pilot installation and load testing. For unloaded systems, "best practice" values are applied; in particular, VMware cites an average ratio of 8:1.
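As a minimal sketch (the pool names, vCPU counts and ratios below are illustrative assumptions, not data from any real infrastructure), converting per-pool vCPU requirements into physical cores can look like this:

```python
import math

# Illustrative resource pools: total vCPUs and the chosen consolidation
# ratio (vCPU:pCPU) for each pool.
pools = {
    "loaded_pool":   {"vcpu": 400, "ratio": 4},  # ratio measured empirically
    "unloaded_pool": {"vcpu": 320, "ratio": 8},  # "best practice" 8:1
}

# 1 pCPU = 1 physical core (Hyper-Threading not counted here)
pcpu_total = sum(math.ceil(p["vcpu"] / p["ratio"]) for p in pools.values())
print(f"Required physical cores (pCPU): {pcpu_total}")  # 100 + 40 = 140
```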

RAM

The total RAM requirement is obtained by simple summation. RAM oversubscription is not recommended.

Storage resources

Storage requirements are obtained by simply summing up all pools in terms of capacity and performance.
Performance requirements are expressed in IOPS combined with an average read/write ratio and, if necessary, maximum response latency.
Separately, the requirements for ensuring quality of service (QoS) for specific pools or systems should be specified.

Data network resources

The data network requirements are obtained by simply summing up all the bandwidth pools.
Separately, the requirements for ensuring quality of service (QoS) and delays (RTT) for specific pools or systems should be specified.
The requirements for data network resources also specify requirements for isolating and/or encrypting network traffic and the preferred mechanisms (802.1Q, IPSec, etc.).

Choice of architecture

This guide does not cover any choice other than x86 architecture and 100% server virtualization. Therefore, the choice of computing subsystem architecture comes down to choosing a server virtualization platform, server form factor, and general server configuration requirements.

The key decision is whether to use the classical approach, which separates the functions of processing, storing, and transmitting data, or a converged one.

Classical architecture implies the use of intelligent external subsystems for storing and transmitting data, while servers contribute only processor power and RAM to the common pool of physical resources. In the extreme case, servers become completely anonymous, having neither their own disks nor even a system identifier; the OS or hypervisor is then booted from built-in flash media or from an external storage system (boot from SAN).
Within the classical architecture, the choice between blade and rack-mount servers is made primarily on the following principles:
— Economic efficiency (on average, rack servers are cheaper);
— Computational density (higher for blades);
— Power consumption and heat dissipation (higher per rack unit for blades);
— Scalability and manageability (blades generally require less effort for large installations);
— Use of expansion cards (very limited choice for blades).

Converged architecture (also known as hyperconverged) combines the functions of processing and storing data, which leads to the use of local server disks and, as a result, to the rejection of the classic blade form factor. For converged systems, either rack servers or cluster systems are used, the latter combining several blade servers and local disks in a single chassis.

CPU/Memory

To correctly calculate the configuration, you need to understand the type of load for the environment or each of the independent clusters.
CPU bound – an environment whose performance is limited by processor power. Adding RAM changes nothing in terms of performance (the number of VMs per server).
Memory bound – an environment limited by RAM. More RAM per server allows more VMs to be run per server.
GB/MHz (GB/pCPU) – the average ratio of RAM to processor power consumed by this particular load. It can be used to calculate the required amount of memory for a given performance level, and vice versa.
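As an illustration (the ratio value below is an assumption chosen to line up with the example cluster later in this section), the GB/pCPU ratio can be applied as follows:

```python
# Illustrative sketch: deriving the RAM requirement from a measured
# GB/pCPU ratio. Both values below are assumptions.
gb_per_pcpu = 18.4        # measured on the existing workload (assumed)
pcpu_required = 190       # physical cores required by the cluster

ram_required_gb = pcpu_required * gb_per_pcpu
print(f"RAM required: {ram_required_gb:.0f} GB")  # ~3500 GB
```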

Server configuration calculation


To begin with, it is necessary to determine all types of load and make a decision on combining or separating different computing pools into different clusters.
Next, for each of the defined clusters, the GB/MHz ratio is determined for a load that is known in advance. If the load is not known in advance but there is a rough understanding of the processor load level, the standard vCPU:pCPU ratios can be used to convert pool requirements into physical ones.

For each cluster, we divide the sum of the vCPU pool requirements by the consolidation ratio:
vCPUtotal / (vCPU:pCPU) = pCPUtotal – the required number of physical cores
pCPUtotal / 1.25 = pCPUht – the number of cores adjusted for Hyper-Threading
Suppose we need to size a cluster for 190 cores / 3.5 TB of RAM, with a target utilization of 50% for processor power and 75% for RAM.

Input: pCPU = 190, target CPU utilization = 50%; Mem = 3500 GB, target Mem utilization = 75%.

Sockets | Cores per CPU | Servers by CPU | RAM per server, GB | Servers by Mem
2       | 6             | 25.3           | 128                | 36.5
2       | 8             | 19.0           | 192                | 24.3
2       | 10            | 15.2           | 256                | 18.2
2       | 14            | 10.9           | 384                | 12.2
2       | 18            | 8.4            | 512                | 9.1

In this case, we always use rounding up to the nearest integer (=ROUNDUP(A1;0)).
From the table it becomes obvious that several server configurations are balanced for the targets:
— 26 servers 2*6c / 192 GB
— 19 servers 2*10c / 256 GB
— 10 servers 2*18c / 512 GB

The selection from these configurations will then need to be made based on additional factors such as thermal envelope and available cooling, servers already in use, or cost.
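The same calculation as a short sketch in Python, using the 1.25 Hyper-Threading factor and the target utilization levels from the example above (the candidate configurations are those from the table):

```python
import math

# Cluster sizing from the example: 190 pCPU / 3500 GB RAM,
# 50% CPU and 75% RAM target utilization, 1.25 Hyper-Threading factor.
PCPU_REQUIRED = 190
MEM_REQUIRED_GB = 3500
CPU_TARGET = 0.50
MEM_TARGET = 0.75
HT_FACTOR = 1.25

def servers_needed(sockets, cores_per_socket, ram_gb):
    by_cpu = PCPU_REQUIRED / (sockets * cores_per_socket * HT_FACTOR * CPU_TARGET)
    by_mem = MEM_REQUIRED_GB / (ram_gb * MEM_TARGET)
    return math.ceil(max(by_cpu, by_mem))  # always round up

# Candidate configurations: (sockets, cores per CPU, RAM in GB)
for sockets, cores, ram in [(2, 6, 192), (2, 10, 256), (2, 18, 512)]:
    print(f"{sockets}*{cores}c / {ram} GB -> {servers_needed(sockets, cores, ram)} servers")
```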

Features of choosing a server configuration

Wide VM. If you need to host wide VMs (comparable to 1 NUMA node or more), it is recommended, if possible, to choose a server with a configuration that allows such VMs to remain within the NUMA node. With a large number of wide VMs, there is a danger of fragmenting cluster resources, and in this case, servers are selected that allow wide VMs to be placed as densely as possible.

Single failure domain size.

Server sizing is also based on the principle of minimizing the single failure domain. For example, when choosing between:
— 3 x 4*10c / 512 GB
— 6 x 2*10c / 256 GB
Ceteris paribus, you must choose the second option, since when one server fails (or is serviced), not 33% of the cluster resources are lost, but 17%. In the same way, the number of VMs and ISs affected by the accident is halved.

Calculation of classic storage by performance


Classic storage systems are always sized for the worst-case scenario, excluding the effect of the cache and of operation optimizations.
As the baseline performance figures we take the mechanical performance of a single disk (IOPSdisk):
- 7.2k - 75 IOPS
- 10k - 125 IOPS
- 15k - 175 IOPS

Next, the number of disks in the disk pool is calculated using the formula: Disks = TotalIOPS * (RW + (1 - RW) * RAIDPen) / IOPSdisk, where:
TotalIOPS – total required performance in IOPS from the disk pool;
RW – share of read operations;
RAIDPen – RAID penalty for the selected RAID level.
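A sketch of this formula in Python; the RAID penalty values are the commonly cited ones, and the input numbers are purely illustrative:

```python
import math

# Worst-case disk count for a pool: reads cost one back-end I/O,
# writes cost RAID_PENALTY back-end I/Os.
IOPS_DISK = {"7.2k": 75, "10k": 125, "15k": 175}      # per-disk IOPS, no cache
RAID_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}  # commonly cited penalties

def disks_required(total_iops, read_share, raid, disk_type):
    backend_iops = total_iops * (read_share + (1 - read_share) * RAID_PENALTY[raid])
    return math.ceil(backend_iops / IOPS_DISK[disk_type])

# Example: 10 000 IOPS at 70% reads, RAID5 on 10k disks
print(disks_required(10_000, 0.70, "RAID5", "10k"))  # -> 152
```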

More details about RAID and the RAID penalty are covered in: Storage performance. Part one; Storage performance. Part two; Storage performance. Part three.

Based on the resulting number of disks, possible options that also meet the storage capacity requirements are worked out, including options with tiered storage.
Calculation of systems using SSD as a storage tier is considered separately.

Features of calculating systems with Flash Cache

Flash cache is the common name for all proprietary technologies that use flash memory as a second-level cache. When a flash cache is used, the storage system is usually sized so that the magnetic disks serve the steady-state load, while the peak load is served by the cache.
This requires understanding the load profile and the degree of locality of accesses to blocks of the storage volumes. Flash cache is a technology for workloads with highly localized requests, and it is practically inapplicable to evenly loaded volumes (such as analytics systems).

Calculation of low-end / mid-range hybrid systems

Hybrid systems of the lower and middle classes use tiered storage, with data moved between tiers on a schedule. The tiered-storage block size, even in the best models, is 256 MB. These characteristics do not allow tiered storage to be considered a performance-improvement technology, as many mistakenly believe. Tiered storage in lower- and middle-class systems is a technology for optimizing storage cost for workloads with pronounced load unevenness.

For tiered storage, the performance of the top tier is calculated first, while the bottom tier is considered only as contributing the missing storage capacity. For a hybrid tiered system, it is mandatory to use flash cache technology for the tiered pool, to compensate for the performance drop when data on the lower tier suddenly becomes hot.
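A rough sketch of this sizing logic (all numbers and per-disk figures are illustrative assumptions; the RAID penalties from the formula above would still apply in a real calculation):

```python
import math

# Size the top tier for performance, then fill the remaining
# capacity with the bottom tier.
total_iops = 20_000          # required performance, served by the top tier
total_capacity_tb = 300      # required usable capacity

top_disk_iops = 175          # 15k disk, worst case
top_disk_tb = 0.9            # usable TB per top-tier disk (assumed)
bottom_disk_tb = 7.3         # usable TB per bottom-tier NL disk (assumed)

top_disks = math.ceil(total_iops / top_disk_iops)    # 115 disks
top_capacity_tb = top_disks * top_disk_tb            # ~103 TB
bottom_disks = math.ceil(max(0.0, total_capacity_tb - top_capacity_tb) / bottom_disk_tb)

print(f"Top tier: {top_disks} disks (~{top_capacity_tb:.0f} TB), "
      f"bottom tier: {bottom_disks} disks")
```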

Using an SSD in a Tiered Disk Pool


The use of SSD in a multi-level disk pool has variations, depending on the specific implementation of flash cache algorithms from a given manufacturer.
The general storage policy practice for a disk pool with an SSD tier is SSD first.
Read-only flash cache. With a read-only flash cache, an SSD storage tier is needed when write operations are highly localized, regardless of the cache.
Read/write flash cache. With a read/write flash cache, the maximum cache size is set first, and an SSD storage tier is needed only when the cache size is insufficient to serve the entire localized load.
SSD and cache performance is calculated each time based on the manufacturer's recommendations, but always for the worst case.

Source: habr.com
