[Supercomputing 2019] Multi-cloud storage as an application area for the new Kingston DC1000M drives

Imagine that you are launching an innovative medical business: individualized drug selection based on analysis of the human genome. Each patient's genome contains about 3 billion base pairs, and a regular x86 server would need several days to process them. You know the job can be accelerated on a server with an FPGA accelerator, which parallelizes the calculations across thousands of threads and completes a genome in about an hour. Such servers can be rented from Amazon Web Services (AWS). But here's the catch: the customer, a hospital, is categorically against placing genetic data in the provider's cloud. What to do? At the Supercomputing 2019 exhibition, Kingston and a cloud startup showed the Private MultiCloud Storage (PMCS) architecture, which solves exactly this problem.


Three conditions for high performance computing

Genome analysis is far from the only task in high-performance computing (HPC). Scientists model physical fields, engineers analyze aircraft parts, financiers compute economic models, and all of them analyze big data, train neural networks, and run many other complex calculations.

The three conditions of HPC are enormous computing power, very large and fast storage, and high network throughput. The standard practice, therefore, is to run HPC workloads either in the company's own data center (on-premises) or in a provider's cloud.

But not all companies have their own data centers, and those that do are often inferior to commercial data centers in resource efficiency: capital expenditures are required to purchase and update hardware and software, pay highly qualified personnel, and so on. Cloud providers, by contrast, offer IT resources under the pay-as-you-go operating-cost model, in which rent is charged only for the period of use. When the calculations are complete, the servers can be released, saving the IT budget. But if a legislative or corporate ban prohibits transferring data to the provider, HPC computing in the cloud is off the table.

Private MultiCloud Storage

The Private MultiCloud Storage architecture is designed to provide access to cloud computing while physically keeping the data itself on the enterprise's premises or in a separate secure section of a data center under a colocation service. Essentially, it is a data-centric distributed computing model in which cloud servers work with a remote storage system in a private cloud. Accordingly, the same local storage can serve cloud services from the largest providers: AWS, MS Azure, Google Cloud Platform, etc.

Demonstrating an implementation of PMCS at the Supercomputing 2019 exhibition, Kingston presented a sample high-performance storage system built on DC1000M SSDs, while one of the cloud startups supplied the StorOne S1 software-defined storage management software and dedicated communication channels to the major cloud providers.

It should be noted that PMCS, as a working model of cloud computing with private storage, targets the North American market, with its well-developed network connectivity between data centers built on AT&T and Equinix infrastructure. For example, the ping between a colocated storage system in any Equinix Cloud Exchange node and the AWS cloud is under 1 millisecond (source: ITProToday).
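The sub-millisecond figure also implies a hard physical constraint: light in optical fiber travels at roughly 200,000 km/s (about two-thirds of c), so a 1 ms round trip bounds the one-way fiber run at about 100 km. A minimal sketch of this back-of-the-envelope check (the fiber speed is a standard approximation, not a measured value for Equinix's links):

```python
FIBER_SPEED_KM_S = 200_000  # ~2/3 the speed of light in vacuum, typical for optical fiber

def max_one_way_km(rtt_ms: float) -> float:
    """Upper bound on one-way fiber distance for a given round-trip time."""
    rtt_s = rtt_ms / 1000
    # Half the round trip is spent in each direction
    return FIBER_SPEED_KM_S * rtt_s / 2

print(max_one_way_km(1.0))  # 100.0
```

In other words, a sub-millisecond ping is only achievable because the colocated storage sits physically close to the cloud provider's on-ramp, which is exactly what the Equinix Cloud Exchange placement provides.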

In the PMCS demonstration shown at the exhibition, the storage system on DC1000M NVMe drives was placed in colocation, while virtual machines that exchanged pings with it were deployed in the AWS, MS Azure, and Google Cloud Platform clouds. A client-server application worked remotely with the Kingston storage system and HP DL380 servers in the data center and, through the Equinix communication-channel infrastructure, accessed the cloud platforms of the major providers listed above.


Slide from the presentation of Private MultiCloud Storage at the Supercomputing-2019 exhibition. Source: Kingston

Software with similar functionality for managing a private multi-cloud storage architecture is offered by several companies, and the architecture itself goes by different names: Private MultiCloud Storage or Private Storage for Cloud.

“Today's supercomputers run a variety of HPC applications that are at the forefront of advancements, from oil and gas exploration to weather forecasting, financial markets and new technology development,” said Keith Schimmenti, enterprise SSD business manager at Kingston. “These HPC applications require I/O speeds that closely match processor performance. We're proud to share how Kingston solutions are helping drive breakthroughs in computing, delivering the performance needed in the world's most extreme computing environments and applications.”

DC1000M drive and an example of a storage system based on it

Kingston designed the DC1000M U.2 NVMe SSD for the data center, specifically for data-intensive and HPC workloads such as artificial intelligence (AI) and machine learning (ML) applications.


DC1000M U.2 NVMe 3.84TB drive. Source: Kingston

DC1000M U.2 drives are built on 96-layer Intel 3D NAND flash managed by a Silicon Motion SM2270 controller. The SM2270 is a 16-channel enterprise NVMe controller with a PCIe 3.0 x8 interface, a dual 32-bit DRAM data bus, and three dual-core ARM Cortex-R5 processors.

The DC1000M is offered in capacities from 0.96 to 7.68 TB (3.84 and 7.68 TB are expected to be the most popular). The drive's performance is rated at up to 800,000 IOPS.


Storage system with 10x DC1000M U.2 NVMe 7.68 TB. Source: Kingston

As an example of a storage system for HPC applications, Kingston presented at Supercomputing 2019 a rack-mount solution with ten DC1000M U.2 NVMe drives of 7.68 TB each. The system is built on the AIC SB122A-PH, a 1U platform, with two Intel Xeon E5-2660 processors and 128 GB (8 x 16 GB) of Kingston DDR4-2400 DRAM (part number KSM24RS4/16HAI). It runs Ubuntu 18.04.3 LTS with Linux kernel 5.0.0-31. A gfio v3.13 run (the graphical front end to fio, the Flexible I/O tester) showed read performance of 5.8 million IOPS at a throughput of 23.8 GB/s.
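Kingston did not publish the job file used, but a random-read test of this kind is typically described to fio along the following lines. The device paths, block size, queue depth, and job count below are illustrative assumptions, not the actual benchmark configuration:

```ini
; Hypothetical fio job approximating the described random-read test.
; All parameter values here are assumptions for illustration only.
[global]
ioengine=libaio      ; Linux asynchronous I/O
direct=1             ; bypass the page cache
rw=randread          ; random reads, as in an IOPS benchmark
bs=4k                ; common block size for IOPS measurements
iodepth=128          ; deep queue to saturate NVMe drives
numjobs=8            ; parallel workers per target
time_based=1
runtime=60
group_reporting=1

[nvme-drives]
; colon-separated list of target devices (shortened here to two)
filename=/dev/nvme0n1:/dev/nvme1n1
```

With `direct=1` and a deep queue, fio drives the NVMe devices at their native parallelism rather than measuring the page cache, which is what makes multi-million-IOPS aggregate figures attainable on an array like this.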

The presented storage system showed an impressive sustained read rate of 5.8 million IOPS (input/output operations per second), roughly two orders of magnitude faster than a mass-market SSD. Read speeds of this order are exactly what HPC applications running on specialized processors require.
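The IOPS and throughput figures quoted above are mutually consistent if one assumes the common 4 KiB I/O block size (the block size is an assumption; it is not stated in the article). A quick arithmetic check:

```python
BLOCK = 4096  # bytes per I/O; 4 KiB is the conventional block size for IOPS tests

def iops_to_gb_s(iops: int, block: int = BLOCK) -> float:
    """Convert an IOPS figure at a given block size to GB/s (decimal gigabytes)."""
    return iops * block / 1e9

print(iops_to_gb_s(800_000))    # single DC1000M: ~3.3 GB/s
print(iops_to_gb_s(5_800_000))  # 10-drive array: ~23.8 GB/s
```

The array figure, 5.8 million x 4096 bytes ≈ 23.8 GB/s, matches the throughput reported for the fio run, and the per-drive figure shows the ten drives scaling close to linearly.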

Cloud computing HPC with private storage in Russia

The task of performing high-performance computing at a provider while physically keeping the data on-premises is relevant for Russian companies as well. Another common case in domestic business: when using foreign cloud services, the data must reside on the territory of the Russian Federation. We asked the cloud provider Selectel, a long-time Kingston partner, to comment on these situations.

“In Russia it is possible to build a similar architecture, with service in Russian and all the reporting documents the client's accounting department needs. If a company needs to run high-performance computing against an on-premises storage system, we at Selectel rent out servers with various types of processors, including FPGAs, GPUs, and multi-core CPUs. Additionally, through partners, we arrange a dedicated optical channel between the client's office and our data center,” comments Alexander Tugov, Director of Services Development at Selectel. “The client can also colocate their storage system in a server room with restricted access and run applications both on our servers and in the clouds of the global providers AWS, MS Azure, and Google Cloud. Of course, the signal delay in the latter case will be higher than if the client's storage system were located in the USA, but a broadband multi-cloud connection will still be provided.”

In the next article we will cover another Kingston solution presented at Supercomputing 2019 (Denver, Colorado, USA), intended for machine learning and big-data analytics on GPUs: GPUDirect Storage, a technology that moves data directly between NVMe storage and GPU memory. In addition, we will explain how a read speed of 5.8 million IOPS was achieved in a rack-mount storage system on NVMe drives.

For more information about Kingston Technology products, please visit the company's website.

Source: habr.com
