Situation: virtual GPUs are not inferior in performance to bare-metal solutions

In February, Stanford hosted a conference on high-performance computing (HPC). VMware representatives said there that, for GPU workloads, a system based on a modified ESXi hypervisor is no slower than bare-metal solutions.

Here we look at the technologies that made this possible.

/ photo Victorgrigas CC BY-SA

Performance issue

According to analysts, about 70% of data center workloads are virtualized. The remaining 30% still run on bare metal without a hypervisor. That 30% consists mostly of high-load applications that use GPUs, such as neural network training.

Experts attribute this to the fact that the hypervisor, as an intermediate abstraction layer, can affect the performance of the whole system. A study from five years ago reported a slowdown of 10%. As a result, companies and data center operators are in no hurry to move HPC workloads to a virtual environment.

But virtualization technology keeps improving. At a conference a month ago, VMware said that the ESXi hypervisor has almost no adverse effect on GPU performance: computation may slow down by about three percent, which is comparable to bare metal.

How it works

To improve the performance of HPC systems with GPUs, VMware made a number of changes to the hypervisor. In particular, it disabled vMotion. This feature is used for load balancing and normally migrates virtual machines (VMs) between servers or GPUs. With vMotion disabled, each VM is assigned a specific GPU, which reduces data-exchange overhead.

Another key component of the system is DirectPath I/O. It lets the CUDA driver, which handles parallel computing, interact with virtual machines directly, bypassing the hypervisor. When several VMs have to run on one GPU at the same time, the GRID vGPU solution is used. It divides the card's memory into several segments, while compute cycles are not partitioned.

[Diagram: how two virtual machines operate on one GPU in this case]
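
As a rough illustration of the memory split described above, here is a toy model in Python. It is only a sketch: the `VGpuProfile` class, the `split_gpu_memory` function, the VM names, and the 16 GB card size are all hypothetical, and none of this is VMware or NVIDIA API code. It only captures the idea that each VM receives its own memory segment while compute cycles stay shared.

```python
from dataclasses import dataclass


@dataclass
class VGpuProfile:
    """Hypothetical description of one VM's share of a physical GPU."""
    vm_name: str
    memory_mb: int               # dedicated, non-overlapping slice of card memory
    shares_compute: bool = True  # compute cycles are shared, not partitioned


def split_gpu_memory(total_memory_mb: int, vm_names: list[str]) -> list[VGpuProfile]:
    """Divide the card's memory into equal segments, one per VM."""
    if not vm_names:
        raise ValueError("at least one VM is required")
    per_vm_mb = total_memory_mb // len(vm_names)
    return [VGpuProfile(name, per_vm_mb) for name in vm_names]


if __name__ == "__main__":
    # Assumed example: a 16 GB card shared by two VMs.
    for profile in split_gpu_memory(16384, ["vm-a", "vm-b"]):
        print(profile)
```

In this model, adding a VM shrinks every memory segment, while the compute side is left as a single shared resource.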

Results and forecasts

The company tested the hypervisor by training a TensorFlow-based language model. The performance penalty was only 3-4% compared to bare metal. In return, the system gained the ability to allocate resources on demand, depending on the current load.

VMware also ran tests with containers. Its engineers trained neural networks for image recognition, with the resources of one GPU shared among four container VMs. As a result, the performance of each individual machine dropped by 17% compared to a single VM with full access to the GPU, but the total number of images processed per second tripled. Such systems are expected to find use in data analysis and computer simulation.

Among the potential problems VMware may face, experts single out a rather narrow target audience: few companies currently work with high-performance systems. Statista, however, notes that by 2021, 94% of the world's data center workloads will be virtualized. Analysts also project that the HPC market will grow from $32 billion to $45 billion between 2017 and 2022.
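
The two figures in the container test fit together with simple arithmetic. The sketch below assumes that the reported 17% drop applies equally to each of the four VMs and that total throughput is the sum of the per-VM rates. The function name `aggregate_speedup` is made up for this example.

```python
def aggregate_speedup(vm_count: int, per_vm_slowdown: float) -> float:
    """Total throughput relative to one VM that owns the whole GPU.

    per_vm_slowdown is the fractional drop for each VM (0.17 means 17%).
    """
    return vm_count * (1.0 - per_vm_slowdown)


if __name__ == "__main__":
    # Four VMs, each 17% slower than a VM with exclusive GPU access.
    print(f"{aggregate_speedup(4, 0.17):.2f}x")
```

Under these assumptions the result is 4 × 0.83 ≈ 3.3, which matches the roughly threefold increase in images processed per second.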
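
To put the market forecast in perspective, the growth from $32 billion to $45 billion can be converted into an average annual rate. This is a back-of-the-envelope sketch that assumes steady compounding over five yearly periods (2017 to 2022). The function name `compound_annual_growth_rate` is chosen for this example.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Average yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Values in billions of US dollars, taken from the forecast above.
    rate = compound_annual_growth_rate(32.0, 45.0, 5)
    print(f"{rate:.1%} per year")
```

Under these assumptions the forecast corresponds to growth of roughly 7% a year.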

/ photo Global Access Point PD

Similar solutions

Comparable solutions are being developed by other large IT companies: AMD and Intel.

AMD offers a GPU virtualization approach based on SR-IOV (single-root input/output virtualization). This technology gives a VM access to a portion of the system's hardware capabilities. The solution lets 16 users share a GPU, with equal performance across the virtualized systems.

Intel's technology is based on the Citrix XenServer 7 hypervisor. It combines a standard GPU driver with a virtual machine, which allows the VM to deliver 3D applications and desktops to the devices of hundreds of users.

Future technology

Virtual GPU developers are betting on the adoption of AI systems and the growing popularity of high-performance solutions in the business technology market. They hope that the need to process large amounts of data will increase demand for vGPUs.

Manufacturers are now looking for a way to combine CPU and GPU functionality in a single core in order to speed up graphics tasks, mathematical calculations, logical operations, and data processing. When such cores reach the market, they will change how resources are virtualized and distributed among workloads in virtual and cloud environments.

Source: habr.com