MIT's New CPU Load Balancer

The Shenango system is intended for use in data centers.


According to one provider, data centers use only 20-40% of their available computing power, and even under high load the figure reaches just 60%. This leads to so-called "zombie servers": machines that sit idle most of the time while still drawing electricity. Today 30% of the world's servers sit idle, consuming $30 billion worth of electricity a year.

MIT set out to tackle this inefficient use of computing resources.

An engineering team developed a processor load balancing system called Shenango. Its purpose is to monitor the state of the task buffer and reassign "stuck" processes (those that cannot get processor time) to free cores.

How Shenango works

Shenango is a Linux C library with Rust and C++ bindings. The project code and test applications are published in repositories on GitHub.

The solution is built around the IOKernel algorithm, which runs on a dedicated core of a multicore system. It manages requests to the CPU using the DPDK framework, which lets applications communicate directly with network devices.

The IOKernel decides which cores a given task is handed to, and how many cores are needed. For each process, it defines main (guaranteed) cores and additional (burstable) cores; the latter are brought in when the number of requests to the CPU rises sharply.
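The split between guaranteed and burstable cores can be illustrated with a small Python sketch. It is not Shenango's code: the `AppAllocation` class, the `can_grant` function, and the rule that a guaranteed core can always be granted are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class AppAllocation:
    """Hypothetical per-application core budget (names are illustrative)."""
    name: str
    guaranteed: int   # cores the application is always entitled to
    burstable: int    # extra cores it may use during load spikes
    granted: int = 0  # cores it currently holds


def can_grant(app: AppAllocation, idle_cores: int) -> bool:
    """Decide whether the application may receive one more core."""
    if app.granted < app.guaranteed:
        # Assumption: a guaranteed core can always be satisfied, for example
        # by reclaiming a burstable core from another app (not modelled here).
        return True
    within_burst = app.granted < app.guaranteed + app.burstable
    # Burstable cores are only handed out while spare cores exist.
    return within_burst and idle_cores > 0
```

For instance, an application with two guaranteed and two burstable cores that already holds two cores gets a third one only if the machine has an idle core at that moment.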

The IOKernel request queue is organized as a ring buffer. Every five microseconds, the algorithm checks whether all the tasks assigned to a core have been completed. To do this, it compares the current position of the buffer's "head" with the previous position of its "tail". If the head has not yet passed the previous tail, meaning that tasks already queued at the previous check are still waiting, the system registers the buffer as overloaded and allocates an additional core to the process.
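The head/tail comparison described above can be sketched in a few lines of Python. This is a simplified model, not the project's implementation: `RingQueue`, `CongestionDetector`, and the use of ever-increasing head and tail counters are assumptions made for illustration, and the five-microsecond timer is left to the caller.

```python
class RingQueue:
    """Fixed-size ring buffer with ever-increasing head and tail counters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0  # index of the next item to dequeue
        self.tail = 0  # index of the next free slot

    def push(self, item) -> None:
        if self.tail - self.head == self.capacity:
            raise OverflowError("ring buffer is full")
        self.slots[self.tail % self.capacity] = item
        self.tail += 1

    def pop(self):
        if self.head == self.tail:
            raise IndexError("ring buffer is empty")
        item = self.slots[self.head % self.capacity]
        self.head += 1
        return item


class CongestionDetector:
    """Remembers the tail seen at the previous check (hypothetical helper)."""

    def __init__(self, queue: RingQueue):
        self.queue = queue
        self.prev_tail = queue.tail

    def check(self) -> bool:
        # If the head has not passed the previous tail, at least one item
        # that was queued at the last check is still waiting.
        congested = self.queue.head < self.prev_tail
        self.prev_tail = self.queue.tail
        return congested
```

A caller would invoke `check()` once per interval and request an additional core for the process whenever it returns `True`.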

When balancing load, the system gives priority to cores on which the same process has run before and whose caches still hold some of its data; failing that, it uses any idle core.
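This preference order can be expressed as a short selection function. The sketch below is only an illustration in Python: the `pick_core` name and the `last_app_on_core` bookkeeping are hypothetical and are not taken from the Shenango sources.

```python
from typing import Dict, List, Optional


def pick_core(app: str,
              idle_cores: List[int],
              last_app_on_core: Dict[int, str]) -> Optional[int]:
    """Choose an idle core for `app`, preferring one with a warm cache."""
    # First choice: an idle core that last ran this same application,
    # so part of its working set may still be cached there.
    for core in idle_cores:
        if last_app_on_core.get(core) == app:
            return core
    # Otherwise fall back to any idle core, or None if there is none.
    return idle_cores[0] if idle_cores else None
```

With idle cores 2, 5 and 7, where core 5 last ran the application in question, the function returns 5; if no idle core has run it before, the first idle core is used.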


Shenango also uses work stealing. The cores allocated to one application monitor each other's task counts. If one core finishes its task list before the others, it takes over ("steals") part of its neighbors' load.
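A minimal work-stealing sketch in Python is shown below. It assumes each core keeps its tasks in a double-ended queue and that an idle core takes half of the tasks of its busiest neighbor; both the `Core` class and the "steal half" policy are illustrative choices, not details confirmed by the source.

```python
from collections import deque
from typing import List


class Core:
    """Hypothetical per-core task queue."""

    def __init__(self, core_id: int):
        self.core_id = core_id
        self.tasks = deque()


def steal_work(thief: Core, siblings: List[Core]) -> int:
    """Move half of the busiest sibling's tasks to an idle core."""
    victims = [c for c in siblings if c is not thief]
    if not victims:
        return 0
    victim = max(victims, key=lambda c: len(c.tasks))
    count = len(victim.tasks) // 2
    for _ in range(count):
        # Take from the back of the victim's queue; the victim keeps
        # working from the front. appendleft preserves the task order.
        thief.tasks.appendleft(victim.tasks.pop())
    return count
```

Taking tasks from the opposite end of the queue from the one the owner works on is a common way to reduce contention between the owner and the thief.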

Advantages and disadvantages

According to MIT engineers, Shenango can process five million requests per second while maintaining an average response time of 37 microseconds. Experts say that in some cases the technology can raise processor utilization in data centers to as much as 100%. As a result, data center operators will be able to save on buying and maintaining servers.

Specialists from other universities also note the solution's potential. According to a professor at a Korean institute, the MIT system will help reduce latency in web services. It is useful for online stores, for example: on sale days, even a one-second delay in page loading reduces the number of page views by 11%. Fast load distribution will help serve more customers.

The technology still has drawbacks: it does not support multiprocessor NUMA systems, in which the chips are attached to different memory modules and do not "communicate" with each other. In such a system, the IOKernel can manage a particular group of processors, but not all of the server's chips.


Similar technologies

Among other CPU load balancing systems, Arachne stands out. It estimates how many cores an application will need at the time of its launch and distributes processes accordingly. According to its authors, the maximum application latency under Arachne is about 10 thousand microseconds.

The technology is implemented as a C++ library for Linux, and its source code is available on GitHub.

Another balancing tool is ZygOS. Like Shenango, it uses work stealing to redistribute processes. According to the authors of ZygOS, the average application latency with the tool is about 150 microseconds, and the maximum is about 450 microseconds. The project code is also publicly available.

Conclusions

Modern data centers continue to expand, and the growth is especially noticeable in the hyperscale market: there are now 430 hyperscale data centers in the world, and their number may increase by 30% in the coming years. For this reason, processor load balancing technologies will be in great demand. Large corporations are already deploying systems like Shenango, and the number of such tools will only grow.

Source: habr.com