High Speed Failsafe Compression (Continued)

This is the second article on the topic of high-speed data compression. The first article described a compressor running at 10 GB/s per processor core (minimum compression, RTT-Min).

That compressor has already been implemented in forensic duplicator hardware for high-speed compression of storage-media dumps and for increased cryptographic strength; it can also be used to compress virtual machine images and RAM swap files when storing them on high-speed SSD drives.

The first article also announced the development of a compression algorithm for backup copies of HDD and SSD drives (medium compression, RTT-Mid) with significantly improved compression parameters. By now this compressor is completely ready, and this article is about it.

The compressor implementing the RTT-Mid algorithm provides a compression ratio comparable to standard archivers such as WinRAR and 7-Zip operating in high-speed mode, while running at least an order of magnitude faster.

The speed of packing/unpacking data is a critical parameter that determines the scope of compression technologies. It would hardly occur to anyone to compress a terabyte of data at 10-15 megabytes per second (which is exactly the speed of archivers in their standard compression mode), because that would take almost twenty hours at full processor load ...

On the other hand, the same terabyte can be copied at speeds of the order of 2-3 Gigabytes per second in about ten minutes.

Therefore, compressing large volumes of information is relevant only if it is performed at a speed no lower than that of real I/O. For modern systems, this is at least 100 megabytes per second.
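
For the sake of the arithmetic, here is a trivial check of the figures above; the 10-15 MB/s and 2-3 GB/s numbers come from the preceding paragraphs, and the midpoints used below are my own rounding, not values from the article.

    #include <stdio.h>

    int main(void)
    {
        const double terabyte = 1e12;                 /* bytes */
        const double speeds[] = { 12.5e6, 2.5e9 };    /* ~10-15 MB/s vs ~2-3 GB/s */

        for (int i = 0; i < 2; i++) {
            double hours = terabyte / speeds[i] / 3600.0;
            printf("%8.1f MB/s -> %6.1f hours\n", speeds[i] / 1e6, hours);
        }
        return 0;   /* prints roughly 22 hours and 0.1 hours (about 7 minutes) */
    }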

Modern compressors can reach such speeds only in "fast" mode. It is in this mode that we will compare the RTT-Mid algorithm with traditional compressors.

Comparative testing of the new compression algorithm

The RTT-Mid compressor was run as part of a test program. In a real "production" application it works much faster: it uses multithreading properly and a "normal" compiler rather than C#.

Since the compressors used in the comparative test are built on different principles and different types of data compress differently, for the sake of objectivity the test measures an "average temperature across the hospital", i.e. a single aggregate result over a mixed workload ...

A sector-by-sector dump file of a logical disk with the Windows 10 operating system was created; this is the most natural mixture of the various data structures actually present on every computer. Compressing this file makes it possible to compare the speed and compression ratio of the new algorithm with the most advanced compressors used in modern archivers.

Here is the dump file:

[screenshot]

The dump file was compressed with the RTT-Mid, 7-Zip and WinRAR compressors. WinRAR and 7-Zip were set to maximum speed.

The 7-Zip compressor at work:

[screenshot]

It loads the processor to 100%, while the average read speed of the source dump is about 60 MB/s.

The WinRAR compressor at work:

[screenshot]

The situation is similar: the processor load is almost 100%, and the average dump read speed is about 125 MB/s.

As in the previous case, the speed of the archiver is limited by the capabilities of the processor.

Now the test program running the RTT-Mid compressor:

[screenshot]

The screenshot shows that the processor is loaded at 50% and idles the rest of the time, because there is nowhere to write the compressed data. The disk receiving the compressed data (Disk 0) is almost fully loaded, while the read speed of the source dump (Disk 1) fluctuates considerably but averages more than 200 MB/s.

The speed of the compressor is limited in this case by the ability to write compressed data to Disk 0.

Now the compression ratio of the resulting archives:

[screenshots: sizes of the resulting archives]

It can be seen that the RTT-Mid compressor handled compression best: the archive it created is 1.3 GB smaller than the WinRAR archive and 2.1 GB smaller than the 7z archive.

Time taken to create the archive:

  • 7-Zip - 26 minutes 10 seconds;
  • WinRAR - 17 minutes 40 seconds;
  • RTT-Mid - 7 minutes 30 seconds.

Thus, even an unoptimized test program using the RTT-Mid algorithm was able to create the archive more than two and a half times faster, and the archive itself turned out to be significantly smaller than its competitors' ...

Those who do not trust the screenshots can verify them for themselves. The test program is available at the link: download it and check.

Note that it runs only on processors with AVX2 support; without these instructions the compressor does not work. Also, do not test the algorithm on old AMD processors, they are slow at executing AVX instructions ...

Compression method used

The algorithm uses the method of indexing repeated text fragments at byte granularity. This compression method has been known for a long time, but was not used because the match-search operation was too expensive in resources and required much more time than building a dictionary. So the RTT-Mid algorithm is a classic example of going "back to the future" ...
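
The RTT-Mid source is not published, so the following is only a minimal sketch of the general idea of index compression as described here: repeated byte strings are replaced by index records that point back to an earlier occurrence, and everything else stays as literal text. The record layout, the 8-byte minimum match and the deliberately naive quadratic search are assumptions for illustration, not the author's implementation; the expensive search loop is precisely the part that the scanner described below exists to replace.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    #define MIN_MATCH 8   /* assumed: repeats shorter than an index are not worth encoding */

    /* Toy index compression: every sufficiently long repeated byte string is
     * replaced by an index record (offset back, length); all other bytes stay
     * as literal text. */
    void index_compress_naive(const uint8_t *src, size_t n)
    {
        size_t pos = 0;
        while (pos < n) {
            size_t best_len = 0, best_off = 0;
            for (size_t back = 1; back <= pos; back++) {      /* scan earlier text */
                size_t len = 0;
                while (pos + len < n && src[pos - back + len] == src[pos + len])
                    len++;
                if (len > best_len) { best_len = len; best_off = back; }
            }
            if (best_len >= MIN_MATCH) {
                printf("INDEX  offset=%zu len=%zu\n", best_off, best_len);
                pos += best_len;
            } else {
                printf("LITERAL 0x%02X\n", src[pos]);
                pos++;
            }
        }
    }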

To speed up compression, the RTT compressor uses a unique high-speed match scanner. The scanner is home-grown - "my precious ...", "the price is considerable, because it is entirely handmade" (written in assembler).

The match-search scanner uses a two-level probabilistic scheme: first it scans for a "sign" of a match, and only after a "sign" has been detected at a given place does it start the procedure for detecting a real match.
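
The exact nature of the "sign" is not disclosed in the article; the sketch below merely illustrates the two-level idea under the assumption that the sign is a hit in a small hash table of previously seen positions, with the expensive byte-by-byte comparison run only after the sign fires.

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    /* Level 1: a cheap probabilistic "sign" of a match (a hash-table hit).
     * Level 2: the expensive byte comparison, run only if the sign fired.
     * The filter may produce false positives but never misses a stored match. */
    typedef struct {
        size_t pos[1u << 14];   /* candidate positions keyed by hash; zero-init before use */
    } sign_filter;

    static uint32_t sign_hash(const uint8_t *p)
    {
        uint32_t v;
        memcpy(&v, p, 4);
        return (v * 2654435761u) >> 18;     /* 14-bit bucket index */
    }

    /* Returns the match length at src[i] against an earlier position, or 0.
     * Caller must ensure i + 4 <= n.  Position 0 is treated as "empty" here,
     * a simplification acceptable for a sketch. */
    size_t scan(sign_filter *f, const uint8_t *src, size_t n, size_t i)
    {
        uint32_t h = sign_hash(src + i);
        size_t cand = f->pos[h];
        f->pos[h] = i;

        if (cand == 0 || cand >= i)          /* no sign: skip the expensive check */
            return 0;

        size_t len = 0;                      /* sign fired: verify the real match */
        while (i + len < n && src[cand + len] == src[i + len])
            len++;
        return len;
    }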

The matching window has an unpredictable size that depends on the entropy of the data block being processed. For completely random (incompressible) data it reaches megabytes in size; for data with repetitions it always has a size of more than a megabyte.

However, many modern data formats are already incompressible, and it is useless and wasteful to run the resource-intensive scanner over them, so the scanner has two modes of operation. First, sections of the source text that may contain repetitions are located; this is also done probabilistically and runs very fast (at 4-6 GB/s). Then the areas with possible matches are processed by the main scanner.
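
How the fast pre-pass decides that a region "may contain repetitions" is not described either; the heuristic below (counting distinct byte values in a sparse sample of the block) is purely an assumed stand-in, shown only to make the two-mode structure concrete. A real detector would be considerably smarter.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Fast pre-pass: decide cheaply whether a block (a few kilobytes or more)
     * is worth handing to the expensive match scanner. */
    bool block_may_contain_repeats(const uint8_t *block, size_t n)
    {
        uint32_t seen[256] = {0};
        size_t distinct = 0, samples = 0;

        for (size_t i = 0; i < n; i += 64) {   /* touch only every 64th byte */
            if (seen[block[i]]++ == 0)
                distinct++;
            samples++;
        }
        /* Random data uses nearly the whole byte alphabet in the sample;
         * heavily repetitive data does not. */
        return samples >= 64 && distinct < samples / 2;
    }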

Index compression by itself is not very efficient: the repeated fragments have to be replaced with indexes, and the index array itself noticeably reduces the compression ratio.

To increase the compression ratio, not only full matches of byte strings are indexed, but also partial ones, where the string contains both matched and unmatched bytes. To do this, the index format includes a match-mask field that marks the matched bytes of the two blocks. For even greater compression, the indexing overlays several overlapping blocks on the current block.
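
The actual RTT index format is not published; the structure below is only a guess at how a record with a match-mask field could look, shown to make the idea of partial matches concrete. Field names and widths are assumptions (the article only says that indexes are 3 to 12 bytes).

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical index record for a partial match: the mask says which bytes
     * of the block are taken from the earlier occurrence and which come from
     * the literal stream. */
    typedef struct {
        uint32_t offset;        /* distance back to the matching block */
        uint8_t  length;        /* block length in bytes (<= 64 here) */
        uint64_t match_mask;    /* bit i set -> byte i copied from the old block */
    } index_record;

    /* Reconstruct one block from an index record plus its literal bytes.
     * Assumes out_pos >= r->offset.  Returns the number of literals consumed. */
    size_t apply_record(uint8_t *out, size_t out_pos,
                        const index_record *r, const uint8_t *literals)
    {
        size_t lit = 0;
        for (uint8_t i = 0; i < r->length; i++) {
            if (r->match_mask & (1ull << i))
                out[out_pos + i] = out[out_pos - r->offset + i];  /* matched byte */
            else
                out[out_pos + i] = literals[lit++];               /* unmatched byte */
        }
        return lit;
    }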

All this made it possible to obtain, in the RTT-Mid compressor, a compression ratio comparable to compressors built on the dictionary method, while working much faster.

The speed of the new compression algorithm

If the compressor works exclusively out of cache memory (4 MB per stream is required), its speed varies in the range of 700-2000 MB/s per processor core, depending on the type of data being compressed, and depends little on the processor clock frequency.

With a multithreaded implementation of the compressor, effective scalability is determined by the size of the L3 cache. For example, with 9 MB of cache "on board" there is no point in running more than two compression streams: the speed will not increase. With a 20 MB cache, you can already run five compression streams.
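
Following the numbers in this paragraph (about 4 MB of cache per compression stream), a trivial way to pick the thread count might look like this; the 4 MB figure comes from the text above, everything else is an assumption.

    #include <stddef.h>

    /* Pick the number of compression threads from the L3 cache size,
     * using the ~4 MB-per-stream figure quoted in the article. */
    static size_t compression_threads(size_t l3_cache_bytes)
    {
        const size_t per_stream = 4u * 1024 * 1024;     /* 4 MB per stream */
        size_t n = l3_cache_bytes / per_stream;
        return n ? n : 1;                               /* always at least one */
    }

    /* 9 MB of L3 -> 2 threads; 20 MB of L3 -> 5 threads,
     * matching the examples in the text. */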

RAM latency also becomes an essential parameter determining the compressor's speed. The algorithm makes random accesses to RAM, some of which (about 10%) miss the cache, so it has to idle waiting for data from RAM, which reduces the speed.

The operation of the I/O system also significantly affects the compressor's speed: I/O requests to RAM block the CPU's own requests for data, which likewise slows down compression. This problem is noticeable on laptops and desktops; on servers it is less significant thanks to a more advanced system-bus arbitration unit and multi-channel RAM.

Throughout this article we have been talking about compression; decompression remains outside its scope, because there everything is just fine. Decompression is much faster and is limited by I/O speed. One physical core in one thread easily delivers unpacking speeds of 3-4 gigabytes per second.

This is because unpacking involves no match-search operation, which during compression "gobbles up" the bulk of the processor and cache resources.

Compressed Data Storage Reliability

As the very name of this entire class of software tools that use data compression (archivers) suggests, they are designed for long-term storage of information - not for years, but for centuries and millennia ...

During storage, storage media lose some of their data; here is an example:

[image: an ancient "analog" storage medium]

This "analog" storage medium is a thousand years old; some fragments are lost, but on the whole the information is still "readable" ...

None of the responsible manufacturers of modern digital data storage systems, or of the digital media for them, guarantees complete data safety for more than 75 years.
And this is a problem, but a postponed one; our descendants will solve it ...

Digital data storage systems can lose data not only after 75 years: errors can appear at any time, even during recording. These distortions are minimized using redundancy and error-correction systems, but redundancy and correction cannot always recover the lost information, and even when they do, there is no guarantee that the recovery was correct.

And this is also a big problem, not a postponed one but a current one.

Modern compressors used for archiving digital data are built on various modifications of the dictionary method, and for such archives the loss of a piece of information is a fatal event; there is even an established term for this situation - a "broken" archive ...

The low reliability of information storage in archives with dictionary compression stems from the structure of the compressed data. Such an archive does not contain the original text: it stores the numbers of dictionary entries, while the dictionary itself is dynamically modified by the text being compressed. If a fragment of the archive is lost or corrupted, all subsequent archive records can be identified neither by content nor by entry length, since it is unclear what the dictionary entry numbers refer to.

It is impossible to restore information from such a "broken" archive.

The RTT algorithm is built around a more robust method of storing compressed data: it uses the index method of accounting for repeated fragments. This approach minimizes the consequences of information corruption on the medium, and in many cases allows corruption that occurred during storage to be corrected automatically.
This is because, with index compression, the archive file contains two fields:

  • the field of the original text with the repeated sections removed;
  • the index field.

The index field, which is critical for information recovery, is small and can be duplicated for storage reliability. Therefore, even if a fragment of the source text or of the index array is lost, all other information can be restored without problems, just as in the picture of the "analog" storage medium.
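
To make the two-field layout concrete, here is a hypothetical sketch of such a container. Only the two-field idea (literal text with the repeats removed, plus a small, duplicable index array) comes from the article; the field names, the duplication scheme and the integrity check are assumptions.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical container for an index-compressed archive.  The literal
     * field still *is* the original text (minus the removed repeats), so even
     * if part of it is lost, the rest remains directly readable; the small
     * index field is what really matters for reconstruction, so it is stored
     * twice. */
    typedef struct {
        uint8_t *literals;        /* original text with repeated sections removed */
        size_t   literals_len;
        uint8_t *index_field;     /* array of index records */
        size_t   index_len;
        uint8_t *index_copy;      /* duplicate of the index field, for reliability */
    } rtt_archive;

    /* On load, fall back to the duplicate if the primary index is damaged.
     * index_is_valid() stands for whatever integrity check (e.g. a checksum)
     * a real container would carry - an assumption for illustration. */
    const uint8_t *usable_index(const rtt_archive *a,
                                int (*index_is_valid)(const uint8_t *, size_t))
    {
        if (index_is_valid(a->index_field, a->index_len))
            return a->index_field;
        return a->index_copy;     /* primary damaged: use the duplicate */
    }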

Algorithm Disadvantages

There are no advantages without disadvantages. The index compression method does not compress small repeated sequences; this is a limitation of the index method. Indexes are at least 3 bytes and up to 12 bytes in size. If a repetition is shorter than the index describing it, it is not taken into account, no matter how often such repetitions occur in the file being compressed.
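
The break-even reasoning in this paragraph can be stated in one line: a repeat is worth indexing only if the index record describing it is shorter than the repeat itself. The 3-12 byte figures come from the text; the function is just a restatement of that rule.

    #include <stdbool.h>
    #include <stddef.h>

    /* A repeated fragment is worth replacing with an index only if the index
     * record is smaller than the fragment it removes.  Index records are
     * 3..12 bytes, so e.g. a 4-byte repeat encoded by a 6-byte index would
     * grow the output and is simply skipped. */
    static bool worth_indexing(size_t repeat_len, size_t index_record_len)
    {
        return repeat_len > index_record_len;
    }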

The traditional dictionary method effectively compresses multiple short repetitions and therefore achieves a higher compression ratio than index compression. True, this comes at the cost of a heavy CPU load: for the dictionary method to compress data more efficiently than the index method, it has to drop the processing speed to 10-20 megabytes per second on real computing installations, with the CPU fully loaded.

Such low speeds are unacceptable for modern data storage systems and are of more "academic" interest than practical.

The compression ratio will be significantly increased in the next modification of the RTT algorithm (RTT-Max), which is already under development.

So, as always, to be continued...

Source: habr.com
