Backup Part 6: Comparing Backup Tools

This article compares backup tools, but first it is worth looking at how quickly and how well they cope with restoring data from backups.
For ease of comparison, we will look at restoring from a full backup, especially since this mode of operation is supported by all candidates. For simplicity, the figures are already averaged (the arithmetic mean of several runs). The results are summarized in a table, which also contains information about capabilities: the presence of a web interface, ease of setup and operation, the ability to automate, the availability of various additional features (for example, data integrity checks), and so on. The graphs show the load on the server where the data will be used (the main server, not the backup server).

Data recovery

Rsync and tar will be used as the reference point, because the simplest backup scripts are usually based on them.

Rsync handled the test data set in 4 minutes and 28 seconds, showing the following load:

[Graph: main server load while restoring with rsync]

The recovery process ran into the limitations of the disk subsystem of the backup storage server (the sawtooth graphs). You can also clearly see that a single core is loaded, without other bottlenecks (low iowait and softirq, meaning no problems with the disk or the network, respectively). Since the other two programs, rdiff-backup and rsnapshot, are based on rsync and also offer regular rsync as the restore tool, they will have roughly the same load profile and restore time.
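For reference, a restore of this kind boils down to a single rsync invocation; the host name and paths below are placeholders rather than the actual test setup:

    # Pull the full copy back from the backup server.
    # -a preserves permissions, ownership and timestamps, -H keeps hard links,
    # --numeric-ids avoids remapping uid/gid on the restored host.
    rsync -aH --numeric-ids backup-server:/srv/backups/host1/ /restore/target/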

Tar handled it a little faster, in 2 minutes and 43 seconds:

[Graph: main server load while restoring with tar]

The overall system load was on average about 20% higher due to increased softirq, i.e. more overhead in the network subsystem.

If the archive is additionally compressed with gzip, the recovery time increases to 3 minutes 19 seconds, with the following load on the main server (unpacking on the main server side):

[Graph: main server load while unpacking a gzip-compressed tar archive]

The unpacking process occupies both processor cores, since two processes are running. Overall, an expected result. A comparable result (3 minutes and 20 seconds) was obtained when gzip was run on the backup server side; the load profile on the main server was then quite similar to running tar without the gzip compressor (see the previous graph).
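A sketch of both variants, with placeholder paths; the only difference is which side spends CPU on the gzip decompression:

    # Unpacking on the main server: the compressed stream goes over the network,
    # gunzip and tar both run on the receiving side.
    ssh backup-server 'cat /srv/backups/host1.tar.gz' | tar -xzf - -C /restore/target

    # gzip on the backup server: the stream travels uncompressed,
    # the main server only runs plain tar.
    ssh backup-server 'gzip -dc /srv/backups/host1.tar.gz' | tar -xf - -C /restore/target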

With rdiff-backup, the most recent backup can be synced back with regular rsync (the results will be similar), but older backups still have to be restored with rdiff-backup itself, which managed the restore in 17 minutes and 17 seconds, showing the following load:

[Graph: main server load while restoring with rdiff-backup]

Perhaps this throttling is intentional; in any case, the authors offer a solution for limiting the speed. The restore process itself takes a little under half of one core, with proportionally comparable throughput on disk and network (i.e. 2-5 times lower than rsync).
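Restoring an older increment goes through rdiff-backup itself rather than plain rsync; a minimal sketch with placeholder paths, using the classic --restore-as-of interface:

    # Restore the state as of three backups ago ("3B"), or as of a specific date.
    rdiff-backup --restore-as-of 3B backup-server::/srv/backups/host1 /restore/target
    rdiff-backup --restore-as-of 2020-01-01 backup-server::/srv/backups/host1 /restore/target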

For recovery, rsnapshot suggests using regular rsync, so its results will be similar; in practice, that is exactly how it turned out.
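Since rsnapshot keeps every copy as an ordinary directory tree under its snapshot root, the restore really is just a copy back; a sketch with placeholder paths:

    # daily.0 is the newest snapshot; older ones are daily.1, daily.2, and so on.
    rsync -aH /srv/snapshots/daily.0/host1/etc/ host1:/etc/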

Burp coped with the task of restoring a backup in 7 minutes and 2 seconds, with the following load:

[Graph: main server load while restoring with burp]

It worked quite quickly, and it is at least much more convenient than pure rsync: no need to remember any flags, a simple and intuitive cli, built-in support for multiple copies, although it is about twice as slow. If the data needs to be restored from the most recent backup, plain rsync can still be used, with a few caveats.
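As an illustration of that convenience, a client-side restore with burp is roughly one short command; the backup number, regex and target directory here are placeholders:

    # -a r selects the restore action, -b picks the backup number,
    # -r limits the restore to matching paths, -d redirects it to an alternate directory.
    burp -a r -b 7 -r '^/etc/' -d /restore/target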

Approximately the same speed and load were shown by BackupPC with the rsync transfer mode enabled: it deployed the backup in 7 minutes and 42 seconds:

[Graph: main server load while restoring with BackupPC in rsync mode]

In the tar transfer mode, however, BackupPC coped more slowly, in 12 minutes and 15 seconds, and the processor load was on average about one and a half times lower:

[Graph: main server load while restoring with BackupPC in tar mode]

Duplicity without encryption showed slightly better results, managing to restore the backup in 10 minutes and 58 seconds. If encryption with gpg is activated, the recovery time increases to 15 minutes and 3 seconds. Also, when creating a repository for storing copies, you can specify the size of the archive volumes used to split the incoming data stream. On conventional hard drives, also because of the single-threaded mode of operation, this makes little difference; it would probably show up with different block sizes on hybrid storage. The load on the main server during recovery was as follows:

[Graph: main server load while restoring with duplicity, without encryption]

[Graph: main server load while restoring with duplicity, with encryption]
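A hedged sketch of the two things mentioned above, with placeholder URLs: the volume size is set at backup time, and the restore pulls everything back, decrypting when the repository is gpg-encrypted:

    # Backup: split the stream into 512 MB volumes (the size is an arbitrary example).
    duplicity full --volsize 512 /data sftp://backup-server//srv/backups/host1

    # Restore: pull the latest copy back; add --no-encryption for the unencrypted case.
    # For an encrypted repository the gpg passphrase can be supplied via PASSPHRASE.
    duplicity restore sftp://backup-server//srv/backups/host1 /restore/target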

Duplicati showed a comparable recovery rate, finishing in 13 minutes and 45 seconds. It took another 5 minutes to verify the correctness of the recovered data (about 19 minutes in total). The load was quite high:

[Graph: main server load while restoring with Duplicati, no encryption]

When the built-in aes encryption was enabled, the recovery time was 21 minutes 40 seconds, with CPU usage at its highest (both cores!) during the restore itself; while the data was being checked, only one thread was active, occupying one processor core. Checking the data after recovery took the same 5 minutes (almost 27 minutes in total).

[Graph: main server load while restoring with Duplicati, aes encryption]

Duplicati coped with the recovery a little faster when using the external gpg program for encryption, but overall the differences from the previous mode are minimal. The running time was 16 minutes 30 seconds, plus a 6-minute data check. The load was as follows:

[Graph: main server load while restoring with Duplicati, gpg encryption]

AMANDA, using tar, did it in 2 minutes 49 seconds, which is essentially very close to plain tar. The overall system load was likewise similar:

[Graph: main server load while restoring with AMANDA]

When restoring a backup with zbackup, the following results were obtained:

no encryption, lzma compression - running time 11 minutes 8 seconds:

[Graph: main server load, zbackup, no encryption, lzma compression]

aes encryption, lzma compression - running time 14 minutes:

[Graph: main server load, zbackup, aes encryption, lzma compression]

aes encryption, lzo compression - running time 6 minutes 19 seconds:

[Graph: main server load, zbackup, aes encryption, lzo compression]

Overall, not bad. Everything depends on the speed of the processor on the backup server, which is clearly visible from the running times with the different compressors. On the backup server side a regular tar was launched, so compared with plain tar the restore is about 3 times slower. It might be worth checking operation in multi-threaded mode, with more than two threads.
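The scheme used here, in sketch form with placeholder paths: zbackup reassembles the deduplicated stream, tar unpacks it on the receiving end, and for an encrypted repository the key file is supplied explicitly:

    # Unencrypted repository.
    zbackup restore /srv/zbackup/backups/host1.tar | tar -xf - -C /restore/target

    # Encrypted repository: the password file is passed explicitly.
    zbackup --password-file /root/zbackup.pass restore /srv/zbackup/backups/host1.tar \
        | tar -xf - -C /restore/target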

Borg Backup in unencrypted mode coped a little slower than tar, in 2 minutes 45 seconds; however, unlike tar, it gains the ability to deduplicate the repository. The load was as follows:

[Graph: main server load while restoring with Borg Backup, no encryption]

If blake2-based encryption is activated, backup recovery slows down a little. The recovery time in this mode is 3 minutes 19 seconds, and the load came out like this:

[Graph: main server load while restoring with Borg Backup, blake2]

aes encryption is slightly slower still: the recovery time is 3 minutes 23 seconds, and the load has hardly changed:

[Graph: main server load while restoring with Borg Backup, aes]

Since Borg can work in multi-threaded mode, the processor load is at its maximum, and enabling additional features simply increases the running time. Apparently, multi-threaded operation is worth investigating here as well, similarly to zbackup.
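For reference, the three modes compared above differ only in how the repository was initialized; the restore command itself is the same. The repository path and archive name below are placeholders:

    # The encryption/authentication mode is chosen at init time:
    #   --encryption=none | repokey (AES-CTR with HMAC-SHA256) | repokey-blake2
    borg init --encryption=repokey-blake2 backup-server:/srv/borg/host1

    # Restore: extract an archive into the current directory.
    cd /restore/target && borg extract backup-server:/srv/borg/host1::host1-2020-01-01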

Restic coped with the recovery a little more slowly; the running time was 4 minutes 28 seconds. The load looked as follows:

[Graph: main server load while restoring with Restic]

Apparently, the restore process works in several threads, though not as efficiently as in BorgBackup; in time it is comparable to regular rsync.
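A minimal restic restore, assuming an sftp-reachable repository (names are placeholders):

    # List available snapshots, then restore the latest one to an alternate target.
    restic -r sftp:backup-server:/srv/restic/host1 snapshots
    restic -r sftp:backup-server:/srv/restic/host1 restore latest --target /restore/target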

With urBackup it was possible to recover the data in 8 minutes and 19 seconds, with the following load:

[Graph: main server load while restoring with urBackup]

The load is again not very high, even lower than with tar. There are bursts in places, but never more than one core's worth of load.

Selection and justification of criteria for comparison

As mentioned in one of the previous articles, the backup system must meet the following criteria:

  • Ease of use
  • Universality
  • Stability
  • Speed

It is worth considering each item separately in more detail.

Ease of use

It is best when there is a single "Do everything well" button, but coming back to real programs, the most convenient thing is to have some familiar and standard principle of operation.
Most users are probably better off not having to remember a pile of cli flags, configure a bunch of different, often obscure options via the web or a tui, or set up failure notifications separately. This also includes how easily a backup solution "fits" into an existing infrastructure, and how easily the backup process can be automated. Installation via a package manager, or in one or two commands of the "download and unpack" kind, also counts. curl <link> | sudo bash is a questionable method, since you have to check what actually arrives via that link.

For example, among the candidates considered, the simple solutions are burp, rdiff-backup and restic, which have easy-to-remember options for their different modes of operation. Slightly more complex are borg and duplicity. The most difficult was AMANDA. The rest sit somewhere in between in terms of ease of use. In any case, if you need more than 30 seconds to read the user manual, or you have to go to Google or another search engine and scroll through a long sheet of help, the solution is complicated one way or another.

Some of the candidates considered can automatically send a message via e-mail/jabber, while others rely on alerting already configured in the system. Most often, the more complex the solution, the less obvious the notification settings. In any case, if the backup program returns a non-zero exit code that is correctly understood by the system's periodic-task service (so that a message goes to the system administrator or straight into monitoring), the situation is simple. But if a failing backup job on the backup server cannot, without extra configuration, report the problem in an obvious way, the complexity is already excessive. In any case, sending warnings and other messages only to the web interface and/or a log is bad practice: they will most often be ignored.
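A sketch of the simple case described above: the backup job runs from cron, stays silent on success and is noisy on failure, so cron's own MAILTO mechanism delivers the alert. The script path and log location are placeholders:

    #!/bin/sh
    # Hypothetical /etc/cron.daily/backup wrapper: cron mails any output to MAILTO,
    # so producing output (and a non-zero exit code) only on failure is enough.
    if ! /usr/local/sbin/run-backup >/var/log/backup.log 2>&1; then
        echo "backup failed, see /var/log/backup.log" >&2
        exit 1
    fi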

As for automation: a simple program can read environment variables that set its mode of operation, or it can have a well-developed cli that fully duplicates the behaviour available through, for example, a web interface. This also includes the possibility of threaded operation, opportunities for extension, and so on.
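restic is a convenient illustration of this approach: the repository and password can come entirely from the environment, so the same short command works interactively, from cron or from any other automation (values are placeholders):

    # The whole "mode of operation" lives in environment variables.
    export RESTIC_REPOSITORY=sftp:backup-server:/srv/restic/host1
    export RESTIC_PASSWORD_FILE=/root/restic.pass
    restic backup /etc /var/www && restic check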

Universality

This partially echoes the previous subsection in terms of automation: it should not be a particular problem to "fit" the backup process into the existing infrastructure.
It is worth noting that using non-standard ports (except for the web interface), implementing encryption in a non-standard way, or exchanging data via a non-standard protocol are all signs of a non-universal solution. Most of the candidates have some of these traits in one way or another, for the obvious reason that simplicity and universality usually don't go together. burp is an exception, and there are others.

A good sign is the ability to work over regular ssh.

Speed

The most controversial point. On the one hand, you launch the process, it works as quickly as possible and does not interfere with the main tasks. On the other hand, there is a surge in traffic and processor load for the duration of the backup. It is also worth noting that the fastest copying programs are usually the poorest in terms of the features that matter to users. Again: if, in order to get at one unfortunate text file a few dozen bytes in size containing a password, because of which the whole service is down (yes, yes, I understand that the backup process is usually not to blame here), you need to sequentially re-read all the files in the repository or unpack the entire archive, then the backup system is never fast. Another point that often becomes a stumbling block is the speed of deploying a backup from the archive. Here there is a clear advantage for tools that can simply copy or move files to the right place without special manipulations (rsync, for example), but most often the problem has to be solved organizationally and empirically: measure the backup recovery time and openly inform users about it.

Stability

This should be understood as follows: on the one hand, it should be possible to deploy the backup back one way or another; on the other hand, there should be resistance to various problems: a network failure, a disk failure, deletion of part of the repository.

Comparison of backup tools

Tool | Copy time | Restore time | Easy installation | Easy setup | Easy use | Easy automation | Client-server needed? | Repository integrity check | Differential copies | Works via pipe | Universality | Independence | Repository transparency | Encryption | Compression | Deduplication | Web interface | Cloud storage | Windows support | Score
Rsync | 4m15s | 4m28s | Yes | no | no | no | Yes | no | no | Yes | no | Yes | Yes | no | no | no | no | no | Yes | 6
Tar (pure) | 3m12s | 2m43s | Yes | no | no | no | no | no | Yes | Yes | no | Yes | no | no | no | no | no | no | Yes | 8.5
Tar (gzip) | 9m37s | 3m19s | Yes
rdiff-backup | 16m26s | 17m17s | Yes | Yes | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | no | Yes | Yes | Yes | no | Yes | 11
rsnapshot | 4m19s | 4m28s | Yes | Yes | Yes | Yes | no | no | Yes | no | Yes | no | Yes | no | no | Yes | Yes | no | Yes | 12.5
Burp | 11m9s | 7m2s | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | no | no | Yes | no | Yes | no | Yes | 10.5
Duplicity (no encryption) | 16m48s | 10m58s | Yes | Yes | no | Yes | no | Yes | Yes | no | no | Yes | no | Yes | Yes | no | Yes | no | Yes | 11
Duplicity (gpg) | 17m27s | 15m3s
Duplicati (no encryption) | 20m28s | 13m45s | no | Yes | no | no | no | Yes | Yes | no | no | Yes | no | Yes | Yes | Yes | Yes | Yes | Yes | 11
Duplicati (aes) | 29m41s | 21m40s
Duplicati (gpg) | 26m19s | 16m30s
zbackup (no encryption) | 40m3s | 11m8s | Yes | Yes | no | no | no | Yes | Yes | Yes | no | Yes | no | Yes | Yes | Yes | no | no | no | 10
zbackup (aes) | 42m0s | 14m1s
zbackup (aes+lzo) | 18m9s | 6m19s
Borg Backup (no encryption) | 4m7s | 2m45s | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | no | Yes | 16
Borg Backup (aes) | 4m58s | 3m23s
Borg Backup (blake2) | 4m39s | 3m19s
Restic | 5m38s | 4m28s | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | Yes | 15.5
urBackup | 8m21s | 8m19s | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | Yes | no | Yes | Yes | Yes | Yes | no | Yes | 12
AMANDA | 9m3s | 2m49s | Yes | no | no | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | 13
BackupPC (rsync) | 12m22s | 7m42s | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | no | no | Yes | Yes | no | Yes | no | Yes | 10.5
BackupPC (tar) | 12m34s | 12m15s

Rows for additional modes (gzip, gpg, aes, aes+lzo, blake2, tar) list only the copy and restore times; the other characteristics were given once per tool.

Table legend:

  • Green: running time under five minutes, or the answer is "Yes" (except for the "Client-server needed?" column), 1 point
  • Yellow: running time from five to ten minutes, 0.5 points
  • Red: running time over ten minutes, or the answer is "no" (except for the "Client-server needed?" column), 0 points

According to the table above, BorgBackup comes out as the simplest, fastest and at the same time convenient and powerful backup tool. Second place went to Restic; the rest of the candidates placed roughly evenly, with a spread of one or two points at the end.

Thanks to everyone who has read the series to the end; I invite you to discuss the options presented and to suggest your own, if you have any. As the discussion proceeds, the table may be supplemented.

The series will conclude with a final article, which will attempt to describe an ideal, fast and manageable backup tool that allows a copy to be deployed back in the shortest possible time while remaining convenient and easy to set up and maintain.

Announcement

Backup, part 1: Why backup is needed, an overview of methods, technologies
Backup Part 2: Reviewing and testing rsync-based backup tools
Backup Part 3: Review and testing of duplicity, duplicati
Backup Part 4: Reviewing and testing zbackup, restic, borgbackup
Backup Part 5: Testing bacula and veeam backup for linux
Backup Part 6: Comparing Backup Tools
Backup Part 7: Conclusions

Source: habr.com
