This article compares backup tools, but first it is worth looking at how quickly and how well they cope with restoring data from backups.
For ease of comparison, we will consider restoring from a full backup, especially since this mode of operation is supported by all candidates. For simplicity, the figures are already averaged (the arithmetic mean of several runs). The results will be summarized in a table, which will also cover the feature set: the presence of a web interface, ease of setup and operation, automation options, the availability of various extras (for example, data integrity checks), and so on. The graphs show the load on the server where the data will be used (not the backup server).
Data recovery
rsync and tar will serve as the reference point, since several of the other candidates are based on them.

Rsync handled the test data set in 4 minutes and 28 seconds, showing the following load:
The recovery process ran into the limits of the disk subsystem of the backup storage server (the sawtooth graphs). One core is clearly fully loaded, with no other problems (low iowait and softirq mean no issues with the disk and network, respectively). Since the other two programs, rdiff-backup and rsnapshot, are based on rsync and also offer plain rsync as the restore tool, they will have roughly the same load profile and restore time.
Tar managed a little faster, in 2 minutes and 43 seconds:
Overall system load was about 20% higher on average due to increased softirq: greater overhead in the network subsystem.
If the archive is additionally compressed, recovery time increases to 3 minutes 19 seconds, with the following load on the main server (decompression on the main server side):
The unpacking process occupies both processor cores, since two processes are running. This is the expected result. A comparable time (3 minutes and 20 seconds) was obtained when running gzip on the backup server side; the load profile on the main server was then very similar to running tar without the gzip compressor (see the previous graph).
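The difference between the two placements of gzip can be sketched as two pipelines. This is a local simulation with invented paths; the real remote half would run over ssh:

```shell
set -eu
demo=/tmp/tar_restore_demo
rm -rf "$demo"
mkdir -p "$demo/src/data" "$demo/main_side" "$demo/backup_side"
echo "payload" > "$demo/src/data/file.txt"

# Make a compressed full backup.
tar -C "$demo/src" -cf - data | gzip > "$demo/backup.tar.gz"

# Variant 1: decompression on the main server, so gzip and tar
# each occupy a core there.
gzip -dc "$demo/backup.tar.gz" | tar -C "$demo/main_side" -xf -

# Variant 2: decompression on the backup server; only plain tar runs
# on the main server. Simulated locally here; over the network it would be:
#   ssh backup-host 'gzip -dc /backups/backup.tar.gz' | tar -C /restore -xf -
gzip -dc "$demo/backup.tar.gz" | tar -C "$demo/backup_side" -xf -
```

Either way the pipeline stays single-stream; what moves is which machine pays for the decompression.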
With rdiff-backup, the most recent backup can be restored with plain rsync (the results will be similar), but older backups still have to be restored with rdiff-backup itself, which managed it in 17 minutes and 17 seconds, showing this load:
Perhaps the authors intended this speed limit; in any case, it is noticeable. For recovery, rsnapshot suggests using plain rsync, so its results should be similar, and that is how it turned out in practice.
Burp coped with restoring the backup in 7 minutes and 2 seconds, with the following load:
It worked fairly quickly, and it is at least much more convenient than pure rsync: no flags to remember, a simple and intuitive CLI, and built-in support for multiple copies, although it is twice as slow. If the data needs to be restored from the most recent backup, rsync can be used, with a few caveats.
BackupPC showed roughly the same speed and load with the rsync transfer mode enabled, deploying the backup in 7 minutes and 42 seconds:
In tar transfer mode BackupPC was slower, 12 minutes and 15 seconds, although processor load was generally about one and a half times lower:
Duplicity without encryption showed slightly better results, restoring the backup in 10 minutes and 58 seconds. With gpg encryption enabled, recovery time increases to 15 minutes and 3 seconds. When creating a repository, you can also specify the size of the archive volumes used to split the incoming data stream. On ordinary hard drives, also because of the single-threaded mode of operation, there is little difference; it would probably show up with different block sizes on hybrid storage. The load on the main server during recovery was as follows:
Without encryption:

With encryption:
Duplicati showed a comparable recovery speed, managing in 13 minutes and 45 seconds. Verifying the correctness of the recovered data took another 5 minutes (about 19 minutes in total). The load was quite high:
With the built-in aes encryption enabled, recovery took 21 minutes 40 seconds, with processor usage at its maximum (both cores!) during recovery; during data verification only one thread was active, occupying one core. Verifying the data again took about 5 minutes (almost 27 minutes in total).
Duplicati coped with recovery a little faster when using the external gpg program for encryption, but overall the differences from the previous mode are minimal. Running time was 16 minutes 30 seconds, plus a 6-minute data check. The load was as follows:
AMANDA, using tar, managed in 2 minutes 49 seconds, which is in fact very close to plain tar. Overall system load was the same:
Restoring a backup with zbackup gave the following results:

- No encryption, lzma compression: running time 11 minutes 8 seconds
- aes encryption, lzma compression: running time 14 minutes 1 second
- aes encryption, lzo compression: running time 6 minutes 19 seconds
Overall, not bad. Everything depends on the speed of the processor on the backup server, which is clearly visible from the running times with different compressors. On the backup server side a plain tar was launched, so compared with it, restoration is three times slower. It may be worth checking operation in multi-threaded mode, with more than two threads.
Borg Backup in unencrypted mode coped a little slower than tar, in 2 minutes 45 seconds; however, unlike tar, it gained the ability to deduplicate the repository. The load was as follows:
With Blake2-based encryption activated, backup recovery slows down a bit. The recovery time in this mode is 3 minutes 19 seconds, and the load looks like this:
aes encryption is slightly slower still, with a recovery time of 3 minutes 23 seconds; the load has hardly changed:
Since Borg works in multi-threaded mode, processor load is at its maximum, and enabling additional features simply increases the running time. Apparently, multi-threaded operation of this kind would be worth investigating for zbackup as well.
Restic coped with recovery a little more slowly, with a running time of 4 minutes 28 seconds. The load looked as follows:
Apparently, the restore process runs in several threads, but not as efficiently as in BorgBackup; in time it is comparable to plain rsync.
With UrBackup, the data was recovered in 8 minutes and 19 seconds, with the following load:
The load is still not very high, even lower than with tar: bursts in places, but never more than one core's worth.
Selection and justification of criteria for comparison
As mentioned in one of the previous articles, the backup system must meet the following criteria:
- Ease of use
- Universality
- Stability
- Speed
It is worth considering each item separately in more detail.
Ease of use
It would be best to have a single “Do everything well” button, but returning to real programs, the most convenient option is some familiar and standard principle of operation.
Most users are probably better off not having to remember a pile of CLI flags, configure a host of different and often obscure options via a web interface or TUI, or set up failure notifications by hand. This also includes how easily a backup solution “fits” into existing infrastructure, and how easily the backup process can be automated. Installation via a package manager, or in one or two commands of the “download and unpack” kind, also helps. `curl <url> | sudo bash` is a complicated method, since you need to check what actually arrives via the link.
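A safer pattern is download, inspect, verify, then run. The sketch below simulates the download with a local file, and the checksum is generated on the spot purely to demonstrate the mechanics; in reality the checksum would be published separately by the vendor:

```shell
set -eu
demo=/tmp/install_check_demo
rm -rf "$demo"
mkdir -p "$demo"

# Simulated download; in reality: curl -fsSLo "$demo/install.sh" <url>
printf '#!/bin/sh\necho installed\n' > "$demo/install.sh"

# 1. Read the script before running it.
head -n 20 "$demo/install.sh" > /dev/null

# 2. Verify against a checksum (here self-generated for illustration only).
sha256sum "$demo/install.sh" > "$demo/install.sh.sha256"
sha256sum -c "$demo/install.sh.sha256" > /dev/null

# 3. Only then execute, and not via sudo unless it is really required.
sh "$demo/install.sh" > "$demo/out.txt"
```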
For example, among the candidates considered, burp, rdiff-backup and restic are simple solutions, with easily remembered flags for the different modes of operation. Slightly more complex are borg and duplicity. The most difficult was AMANDA. The rest fall somewhere in between in ease of use. In any case, if you need more than 30 seconds to read the user manual, or have to turn to Google or another search engine and scroll through a long wall of help output, the solution is complicated one way or another.
Some of the candidates can automatically send a message via e-mail or Jabber, while others rely on notifications configured in the system. Most often, the complex solutions have less-than-obvious notification settings. In any case, if the backup program returns a non-zero exit code that is correctly understood by the system's periodic task service (so that a message reaches the system administrator or goes straight into monitoring), the situation is simple. But if a backup system that does not run on the backup server itself cannot, without extra configuration, report a problem in an obvious way, the complexity is already excessive. In any case, issuing warnings and other messages only to the web interface and/or a log is bad practice: they will most often be ignored.
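The non-zero exit code convention needs no support from the tool at all; a cron wrapper can turn it into a notification. A minimal sketch, where the backup command is a placeholder simulated with `false`:

```shell
set -eu
demo=/tmp/backup_alert_demo
rm -rf "$demo"
mkdir -p "$demo"

run_backup() {
  # Placeholder for the real command (e.g. `borg create ...`);
  # `false` simulates a failed backup run.
  false
}

if run_backup; then
  echo "backup ok" >> "$demo/log"
else
  rc=$?
  # Cron mails any output; monitoring can watch the log or the exit code.
  echo "backup FAILED, exit code $rc" >> "$demo/log"
fi
```

Under cron, simply letting the wrapper itself exit non-zero (and print to stderr) is enough for the administrator to receive mail.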
As for automation: a simple program can read environment variables that set its mode of operation, and/or it has a developed CLI that can fully duplicate its behavior through the web interface, for example. This also covers the possibility of threaded operation, options for extension, and so on.
Universality
This partially echoes the previous subsection in terms of automation: “fitting” the backup process into the existing infrastructure should not pose a particular problem.
It is worth noting that the use of non-standard ports (other than for the web interface), encryption implemented in a non-standard way, and data exchange using a non-standard protocol are all signs of a non-universal solution. To one degree or another, all the candidates have such traits, for the obvious reason that simplicity and universality usually do not go together. Burp is an exception, and there are others.
A good sign is the ability to work over regular ssh.
Speed
The most contentious and controversial point. On the one hand: launch the process, have it run as quickly as possible, and have it not interfere with the main tasks. On the other hand, there is a surge of traffic and processor load while the backup runs. It is also worth noting that the fastest copying programs are usually the poorest in features that matter to users. Again: if, in order to retrieve one unfortunate text file a few dozen bytes in size with a password (and the whole service is down because of it; yes, the backup process is usually not to blame here), you need to sequentially re-read all the files in the repository or unpack the entire archive, the backup system is never fast. Another point that often becomes a stumbling block is the speed of deploying a backup from the archive. Tools that can simply copy or move files into place without special manipulation (rsync, for example) have a clear advantage here, but most often the problem has to be solved organizationally and empirically: measure the recovery time and inform users about it openly.
Stability
It should be understood as follows: on the one hand, it should be possible to deploy the backup back in any way; on the other hand, the system should be resistant to various problems: network failure, disk failure, deletion of part of the repository.
Comparison of backup tools
| Tool | Mode | Copy time | Copy recovery time | Easy installation | Easy setup | Simple use | Simple automation | Client-server needed? | Repository integrity check | Differential copies | Work via pipe | Universality | Independence | Repository transparency | Encryption | Compression | Deduplication | Web interface | Cloud storage | Windows support | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rsync | – | 4m15s | 4m28s | Yes | no | no | no | Yes | no | no | Yes | no | Yes | Yes | no | no | no | no | no | Yes | 6 |
| Tar | pure | 3m12s | 2m43s | Yes | no | no | no | no | no | Yes | Yes | no | Yes | no | no | no | no | no | no | Yes | 8.5 |
| Tar | gzip | 9m37s | 3m19s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | Yes | 〃 | 〃 | 〃 | 〃 | 〃 |
| Rdiff-backup | – | 16m26s | 17m17s | Yes | Yes | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | no | Yes | Yes | Yes | no | Yes | 11 |
| Rsnapshot | – | 4m19s | 4m28s | Yes | Yes | Yes | Yes | no | no | Yes | no | Yes | no | Yes | no | no | Yes | Yes | no | Yes | 12.5 |
| Burp | – | 11m9s | 7m2s | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | no | no | Yes | no | Yes | no | Yes | 10.5 |
| Duplicity | no encryption | 16m48s | 10m58s | Yes | Yes | no | Yes | no | Yes | Yes | no | no | Yes | no | Yes | Yes | no | Yes | no | Yes | 11 |
| Duplicity | gpg | 17m27s | 15m3s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Duplicati | no encryption | 20m28s | 13m45s | no | Yes | no | no | no | Yes | Yes | no | no | Yes | no | Yes | Yes | Yes | Yes | Yes | Yes | 11 |
| Duplicati | aes | 29m41s | 21m40s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Duplicati | gpg | 26m19s | 16m30s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Zbackup | no encryption | 40m3s | 11m8s | Yes | Yes | no | no | no | Yes | Yes | Yes | no | Yes | no | Yes | Yes | Yes | no | no | no | 10 |
| Zbackup | aes | 42m0s | 14m1s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Zbackup | aes+lzo | 18m9s | 6m19s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Borg Backup | no encryption | 4m7s | 2m45s | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | no | Yes | 16 |
| Borg Backup | aes | 4m58s | 3m23s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Borg Backup | blake2 | 4m39s | 3m19s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |
| Restic | – | 5m38s | 4m28s | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | Yes | 15.5 |
| UrBackup | – | 8m21s | 8m19s | Yes | Yes | Yes | no | Yes | no | Yes | no | Yes | Yes | no | Yes | Yes | Yes | Yes | no | Yes | 12 |
| Amanda | – | 9m3s | 2m49s | Yes | no | no | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | Yes | Yes | 13 |
| BackupPC | rsync | 12m22s | 7m42s | Yes | no | Yes | Yes | Yes | Yes | Yes | no | Yes | no | no | Yes | Yes | no | Yes | no | Yes | 10.5 |
| BackupPC | tar | 12m34s | 12m15s | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 | 〃 |

(〃 means the same value as in the row above: mode sub-rows share the main row's feature set and score.)
Table legend:
- Green: running time under five minutes, or the answer is "Yes" (except for the "Client-server needed?" column); 1 point
- Yellow: running time from five to ten minutes; 0.5 points
- Red: running time over ten minutes, or the answer is "No" (except for the "Client-server needed?" column); 0 points
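As a sanity check of the legend, the time-to-points mapping can be expressed directly. The cut-offs below are my reading of the legend; applied to BorgBackup's row, 1 + 1 for the two times plus the "Yes" answers outside the client-server column reproduce its score of 16:

```shell
set -eu
# Map an "XmYs" running time to legend points:
# under 5 minutes = 1, five to ten minutes = 0.5, over ten minutes = 0.
points() {
  awk -v t="$1" 'BEGIN {
    m = t; sub(/m.*/, "", m)
    s = t; sub(/^[0-9]+m/, "", s); sub(/s$/, "", s)
    total = m * 60 + s
    if (total < 300)       print 1
    else if (total <= 600) print 0.5
    else                   print 0
  }'
}

# BorgBackup copy time, BackupPC (rsync) restore time, BackupPC (tar) restore time:
echo "$(points 4m7s) $(points 7m42s) $(points 12m15s)" > /tmp/legend_points.txt
```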
According to the table above, BorgBackup is the simplest and fastest tool, while remaining convenient and powerful. Restic took second place; the remaining candidates placed roughly equally, with a spread of one or two points at the bottom.
Thank you to everyone who has read the series to the end; I invite you to discuss the options and suggest your own, if any. As the discussion proceeds, the table may be extended.
The series will close with a final article that attempts to describe an ideal, fast and manageable backup tool: one that allows a copy to be deployed back in the shortest possible time while remaining convenient and easy to set up and maintain.
Announcement
Backup Part 6: Comparing Backup Tools
Backup Part 7: Conclusions
Source: habr.com