Backup, part 1: Purpose, overview of methods and technologies

Why do you need to make backups at all? After all, hardware is very, very reliable, and besides, there are “clouds” that are better than physical servers in terms of reliability: with the right configuration, a “cloud” server will easily survive the failure of the underlying physical server, and from the point of view of the service's users there will be only a small, barely noticeable blip. On top of that, duplicating information often means paying for “extra” processor time, disk load and network traffic.

The ideal program runs fast, doesn't leak memory, has no security holes, and doesn't exist.

—Unknown

Since programs are still written by protein-based developers, the testing process is often absent, and software is very rarely delivered using “best practices” (which are themselves programs, and therefore not ideal), system administrators most often have to solve problems that sound brief but loaded: “put it back the way it was”, “get the database working normally again”, “it is slow, roll it back”, and my favorite, “I don't know what it is, but fix it”.

In addition to logical errors that result from careless work by developers or an unlucky combination of circumstances, and from incomplete knowledge or misunderstanding of small details of how programs are built (including auxiliary and system software: operating systems, drivers and firmware), there are errors of another kind. For example, most developers rely on the runtime and completely forget about physical laws, which no program can bypass: the supposedly infinite reliability of the disk subsystem and of any data storage in general (including RAM and processor caches!), zero processing time on the processor, the absence of errors during network transmission and processing, and network latency equal to zero. Nor should the notorious deadline be neglected, because missing it creates problems worse than any nuance of network or disk behavior.


So what can be done about the problems that loom over valuable data? There is nothing to replace live developers with, and it is not certain there will be anytime soon. On the other hand, only a handful of projects have so far managed to fully prove that a program works as intended, and it is by no means guaranteed that such proofs can simply be taken and applied to other, similar projects. Such proofs also take a lot of time and require special skills and knowledge, which, given deadlines, makes them practically inapplicable. Besides, we still do not know how to use ultra-fast, cheap and infinitely reliable technology for storing, processing and transmitting information. Such technologies, if they exist at all, exist only as concepts or, more often, only in science fiction books and films.

Good artists copy, great artists steal.

—Pablo Picasso.

The most successful solutions and surprisingly simple things usually arise where concepts, technologies, knowledge and fields of science that seem completely incompatible at first glance come together.

For example, birds and airplanes both have wings, and despite the functional similarity (the principle of operation in some modes is the same, and technical problems are solved in similar ways: hollow bones versus strong, lightweight materials, and so on), the results, while very alike, are completely different. The best examples we see in our technology are also mostly borrowed from nature: watertight compartments in ships and submarines are a direct analogy with annelid worms; RAID arrays and data integrity checks mirror the duplication of the DNA chain; and paired organs, the independence of various organs from the central nervous system (the automaticity of the heart) and reflexes resemble autonomous systems on the Internet. Of course, applying ready-made solutions head-on is fraught with problems, but who knows, perhaps there are no other solutions.

If only you knew where you would fall, you would have laid down some straw!

—Belarusian folk proverb

This means that backups are vital for those who wish to:

  • Be able to get your systems back up and running with little or no downtime
  • Feel free to act, because in case of an error there is always the possibility of a rollback
  • Minimize the consequences of intentional data corruption

Here is some theory

Any classification is arbitrary. Nature does not classify. We classify because it is more convenient for us, and we classify according to data that we also choose arbitrarily.

—Jean Bruller

Regardless of the physical storage medium, logical data storage can be roughly divided into two ways of accessing that data: block and file. Lately this division has become very blurred, since purely block or purely file logical storage does not really exist. For simplicity, however, we will assume that it does.

Block data storage implies a physical device to which data is written in fixed-size portions, blocks. Blocks are accessed by address; each block has its own address within the device.

A backup is usually made by copying blocks of data. To ensure data integrity at the moment of copying, the writing of new blocks and the modification of existing ones are suspended. If we look for an analogy in the everyday world, the closest one is a cabinet of identical numbered cells.
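
As a minimal sketch of this idea (the device and file names below are placeholders, not taken from the article), a whole block device can be copied with dd while nothing is writing to it, and the result compared against the source:

# assumption: /dev/xvdb is the data disk and nothing writes to it while the copy runs
dd if=/dev/xvdb of=/backup/xvdb.img bs=4M status=progress
# optionally confirm that the image matches the source byte for byte
cmp /dev/xvdb /backup/xvdb.img && echo "image matches the source"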


File storage is close to block storage in terms of the logical device and is often organized on top of it. The important differences are the presence of a storage hierarchy and human-readable names. Two abstractions are introduced: the file, a named area of data, and the directory, a special file that stores descriptions of and references to other files. Files can carry additional metadata: creation time, access flags, etc. Backups are usually made like this: the changed files are found and then copied to another file storage with the same structure. Data integrity is usually ensured by the absence of files currently being written to. File metadata is backed up in the same way. The closest analogy is a library, which has sections with different books and a catalogue of human-readable book titles.
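
A minimal sketch of this scheme (paths are hypothetical), using rsync, which finds the changed files itself and copies them into a destination tree with the same structure:

# -a preserves the file metadata mentioned above (times, permissions, ownership)
# --delete keeps the copy an exact mirror of the source
rsync -a --delete /var/www/ /backup/www/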


Recently yet another option is sometimes described, the one that file data storage in fact grew out of and that retains the same archaic traits: object storage.

It differs from file storage in that there is no nesting deeper than one level (a flat scheme), and although the object names are human-readable, they are still better suited to processing by machines. When backing up, object stores are most often treated in the same way as file stores, although other approaches do occasionally exist.

— There are two kinds of system administrators: those who do not make backups, and those who ALREADY do.
— Actually, there are three kinds: there are also those who check that their backups can actually be restored.

—Unknown

It is also worth understanding that the backup process itself is carried out by programs and therefore suffers from the same shortcomings as any other program. To reduce (not eliminate!) dependence on the human factor, as well as on individual quirks that matter little on their own but can add up to a tangible effect, the so-called 3-2-1 rule is used. There are many ways to interpret it, but I prefer the following: keep 3 sets of the same data, store 2 of the sets in different formats, and keep 1 set in geographically remote storage.

Storage format means the following:

  • If there is a dependence on the physical storage method, we change the physical storage method.
  • If there is a dependence on the logical storage method, we change the logical method.

To achieve the maximum effect of the 3-2-1 rule, it is recommended to change the storage format in both ways.

From the point of view of how ready a backup is for its intended purpose (restoring service), backups are divided into “hot” and “cold”. Hot copies differ from cold ones in one respect only: they are ready for immediate use, whereas cold ones require additional steps before recovery: decryption, extraction from an archive, and so on.

Do not confuse hot and cold copies with online and offline ones. The latter distinction is about physical isolation of the data and is, in effect, another axis along which backup methods are classified. An offline copy, one not directly connected to the system where it would have to be restored, can be either hot or cold (in terms of readiness for recovery). An online copy is directly available where it needs to be restored and is most often hot, although cold online copies also exist.

Also keep in mind that the backup process usually does not end with a single copy; there may be quite a large number of them. Therefore it is necessary to distinguish full backups, i.e. those that can be restored independently of any other backups, from differential ones (incremental, differential, decremental, etc.), which cannot be restored on their own and require one or more other backups to be restored first.

Incremental backups are an attempt to save backup storage space: only the data that has changed since the previous backup is written to each new copy.
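
A minimal sketch of incremental copies (archive names and paths are placeholders) with GNU tar, which tracks what has already been saved in a snapshot metadata file:

# first run: a full copy plus the metadata file describing what was saved
tar --listed-incremental=/backup/www.snar -czf /backup/www-full.tar.gz /var/www
# later runs: only files changed since the previous run end up in the archive
tar --listed-incremental=/backup/www.snar -czf /backup/www-incr-$(date +%F).tar.gz /var/www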

Decremental backups are created for the same purpose, but in the opposite way: a fresh full backup is taken each time, while only the difference between it and the previous copy is actually stored.

Backup to storage that deduplicates data deserves a separate mention: if full backups are written to it, only the difference between the backups is physically stored, yet restoring them works just like restoring from a full copy and is completely transparent.

Quis custodiet ipsos custodes?

(Who will guard the guards themselves? - Latin)

It is very frustrating to have no backups at all, but it is much worse when a backup seems to have been made, yet during recovery it turns out that it cannot be restored, because:

  • The integrity of the original data has been violated.
  • The backup storage is corrupted.
  • Restoration is very slow, and data that is only partially restored cannot be used.

A well-designed backup process must take these points into account, especially the first two.

The integrity of the source data can be guaranteed in several ways. The most commonly used are: a) creating file-system snapshots at the block level, b) “freezing” the state of the file system, c) a special block device with versioning, d) sequential writing of files or blocks. Checksums are also applied so that the data can be verified during recovery.

Storage corruption can also be detected with checksums. An additional method is the use of specialized devices or file systems in which already written data cannot be changed, but new data can be appended.
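
A minimal sketch of such a check (paths are hypothetical): record checksums when the backup is written, and verify them before trusting the storage during recovery:

# when creating the backup: record a checksum for every file in the copy
cd /backup/www && find . -type f -print0 | xargs -0 sha256sum > /backup/www.sha256
# before restoring: make sure nothing in the storage has silently changed
cd /backup/www && sha256sum -c /backup/www.sha256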

To speed up recovery, several recovery processes are run in parallel, provided there is no bottleneck in the form of a slow network or a slow disk subsystem. To get around the problem of partially restored data, the backup process can be broken into relatively small subtasks, each performed separately. This makes it possible to restore service in a consistent order with a predictable recovery time. This problem most often lies in the organizational plane (SLA), so we will not dwell on it in detail.
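
A minimal sketch of the parallel idea (the per-directory archive layout and paths are assumptions for illustration): if the backup was split into one archive per directory, several of them can be unpacked at once:

# assumption: the backup consists of one archive per top-level directory
ls /backup/archives/*.tar.gz | xargs -P 4 -I {} tar -xzf {} -C /restore
# -P 4 runs up to four tar processes in parallel; lower it if the disk or network is the bottleneck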

It is not the one who adds spices to every dish who knows them well, but the one who never adds anything superfluous to it.

—V. Sinyavsky

System administrators' practices with respect to the software they use may vary, but the general principles, one way or another, remain the same, in particular:

  • It is strongly recommended to use ready-made solutions.
  • Programs should work predictably, i.e. there should be no undocumented features or bottlenecks.
  • The setup of each program should be simple enough so that you do not have to read the manual or cheat sheet every time.
  • The solution, if possible, should be universal, because servers in terms of their hardware characteristics can vary quite a lot.

For backing up block devices, the following programs are common:

  • dd, familiar to veterans of system administration; this category also includes similar programs (dd_rescue, for example).
  • Utilities built into some file systems that create a dump of the file system.
  • “Omnivorous” utilities, for example partclone.
  • Vendors' own, often proprietary, solutions; for example, Norton Ghost and its successors.

For file systems, the backup problem is partly solved by the methods applicable to block devices, but it can be solved more efficiently using, for example:

  • rsync, a universal program and protocol for synchronizing the state of file systems.
  • Archiving tools built into the file system itself (ZFS); a minimal sketch follows this list.
  • Third-party archiving tools; the most popular representative is tar, though there are others, such as dar, a tar replacement aimed at modern systems.
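
A minimal sketch of the built-in ZFS route (pool, dataset and host names are placeholders): a snapshot fixes a point-in-time state, and zfs send streams it to another machine:

# take a read-only, point-in-time snapshot of the dataset
zfs snapshot tank/data@backup-1
# stream it to the backup host, where it is stored as a regular dataset
zfs send tank/data@backup-1 | ssh backuphost zfs receive backuppool/data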

Separately, it is worth mentioning the software tools for ensuring data consistency when creating backups. The most commonly used options are:

  • Mounting the file system read-only, or freezing it (freeze); the method is of limited applicability.
  • Creating snapshots of the state of file systems or block devices (LVM, ZFS); see the sketch after this list.
  • Using third-party tools to take consistent copies when the previous options cannot be used for some reason (programs of the hotcopy type).
  • The copy-on-write technique (CopyOnWrite); however, it is most often tied to the file system used (BTRFS, ZFS).
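
A minimal sketch of the snapshot approach with LVM (volume group, volume and mount point names are placeholders): the snapshot freezes a consistent view of the volume, which is then backed up while the original keeps accepting writes:

# reserve 5G of space for changes made to the origin while the backup runs
lvcreate --snapshot --size 5G --name data_snap /dev/vg0/data
# mount the frozen view read-only and back it up with any file-level tool
mount -o ro /dev/vg0/data_snap /mnt/snap
rsync -a /mnt/snap/ /backup/data/
# release the snapshot when done
umount /mnt/snap
lvremove -y /dev/vg0/data_snap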

So, for a small server, you need to provide a backup scheme that meets the following requirements:

  • Easy to use - no special additional steps are required when working, minimal steps to create and restore copies.
  • Universal - works on both large and small servers; this is important when increasing the number of servers or scaling.
  • It is installed by the package manager, or in one or two commands like “download and unpack”.
  • Stable - uses a standard or long-established storage format.
  • Fast in operation.

Candidates that more or less meet these requirements:

  • rdiff-backup
  • rsnapshot
  • burp
  • duplicati
  • duplicity
  • deja dup
  • dar
  • zbackup
  • restic
  • borgbackup


A virtual machine (based on XenServer) with the following characteristics will be used as a test bench:

  • 4 cores 2.5 GHz,
  • 16 GB of RAM,
  • 50 GB hybrid storage (storage with SSD caching at 20% of the virtual disk size) as a separate virtual disk without partitioning,
  • 200 Mbps channel to the Internet.

Almost the same machine will be used as the destination server for backups, only with a 500 GB hard drive.

Operating system: CentOS 7 x64. The partitioning is standard; the additional partition will be used as the data source.

As the initial data, let us take a WordPress site with 40 GB of media files and a MySQL database. Since virtual servers vary greatly in their characteristics, and for better reproducibility, here are the server test results obtained with sysbench.

sysbench --threads=4 --time=30 --cpu-max-prime=20000 cpu run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Prime number limit: 20000

Initializing worker threads…

Threads started!

cpu speed:
events per second: 836.69

Throughput:
events/s (eps): 836.6908
time elapsed: 30.0039s
total number of events: 25104

Latency (ms):
min: 2.38
avg: 4.78
max: 22.39
95th percentile: 10.46
sum: 119923.64

Thread fairness:
events(avg/stddev): 6276.0000/13.91
execution time (avg/stddev): 29.9809/0.01

sysbench --threads=4 --time=30 --memory-block-size=1K --memory-scope=global --memory-total-size=100G --memory-oper=read memory run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: read
scope: global

Initializing worker threads…

Threads started!

Total operations: 50900446 (1696677.10 per second)

49707.47 MiB transferred (1656.91 MiB/sec)

Throughput:
events/s (eps): 1696677.1017
time elapsed: 30.0001s
total number of events: 50900446

Latency (ms):
min: 0.00
avg: 0.00
max: 24.01
95th percentile: 0.00
sum: 39106.74

Thread fairness:
events(avg/stddev): 12725111.5000/137775.15
execution time (avg/stddev): 9.7767/0.10

sysbench --threads=4 --time=30 --memory-block-size=1K --memory-scope=global --memory-total-size=100G --memory-oper=write memory run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global

Initializing worker threads…

Threads started!

Total operations: 35910413 (1197008.62 per second)

35068.76 MiB transferred (1168.95 MiB/sec)

Throughput:
events/s (eps): 1197008.6179
time elapsed: 30.0001s
total number of events: 35910413

Latency (ms):
min: 0.00
avg: 0.00
max: 16.90
95th percentile: 0.00
sum: 43604.83

Thread fairness:
events(avg/stddev): 8977603.2500/233905.84
execution time (avg/stddev): 10.9012/0.41

sysbench --threads=4 --file-test-mode=rndrw --time=60 --file-block-size=4K --file-total-size=1G fileio run
sysbench 1.1.0-18a9f86 (using bundled LuaJIT 2.1.0-beta3)
Running the test with the following options:
Number of threads: 4
Initializing random number generator from current time

Extra file open flags: (none)
128 files, 8MiB each
1GiB total file size
Block size 4KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Initializing worker threads…

Threads started!

Throughput:
read: IOPS=3868.21 15.11 MiB/s (15.84 MB/s)
write: IOPS=2578.83 10.07 MiB/s (10.56 MB/s)
fsync:IOPS=8226.98

Latency (ms):
min: 0.00
avg: 0.27
max: 18.01
95th percentile: 1.08
sum: 238469.45

This note opens a large series of articles about backup:

  1. Backup, part 1: Why backup is needed, an overview of methods, technologies
  2. Backup Part 2: Reviewing and testing rsync-based backup tools
  3. Backup Part 3: Reviewing and testing duplicity, duplicati, deja dup
  4. Backup Part 4: Reviewing and testing zbackup, restic, borgbackup
  5. Backup Part 5: Testing bacula and veeam backup for linux
  6. Backup Part 6: Comparing Backup Tools
  7. Backup Part 7: Conclusions

Source: habr.com
