Backup Part 7: Conclusions

This note completes the backup series. It covers a logical layout of a dedicated server (or VPS) that is convenient to back up, and offers an option for quickly restoring a server from a backup with minimal downtime after a failure.

Initial data

A dedicated server most often has at least two hard drives used to build a first-level RAID array (mirror), so that the server can keep running if one disk fails. A regular dedicated server may also have a separate hardware RAID controller with active SSD caching, so one or more SSDs can be attached in addition to the regular hard drives. Sometimes dedicated servers are offered whose only local disk is a SATADOM (a small disk that is structurally a flash drive plugged into a SATA port), or even an ordinary small (8-16 GB) flash drive plugged into a special internal port, with the data coming from a storage system connected over a dedicated storage network (10G Ethernet, FC, etc.); there are also dedicated servers that boot directly from the storage system. I will not consider such options: in those cases the task of backing up the server quietly moves to the specialist who maintains the storage system, which usually has its own proprietary snapshot technologies, built-in deduplication, and the other joys of a storage administrator discussed in the previous parts of this series. The disk array of a dedicated server can reach several tens of terabytes, depending on the number and size of the disks attached to it.

With a VPS the volumes are more modest: usually no more than 100 GB (though larger ones exist), and the tariffs for such a VPS can easily be more expensive than the cheapest dedicated servers from the same hoster. A VPS most often has a single disk, because underneath it there is a storage system (or something hyperconverged). Sometimes a VPS has several disks with different characteristics, for different purposes:

  • a small system disk - for installing the operating system;
  • a large one - for storing user data.

When the system is reinstalled through the control panel, the disk with user data is left intact, while the system disk is completely overwritten. The hoster may also offer a button that takes a snapshot of the VPS (or of a disk); however, if you install your own operating system or forget to enable the required service inside the VPS, some data may still be lost. In addition to that button, a data storage service is usually offered, most often a very limited one: typically an account with FTP or SFTP access, sometimes SSH with a restricted shell (for example rbash) or with commands restricted via authorized_keys (ForcedCommand).

A dedicated server is connected to the network with two ports at 1 Gbps; sometimes these are 10 Gbps cards. A VPS usually has only one network interface. Most often, data centers do not limit the network speed inside the data center, but do limit the speed of Internet access.

A typical load on such a dedicated server or VPS is a web server, a database, and an application server. Sometimes various additional auxiliary services are installed as well, for the web server or the database: a search engine, a mail system, etc.

A specially prepared server, which will be described in more detail below, acts as the backup storage.

The logical organization of the disk system

If there is a RAID controller, or it is a VPS with a single disk, and there are no special requirements for the disk subsystem (for example, a separate fast disk for the database), all free space is laid out as follows: one partition is created and an LVM volume group is created on top of it. Inside the group, two small volumes of the same size are used as root file systems (they are swapped one for the other during updates, allowing a quick rollback; the idea is borrowed from the Calculate Linux distribution), another volume is used for swap, and the remaining free space is split into small volumes used as root file systems for full containers, as disks for virtual machines, as file systems for accounts under /home (one file system per account), and as file systems for application containers.
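
A minimal sketch of such a layout, assuming the disk (or the RAID controller's volume) is visible as /dev/sda and the volume group is named vg0; all device names and sizes here are placeholders, and boot/EFI partitions are left out:

  # one partition for LVM, a volume group on top of it
  parted -s /dev/sda mklabel gpt mkpart primary 1MiB 100%
  pvcreate /dev/sda1
  vgcreate vg0 /dev/sda1

  # two identical root volumes (active/rollback) and swap
  lvcreate -L 20G -n root_a vg0
  lvcreate -L 20G -n root_b vg0
  lvcreate -L 4G -n swap vg0

  # per-account/per-container volumes; leave free space in the
  # volume group for snapshots (see the estimate below)
  lvcreate -L 50G -n home_site1 vg0
  lvcreate -L 50G -n home_site2 vg0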

Important note: the volumes must be completely self-contained, i.e. they should not depend on each other or on the root file system. In the case of virtual machines or containers this is achieved automatically. If these are application containers or home directories, think about separating the configuration files of the web server and other services in such a way that dependencies between volumes are kept to a minimum. For example, each site runs as its own user and keeps its configuration files in that user's home directory, and the site configs are included in the web server not via /etc/nginx/conf.d/<site>.conf, but via something like /home/<user>/configs/nginx/*.conf
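
As an illustration of such an include (the paths and the layout of the home directories are assumptions, not a prescription), the relevant fragment of the main nginx configuration could look like this:

  # fragment of nginx.conf: per-site configs live in each account's
  # home directory instead of /etc/nginx/conf.d, so a volume stays
  # self-contained and can be moved or restored on its own
  http {
      include /home/*/configs/nginx/*.conf;
  }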

If there are several disks, you can build a software RAID array (and configure SSD caching for it, if there is a need and the hardware for it), and assemble LVM on top of it according to the rules above. ZFS or Btrfs can also be used here, but it is worth thinking twice: both demand considerably more resources, and ZFS does not ship with the Linux kernel.
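
A rough sketch of that variant, with placeholder device names (/dev/sda and /dev/sdb as the HDDs, /dev/sdc as an optional SSD used through lvmcache):

  # RAID1 from two HDDs, LVM on top of the array
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 100G -n home_site1 vg0 /dev/md0

  # optional: use the SSD as a cache for that volume (lvmcache)
  pvcreate /dev/sdc
  vgextend vg0 /dev/sdc
  lvcreate --type cache-pool -L 20G -n fast vg0 /dev/sdc
  lvconvert --type cache --cachepool vg0/fast vg0/home_site1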

Regardless of the scheme used, it is always worth estimating in advance how quickly changes are written to the disks, and from that calculating how much free space to reserve for snapshots. For example, if the server writes data at 10 MB/s and the whole data array is 10 TB, then a full transfer over a 1 Gbps network (about 125 MB/s) takes roughly 22 hours; during those ~80,000 seconds about 10 MB/s Γ— 80,000 s β‰ˆ 800 GB of changes accumulate, so that is approximately how much should be reserved. In reality the figure will be smaller, and it can safely be divided across the logical volumes.

Backup storage server device

The main distinguishing feature of a backup storage server is large, cheap, and relatively slow disks. Since modern HDDs have already crossed the 10 TB mark per disk, it is essential to use file systems or RAID with checksums, because while the array is rebuilding or the file system is being repaired (which can take several days!) a second disk may fail under the increased load. With disks of up to 1 TB this was not as critical. For ease of description, assume that the disk space is divided into two parts of roughly the same size (again, for example, using LVM):

  • volumes matching those on the production servers that are used to store user data (the latest backup made will be restored onto them for verification);
  • volumes used as BorgBackup repositories (the backup data itself goes directly here).

The principle of operation is that a separate volume is created for each server under its BorgBackup repository, into which the data from the production servers flows. The repositories work in append-only mode, which rules out intentional deletion of data from the production side, and thanks to deduplication and periodic pruning of old backups (yearly copies are kept, monthly for the last year, weekly for the last month, daily for the last week, and in special cases hourly for the last day: 24 + 7 + 4 + 12 + yearly, roughly 50 copies per server) the storage requirements stay manageable.
Append-only mode is not enabled in the repositories themselves; instead it is enforced through a ForcedCommand in .ssh/authorized_keys, roughly like this:

from="адрСс сСрвСра",command="/usr/local/bin/borg serve --append-only --restrict-to-path /home/servername/borgbackup/",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc AAAAA.......

The path specified in command= points to a wrapper script around borg which, besides launching the real binary with the given parameters, also kicks off restoration of the backup once the data transfer finishes. To do this, the wrapper creates a flag file next to the corresponding repository. After the upload completes, the last backup made is automatically restored onto the corresponding logical volume.
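
A minimal sketch of such a wrapper; the paths, the repository location and the flag-file naming are assumptions rather than the exact script used by the author:

  #!/bin/sh
  # /usr/local/bin/borg: wrapper around the real borg binary,
  # invoked as the ForcedCommand from authorized_keys
  REAL_BORG=/usr/bin/borg
  REPO=/home/servername/borgbackup

  # run "borg serve --append-only ..." with the parameters passed in
  "$REAL_BORG" "$@"
  RC=$?

  # on success, drop a flag file next to the repository so the storage
  # server knows there is a fresh backup to verify and restore
  if [ "$RC" -eq 0 ]; then
      touch "$REPO.flag"
  fi
  exit "$RC"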

This design makes it possible to periodically remove unnecessary backups while preventing the production servers from deleting anything on the backup storage server.
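
Pruning according to the retention policy described above runs on the storage server itself, outside the append-only channel; a sketch (the repository path matches the earlier example, and the number of yearly copies is arbitrary):

  # hourly for a day, daily for a week, weekly for a month,
  # monthly for a year, plus yearly copies
  borg prune --keep-hourly 24 --keep-daily 7 --keep-weekly 4 \
      --keep-monthly 12 --keep-yearly 5 /home/servername/borgbackup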

Backup process

The backup is initiated by the dedicated server or VPS itself, since this scheme gives that server more control over the process. First, a snapshot of the active root file system is taken, mounted, and uploaded with BorgBackup to the backup storage server. After the transfer completes, the snapshot is unmounted and deleted.
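
A sketch of this step; the volume names, snapshot size, mount point and repository address are placeholders:

  # snapshot the active root volume and mount it read-only
  lvcreate -s -L 10G -n root_a_snap /dev/vg0/root_a
  mkdir -p /mnt/snap
  mount -o ro /dev/vg0/root_a_snap /mnt/snap

  # upload it through the restricted account on the storage server
  borg create --stats backup@storage:/home/servername/borgbackup::root-{now} /mnt/snap

  # clean up
  umount /mnt/snap
  lvremove -f /dev/vg0/root_a_snap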

In the case of a small database (up to 1 GB per site), a database dump is made and stored on the same logical volume as the rest of the site's data, but in such a way that the dump is not reachable through the web server. If the databases are large, "hot" backups should be set up instead, for example with xtrabackup for MySQL or WAL archiving via archive_command in PostgreSQL. In that case the database will be restored separately from the site data.
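
For example (database names and paths are placeholders), a small MySQL database can be dumped next to the site data, while a large PostgreSQL database would use WAL archiving instead:

  # small database: a dump stored with the site data, outside the web root
  mysqldump --single-transaction sitedb | gzip > /home/site1/db-backups/sitedb.sql.gz

  # large PostgreSQL database: continuous WAL archiving
  # (fragment of postgresql.conf)
  #   wal_level = replica
  #   archive_mode = on
  #   archive_command = 'test ! -f /home/site1/wal/%f && cp %p /home/site1/wal/%f'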

If containers or virtual machines are used, qemu-guest-agent, CRIU, or other appropriate technologies should be set up. In other cases no extra configuration is usually needed: we simply take snapshots of the logical volumes, which are then handled the same way as the snapshot of the root file system. Once the data has been captured, the snapshots are deleted.
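
For a KVM virtual machine with qemu-guest-agent installed, the snapshot can be made consistent roughly like this (the domain and volume names are placeholders):

  # flush and freeze the guest's file systems, snapshot its disk, thaw
  virsh domfsfreeze vm1
  lvcreate -s -L 10G -n vm1_disk_snap /dev/vg0/vm1_disk
  virsh domfsthaw vm1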

Further work takes place on the backup storage server (a sketch of this job follows the list):

  • the latest backup made in each repository is checked;
  • the presence of the flag file indicating that data capture has finished is checked;
  • the data is restored onto the corresponding local volume;
  • the flag file is deleted.
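
A sketch of this job; the repository layout, flag-file naming and local volume names follow the assumptions made in the earlier examples:

  #!/bin/sh
  # for every repository with a fresh-backup flag: verify the last
  # archive, unpack it onto the matching local volume, drop the flag
  for flag in /home/*/borgbackup.flag; do
      [ -e "$flag" ] || continue
      repo="${flag%.flag}"
      server="$(basename "$(dirname "$repo")")"

      borg check --last 1 "$repo"
      last="$(borg list --last 1 --format '{archive}' "$repo")"

      mount "/dev/backup/$server" /mnt/verify
      ( cd /mnt/verify && borg extract "$repo::$last" )
      umount /mnt/verify

      rm -f "$flag"
  done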

Server recovery process

If the main server dies, a similar dedicated server is brought up and booted from some standard image. Most likely it will boot over the network, although the data center engineer setting up the server can also copy the standard image onto one of the disks. The image boots into RAM, after which the recovery process starts:

  • a block device is attached over iSCSI, NBD, or another similar protocol to the logical volume that holds the root file system of the dead server; since the root file system has to be small, this step should take only a few minutes. The bootloader is also restored;
  • the structure of the local logical volumes is recreated, and the logical volumes from the backup server are attached using the dm_clone kernel module (see the sketch after this list): data recovery begins, and changes are written straight to the local disks;
  • a container is started with all the available physical disks attached - the server is fully operational again, but with reduced performance;
  • after data synchronization completes, the logical volumes from the backup server are detached, the container is shut down, and the server is rebooted.
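
A rough sketch of the attach-and-clone step; the NBD export name, device names and sizes are assumptions (the local destination volume must match the size of the remote one), and the dm-clone table format follows the kernel documentation:

  # attach the logical volume exported by the backup server over NBD
  nbd-client backup-server /dev/nbd0 -name servername_root

  # local destination volume plus a small dm-clone metadata volume
  lvcreate -L 20G -n root_a vg0
  lvcreate -L 1G -n root_a_meta vg0

  # dm-clone target: regions not yet copied are read from the remote
  # source, writes and background hydration go to the local volume
  dmsetup create root_clone --table "0 $(blockdev --getsz /dev/vg0/root_a) clone \
      /dev/vg0/root_a_meta /dev/vg0/root_a /dev/nbd0 8"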

After the reboot, the server will hold all the data that existed at the moment of the backup, plus all the changes made during the recovery process.

Other articles in the series

Backup, Part 1: Why backup is needed, an overview of methods and technologies
Backup, Part 2: Reviewing and testing rsync-based backup tools
Backup, Part 3: Reviewing and testing duplicity and duplicati
Backup, Part 4: Reviewing and testing zbackup, restic, borgbackup
Backup, Part 5: Testing Bacula and Veeam Backup for Linux
Backup, Part at readers' request: Reviewing AMANDA, UrBackup, BackupPC
Backup, Part 6: Comparing backup tools
Backup, Part 7: Conclusions

I invite you to discuss the proposed approach in the comments. Thank you for your attention!

Source: habr.com
