How to compact backup storage in object storage by up to 90%

Our Turkish clients asked us to set up backups for their data center properly. We do similar projects in Russia, but here the story was more about researching how to do it best.

Given: there is local S3 storage; there is Veritas NetBackup, which has gained new functionality for moving data to object storage, now with deduplication support; and there is a shortage of free space in that local storage.

Task: set everything up so that storing backups is fast and cheap.

Before that, everything in S3 was stored simply as files: full images of the critical data center machines. Not exactly space-efficient, but it all worked at the start. Now it was time to sort things out and do it properly.

The picture shows what we came up with:

[Screenshot: NetBackup job list showing the throughput of the first and subsequent backups]

As you can see, the first backup ran slowly (70 MB/s), and subsequent backups of the same systems ran much faster.

Below are a few more details about how this works.

Backup logs, for those ready to read half a page of a dump

Full with rescan
Dec 18, 2018 12:09:43 PM - Info bpbkar (pid=4452) accelerator sent 14883996160 bytes out of 14883994624 bytes to server, optimization 0.0%
Dec 18, 2018 12:10:07 PM - Info NBCC (pid=23002) StorageServer=PureDisk_rhceph_rawd:s3.cloud.ngn.com.tr; Report=PDDO Stats (multi-threaded stream used) for (NBCC): scanned: 14570817 KB, CR sent: 1760761 KB, CR sent over FC: 0 KB, dedup: 87.9%, cache disabled

Full
Dec 18, 2018 12:13:18 PM - Info bpbkar (pid=2864) accelerator sent 181675008 bytes out of 14884060160 bytes to server, optimization 98.8%
Dec 18, 2018 12:13:40 PM - Info NBCC (pid=23527) StorageServer=PureDisk_rhceph_rawd:s3.cloud.ngn.com.tr; Report=PDDO Stats for (NBCC): scanned: 14569706 KB, CR sent: 45145 KB, CR sent over FC: 0 KB, dedup: 99.7%, cache disabled

Incremental
Dec 18, 2018 12:15:32 PM - Info bpbkar (pid=792) accelerator sent 9970688 bytes out of 14726108160 bytes to server, optimization 99.9%
Dec 18, 2018 12:15:53 PM - Info NBCC (pid=23656) StorageServer=PureDisk_rhceph_rawd:s3.cloud.ngn.com.tr; Report=PDDO Stats for (NBCC): scanned: 14383788 KB, CR sent: 15700 KB, CR sent over FC: 0 KB, dedup: 99.9%, cache disabled

Full
Dec 18, 2018 12:18:02 PM - Info bpbkar (pid=3496) accelerator sent 171746816 bytes out of 14884093952 bytes to server, optimization 98.8%
Dec 18, 2018 12:18:24 PM - Info NBCC (pid=23878) StorageServer=PureDisk_rhceph_rawd:s3.cloud.ngn.com.tr; Report=PDDO Stats for (NBCC): scanned: 14569739 KB, CR sent: 34120 KB, CR sent over FC: 0 KB, dedup: 99.8%, cache disabled
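If you want to track these rates over time, the PDDO lines are easy to parse. Here is a minimal Python sketch based only on the log format quoted above; the regex and helper are ours, not anything shipped with NetBackup:

import re

# Field names ("scanned", "CR sent", "dedup") come straight from the log lines.
PDDO_RE = re.compile(
    r"scanned:\s*(?P<scanned>\d+)\s*KB.*?"
    r"CR sent:\s*(?P<sent>\d+)\s*KB.*?"
    r"dedup:\s*(?P<dedup>[\d.]+)%"
)

def pddo_stats(line: str):
    """Extract scanned/sent volumes and the reported dedup rate from one line."""
    m = PDDO_RE.search(line)
    if not m:
        return None
    scanned, sent = int(m["scanned"]), int(m["sent"])
    return {
        "scanned_kb": scanned,
        "sent_kb": sent,
        "reported_dedup": float(m["dedup"]),
        # Cross-check: 1 - sent/scanned should match the reported rate.
        "computed_dedup": round((1 - sent / scanned) * 100, 1),
    }

line = ("Report=PDDO Stats for (NBCC): scanned: 14569706 KB, "
        "CR sent: 45145 KB, CR sent over FC: 0 KB, dedup: 99.7%, cache disabled")
print(pddo_stats(line))
# {'scanned_kb': 14569706, 'sent_kb': 45145, 'reported_dedup': 99.7, 'computed_dedup': 99.7}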

What is the problem

Customers want to back up as often as possible and store the backups as cheaply as possible. Object stores of the S3 type are best for cheap storage: they have the lowest maintenance cost per megabyte among options that still let you restore a backup in a reasonable time. But when there are many backups, storage stops being cheap, because most of it is occupied by copies of the same data. In the case of our Turkish colleagues' HaaS, storage can be compacted by about 80-90%. Obviously that figure is specific to their workload, but I would confidently count on at least 50% deduplication.

To solve this problem, the major vendors long ago built gateways to Amazon S3. All of their methods are compatible with a local S3 as long as it supports the Amazon API. In the Turkish data center, backups go to our own S3, just as in our Tier III data center "Compressor" in Russia, since this scheme has proven itself well for us.

And our S3 is fully compatible with the Amazon S3 backup methods, so any backup tool that supports them can copy everything to such storage out of the box.
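In practice that means a client written against the Amazon SDK just needs to be pointed at a different endpoint. A minimal boto3 sketch; the endpoint, bucket, and credentials below are placeholders, not our real configuration:

import boto3

# Any S3-compatible storage: only the endpoint URL differs from Amazon.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.local",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Standard Amazon S3 API calls work unchanged against the local storage.
s3.put_object(Bucket="backups", Key="vm01/full-2018-12-18.img", Body=b"...")
print(s3.list_objects_v2(Bucket="backups")["KeyCount"])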

Veritas NetBackup introduced a feature called CloudCatalyst:

[Diagram: CloudCatalyst architecture with an intermediate deduplication server between the agents and S3]

That is, between the machines being backed up and the gateway sits an intermediate Linux server. Backup traffic from the backup agents passes through it and is deduplicated on the fly before being transferred to S3. Where previously there were 30 backups of 20 GB each (with compression), now, thanks to the similarity of the machines, they take up about 90% less space. The deduplication engine is the same one NetBackup uses when storing to ordinary disks.
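A rough back-of-the-envelope check of those figures in Python (illustrative numbers from the paragraph above):

# Illustrative figures only, taken from the text.
backups = 30          # similar full backups
size_gb = 20          # each, after compression
dedup_rate = 0.90     # ~90% of blocks shared between them

raw = backups * size_gb                 # 600 GB without deduplication
stored = raw * (1 - dedup_rate)         # ~60 GB actually written to S3
print(f"raw: {raw} GB, stored: ~{stored:.0f} GB")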

Here is what the traffic looks like before it reaches the intermediate server:

[Diagram: backup traffic from the agents on its way to the intermediate server]

We tested this and concluded that implementing it in our data centers saves S3 space both for us and for our customers. As a commercial data center operator we of course bill by the volume occupied, but it is still very profitable for us too: we start earning on software, which scales much better, rather than on renting out hardware. And it reduces our internal costs.

Logs

228 Jobs (0 Queued 0 Active 0 Waiting for Retry 0 Suspended 0 Incomplete 228 Done - 13 selected)
(Filter Applied [13])

Job Id Type State State Details Status Job Policy Job Schedule Client Media Server Start Time Elapsed Time End Time Storage Unit Attempt Operation Kilobytes Files Pathname % Complete (Estimated) Job PID Owner Copy Parent Job ID KB/Sec Active Start Active Elapsed Robot Vault Profile Session ID Media to Eject Data Movement Off-Host Type Master Priority Deduplication Rate Transport Accelerator Optimization Instance or Database Share Host
- 1358 Snapshot Done 0 VMware - NGNCloudADC NBCC Dec 18, 2018 12:16:19 PM 00:02:18 Dec 18, 2018 12:18:37 PM STU_DP_S3_****backup 1 100% root 1358 Dec 18, 2018 12:16:27 PM 00:02:10 Instant Recovery Disk Standard WIN-*************** 0
1360 Backup Done 0 VMware Full NGNCloudADC NBCC Dec 18, 2018 12:16:48 PM 00:01:39 Dec 18, 2018 12:18:27 PM STU_DP_S3_****backup 1 14,535,248 149654 100% 23858 root 1358 335,098 Dec 18, 2018 12:16:48 PM 00:01:39 Instant Recovery Disk Standard WIN-*************** 0 99.8% 99%
1352 Snapshot Done 0 VMware - NGNCloudADC NBCC Dec 18, 2018 12:14:04 PM 00:02:01 Dec 18, 2018 12:16:05 PM STU_DP_S3_****backup 1 100% root 1352 Dec 18, 2018 12:14:14 PM 00:01:51 Instant Recovery Disk Standard WIN-*************** 0
1354 Backup Done 0 VMware Incremental NGNCloudADC NBCC Dec 18, 2018 12:14:34 PM 00:01:21 Dec 18, 2018 12:15:55 PM STU_DP_S3_****backup 1 14,380,965 147 100% 23617 root 1352 500,817 Dec 18, 2018 12:14:34 PM 00:01:21 Instant Recovery Disk Standard WIN-*********** 0 99.9% 100%
1347 Snapshot Done 0 VMware - NGNCloudADC NBCC Dec 18, 2018 12:11:45 PM 00:02:08 Dec 18, 2018 12:13:53 PM STU_DP_S3_****backup 1 100% root 1347 Dec 18, 2018 12:11:45 PM 00:02:08 Instant Recovery Disk Standard WIN-*************** 0
1349 Backup Done 0 VMware Full NGNCloudADC NBCC Dec 18, 2018 12:12:02 PM 00:01:41 Dec 18, 2018 12:13:43 PM STU_DP_S3_****backup 1 14,535,215 149653 100% 23508 root 1347 316,319 Dec 18, 2018 12:12:02 PM 00:01:41 Instant Recovery Disk Standard WIN-*************** 0 99.7% 99%
1341 Snapshot Done 0 VMware - NGNCloudADC NBCC Dec 18, 2018 12:05:28 PM 00:04:53 Dec 18, 2018 12:10:21 PM STU_DP_S3_****backup 1 100% root 1341 Dec 18, 2018 12:05:28 PM 00:04:53 Instant Recovery Disk Standard WIN-*************** 0
1342 Backup Done 0 VMware Full_Rescan NGNCloudADC NBCC Dec 18, 2018 12:05:47 PM 00:04:24 Dec 18, 2018 12:10:11 PM STU_DP_S3_****backup 1 14,535,151 149653 100% 22999 root 1341 70,380 Dec 18, 2018 12:05:47 PM 00:04:24 Instant Recovery Disk Standard WIN-*********** 0 87.9% 0%

1339 Snapshot Done 150 VMware - NGNCloudADC NBCC Dec 18, 2018 11:05:46 AM 00:00:53 Dec 18, 2018 11:06:39 AM STU_DP_S3_****backup 1 100% root 1339 Dec 18, 2018 11:05:46 AM 00:00:53 Instant Recovery Disk Standard WIN-*************** 0
1327 Snapshot Done 0 VMware - ******.********.cloud NBCC Dec 17, 2018 12:54:42 PM 05:51:38 Dec 17, 2018 6:46:20 PM STU_DP_S3_****backup 1 100% root 1327 Dec 17, 2018 12:54:42 PM 05:51:38 Instant Recovery Disk Standard WIN-*********** 0
1328 Backup Done 0 VMware Full ******.********.cloud NBCC Dec 17, 2018 12:55:10 PM 05:29:21 Dec 17, 2018 6:24:31 PM STU_DP_S3_****backup 1 222,602,719 258932 100% 12856 root 1327 11,326 Dec 17, 2018 12:55:10 PM 05:29:21 Instant Recovery Disk Standard WIN-*********** 0 87.9% 0%
1136 Snapshot Done 0 VMware - ******.********.cloud NBCC Dec 14, 2018 4:48:22 PM 04:05:16 Dec 14, 2018 8:53:38 PM STU_DP_S3_****backup 1 100% root 1136 Dec 14, 2018 4:48:22 PM 04:05:16 Instant Recovery Disk Standard WIN-*********** 0
1140 Backup Done 0 VMware Full_Scan *******.**********.cloud NBCC Dec 14, 2018 4:49:14 PM 03:49:58 Dec 14, 2018 8:39:12 PM STU_DP_S3_****backup 1 217,631,332 255465 100% 26438 root 1136 15,963 Dec 14, 2018 4:49:14 PM 03:49:58 Instant Recovery Disk Standard WIN-*********** 0 45.2% 0%

The accelerator reduces traffic from the agents, because only changed data is transferred: even full backups are not shipped in full, since the media server synthesizes subsequent full backups from the incrementals.

The intermediate server has its own storage, where it keeps a data cache and the deduplication database.

The full architecture looks like this:

  1. The master server manages configuration, updates, and so on; it lives in the cloud.
  2. The media server (an intermediate *nix machine) should be as close as possible, in terms of network availability, to the systems being backed up. It deduplicates the backups of all the protected machines.
  3. The agents on the protected machines generally send the media server only what it does not already have in its storage (conceptually, as in the sketch after this list).
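To make that exchange concrete, here is a toy model of the idea in Python. The real engine uses variable-size blocks and a persistent deduplication database; this sketch uses fixed 128 KB chunks and an in-memory set, purely to show the principle:

import hashlib

CHUNK = 128 * 1024  # illustrative fixed chunk size

def fingerprints(data: bytes):
    """Split a backup stream into chunks and hash each one."""
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        yield hashlib.sha256(chunk).hexdigest(), chunk

media_server_has = set()  # stand-in for the media server's dedup database

def agent_send(data: bytes):
    """Agent side: ship only the chunks the media server has not seen."""
    sent = total = 0
    for digest, chunk in fingerprints(data):
        total += 1
        if digest not in media_server_has:
            media_server_has.add(digest)  # "transfer chunk" in this toy model
            sent += 1
    print(f"transferred {sent} of {total} chunks")

data = b"".join(bytes([i]) * CHUNK for i in range(10))
agent_send(data)  # first backup: transferred 10 of 10 chunks
agent_send(data)  # repeat backup: transferred 0 of 10 chunks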

It all starts with a full backup with rescan: a genuine, complete full backup. At that point the media server takes everything, deduplicates it, and sends it to S3. The speed towards the media server is low; from it onwards it is higher. The main limitation is the processing power of the server.

From the systems' point of view, the following backups are full, but in reality they are more like synthetic full backups. That is, only data blocks that have never appeared in any VM backup before are actually transferred and written to the media server. And only data blocks whose hash is not yet in the media server's deduplication database are transferred and written to S3. Put simply: whatever has not been seen in any backup of any VM before.
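The media server to S3 side can be modeled the same way: an object store keyed by content hash, where a block already in the deduplication database is never uploaded again. A sketch under the same toy-model assumptions (placeholder endpoint and bucket, not CloudCatalyst's actual on-disk layout):

import hashlib
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.local")  # placeholder
fingerprint_db = set()  # stand-in for the persistent dedup database

def store_block(block: bytes, bucket: str = "dedup-pool") -> str:
    """Upload a block to S3 only if its hash is new to the dedup database."""
    digest = hashlib.sha256(block).hexdigest()
    if digest not in fingerprint_db:          # new block: exactly one PUT
        s3.put_object(Bucket=bucket, Key=f"blocks/{digest}", Body=block)
        fingerprint_db.add(digest)
    return digest  # the backup's "recipe" is the ordered list of digests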

When restoring, the media server requests the needed deduplicated objects from S3, rehydrates them, and passes them on to the backup agents. That means you have to account for the traffic during a restore, which equals the actual amount of data being restored.
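The reverse path, rehydration, in the same toy model:

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.local")  # placeholder

def restore(recipe, out_path="restored.img", bucket="dedup-pool"):
    """Fetch every block in the recipe from S3 and reassemble the image."""
    with open(out_path, "wb") as out:
        for digest in recipe:                 # one GET per block
            obj = s3.get_object(Bucket=bucket, Key=f"blocks/{digest}")
            out.write(obj["Body"].read())
    # Every block is downloaded, so restore traffic equals the full
    # logical size of the data, not the deduplicated size.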

Here's how it looks:

[Screenshot: restore job in the NetBackup activity monitor]

And here is another piece of the logs:

169 Jobs (0 Queued 0 Active 0 Waiting for Retry 0 Suspended 0 Incomplete 169 Done - 1 selected)

Job Id Type State State Details Status Job Policy Job Schedule Client Media Server Start Time Elapsed Time End Time Storage Unit Attempt Operation Kilobytes Files Pathname % Complete (Estimated) Job PID Owner Copy Parent Job ID KB/Sec Active Start Active Elapsed Robot Vault Profile Session ID Media to Eject Data Movement Off-Host Type Master Priority Deduplication Rate Transport Accelerator Optimization Instance or Database Share Host
- 1372 Restore Done 0 nbpr01 NBCC Dec 19, 2018 1:05:58 PM 00:04:32 Dec 19, 2018 1:10:30 PM 1 14,380,577 1 100% 8548 root 1372 70,567 Dec 19, 2018 1:06:00 PM 00:04:30 90000 WIN-*************** 0

Data integrity is ensured by the protection of S3 itself: it has good redundancy against hardware failures such as a dead hard drive.

The media server needs 4 TB of cache: that is Veritas' recommended minimum. More is better, but that is what we deployed.

Conclusion

When a partner put 20 GB into our S3, we stored 60 GB, because we provide threefold geo-redundancy of the data. Now there is far less traffic, which is good both for the channel and for storage billing.

In this case, the routes bypass the public Internet entirely; you could also run the traffic through an L2 VPN over the Internet, but it is better to place the media server before the uplink to the provider.

If you are interested in these features in our Russian data centers, or have questions about implementing them, ask in the comments or by mail: [email protected].

Source: habr.com
