Shrink backups by 99.5% with hashget

hashget is a free, open-source deduplicator: an archiver-like utility that lets you significantly reduce the size of backups, and also set up incremental and differential backup schemes, and more.

This is an overview article describing the features. The actual use of hashget (which is quite simple) is described in the project README and in the wiki documentation.

Comparison

In keeping with the rules of the genre, I'll start right away with the intrigue, a comparison of results:

Data sample              Unpacked size   .tar.gz        hashget .tar.gz
WordPress-5.1.1          43 Mb           11 Mb (26%)    155 Kb (0.3%)
Linux kernel 5.0.4       934 Mb          161 Mb (20%)   4.7 Mb (0.5%)
Debian 9 (LAMP) LXC VM   724 Mb          165 Mb (23%)   4.1 Mb (0.5%)

Background: what an ideal, efficient backup should look like

Every time I made a backup of a freshly created virtual machine, I was haunted by the feeling that I was doing something wrong. Why do I get a hefty backup from a system where my priceless, imperishable creative work is a one-line index.html with the text "Hello world"?

Why is there a 16 MB /usr/sbin/mysqld in my backup? Is it really true that in this world I have the honor of being the keeper of this important file, and that if I fail, it will be lost to humanity? Most likely not. It is stored on highly reliable Debian servers (whose reliability and uptime cannot be compared with what I can provide), as well as in the backups (millions of them) of other admins. Do we really need to create yet another copy of this important file, on top of the millions that already exist, to improve reliability?

In general, hashget solves exactly this problem. When packing, it creates a very small backup. When unpacking, it produces a fully unpacked system, identical to what you would get with tar -c / tar -x. (In other words, this is lossless packing.)

How hashget works

hashget has the concepts of Package and HashPackage, and with their help it performs deduplication.

Package. A file (usually a .deb or .tar.gz archive) that can be reliably downloaded from the Internet, and from which one or more files can be obtained.

HashPackage. A small JSON file representing a Package, including the package URL and the hash sums (sha256) of the files from it. For example, for the 5-megabyte mariadb-server-core package, the hashpackage is only 6 kilobytes. That is about a thousand times smaller.
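To make the idea concrete, here is a minimal Python sketch of what building such a record could look like. It is not hashget's real code or file format: the function name build_hashpackage and the "url" / "files" keys are my own assumptions for illustration. It simply lists the sha256 of every regular file in a tar archive next to the URL the archive came from.

```python
import hashlib
import io
import tarfile


def build_hashpackage(archive_bytes: bytes, url: str) -> dict:
    """Return a HashPackage-like record (hypothetical layout, not hashget's format)."""
    hashes = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                hashes.append(hashlib.sha256(data).hexdigest())
    # Only the URL and the hashes are kept, never the file contents.
    return {"url": url, "files": sorted(hashes)}


def make_demo_archive() -> bytes:
    """Build a tiny in-memory .tar.gz with two files, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in (("a.txt", b"hello"), ("b.txt", b"world" * 1000)):
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    record = build_hashpackage(make_demo_archive(), "https://example.com/demo.tar.gz")
    print(record["url"], len(record["files"]))
```

The record stays small because it grows with the number of files, not with their size, which is why a multi-megabyte package can be described in a few kilobytes.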

Deduplication. Creating an archive without duplicate files (if the deduplicator knows where the original package can be downloaded, it removes the duplicates from the archive).

Packing

When packing, all files in the directory being packed are scanned and their hash sums are computed. If a sum is found in one of the known HashPackages, the metadata about the file (name, hash, permissions, etc.) is saved in a special file, .hashget-restore.json, which is also included in the archive.

In the simplest case, packing looks no more complicated than with tar:

hashget -zf /tmp/mybackup.tar.gz --pack /path/to/data
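The decision made for each file during packing can be sketched roughly as follows. This is an illustration under my own assumptions, not hashget's implementation: plan_pack and the keys of the restore entries are hypothetical names, and known_hashes stands in for the union of all sha256 sums from the known HashPackages.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_pack(root: str, known_hashes: set) -> tuple:
    """Split files under root into (paths to archive, restore-manifest entries)."""
    to_archive, restore_entries = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # this sketch only handles regular files
            rel = os.path.relpath(path, root)
            digest = sha256_of(path)
            if digest in known_hashes:
                st = os.stat(path)
                # Known file: keep only metadata, the content can be re-downloaded.
                restore_entries.append({
                    "file": rel,
                    "sha256": digest,
                    "mode": st.st_mode & 0o7777,
                    "uid": st.st_uid,
                    "gid": st.st_gid,
                })
            else:
                # Unique file: it has to go into the archive itself.
                to_archive.append(rel)
    return to_archive, restore_entries
```

Only the paths in to_archive need to be stored in full; everything in restore_entries is reduced to a line of metadata.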

Unpacking

Unpacking is done in two stages. First, the usual tar unpacking:

tar -xf mybackup.tar.gz -C /path/to/data

Then the restore from the network:

hashget -u /path/to/data

When restoring, hashget reads the .hashget-restore.json file, downloads the required packages, unpacks them, and extracts the required files, placing them at the required paths with the required owner/group/permissions.
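The last step of that restore can be sketched like this. It is a simplified illustration, not hashget's code: restore_files and the entry keys are hypothetical, and the blobs dictionary (sha256 to content) stands in for the files already extracted from the downloaded packages. Ownership is left out, since changing it normally requires root.

```python
import hashlib
import os


def restore_files(target: str, entries: list, blobs: dict) -> int:
    """Write each manifest entry back under target; return the number restored."""
    restored = 0
    for entry in entries:
        data = blobs.get(entry["sha256"])
        # Verify the content before writing: the hash is the only thing we trust.
        if data is None or hashlib.sha256(data).hexdigest() != entry["sha256"]:
            raise LookupError("no verified content for " + entry["file"])
        path = os.path.join(target, entry["file"])
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        os.chmod(path, entry["mode"])  # restore permissions (owner/group omitted)
        restored += 1
    return restored
```

Because every file is checked against its recorded sha256, a package that changed upstream cannot silently produce a different restored system.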

More complex things

What is described above is already enough for those who "want it like tar, but to pack my Debian into 4 megabytes." Now let's look at the more complex things.

Indexing

If hashget has no HashPackages at all, it simply cannot deduplicate anything.

You can also create a HashPackage manually (simply: hashget --submit https://wordpress.org/wordpress-5.1.1.zip -p my), but there is a more convenient way.

To obtain the required HashPackages, there is an indexing stage (it runs automatically with the --pack command) and there are heuristics. When indexing, hashget "feeds" every file it finds to all available heuristics that are interested in it. A heuristic can then index any Package to create a HashPackage.

For example, the Debian heuristic loves the file /var/lib/dpkg/status and detects installed Debian packages; if they are not indexed (no HashPackage has been created for them), it downloads and indexes them. The result is a very pleasant effect: hashget will always deduplicate Debian systems effectively, even if they have the very latest packages.

Hint files

If your network uses some proprietary package of yours, or a public package that is not covered by hashget's heuristics, you can add a simple hashget-hint.json hint file to it, like this:

{
    "project": "wordpress.org",
    "url": "https://ru.wordpress.org/wordpress-5.1.1-ru_RU.zip"
}

From then on, every time an archive is created, the package will be indexed (if it has not been already), and the package's files will be deduplicated from the archive. No programming is needed; everything can be done from vim, and it saves space in every backup. Note that thanks to the hash-sum approach, if some files from the package have been changed locally (for example, a configuration file was edited), the changed files are stored in the archive "as is" and are not deduplicated.

If one of your own packages is updated regularly but the changes are not very large, you can make hints only for major versions. For example, at version 1.0 you make a hint pointing to mypackage-1.0.tar.gz, and the package is deduplicated completely. Then version 1.1 is released, which differs slightly, but the hint is not updated. That's fine: only the files that match (can be restored from) version 1.0 are deduplicated.

The heuristic that processes the hint file is a good example for understanding the internal mechanics of how heuristics work. It processes only hashget-hint.json files (or .hashget-hint.json, with a leading dot) and ignores all others. From this file it determines which package URL should be indexed, and hashget indexes it (if that has not already been done).
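In the same spirit, a hint heuristic can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin that ships with hashget: find_hint_urls is a hypothetical name, and I assume the hint file has exactly the "project" and "url" keys shown in the example above.

```python
import json
import os

# The two file names the hint heuristic reacts to; everything else is ignored.
HINT_NAMES = {"hashget-hint.json", ".hashget-hint.json"}


def find_hint_urls(root: str) -> list:
    """Return sorted (project, url) pairs for every hint file found under root."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name not in HINT_NAMES:
                continue
            with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                hint = json.load(f)
            found.append((hint["project"], hint["url"]))
    return sorted(found)
```

Each URL returned this way would then be handed to the indexer, which skips it if a HashPackage for it already exists.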

HashServer

It would be quite laborious to perform full indexing every time a backup is created: you would have to download each package, unpack it, and index it. So hashget uses a scheme with a HashServer. When an installed Debian package is detected and it is not found among the local HashPackages, hashget first tries to simply download the HashPackage from the hash server. Only if that fails does hashget download and hash the package itself (and upload the result to the hashserver, so that the hashserver can provide it in the future).
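The lookup order described here (local index, then hash server, then indexing by hand) can be sketched as follows. The function and its callables are hypothetical stand-ins that I made up for the illustration; they are not hashget's API, and no real network calls are made.

```python
def get_hashpackage(url, local_index, fetch_from_server, index_locally, upload_to_server):
    """Return (hashpackage, source) following the local -> server -> index order.

    local_index is a dict url -> hashpackage; the three callables are
    hypothetical placeholders for the real network and indexing operations.
    """
    if url in local_index:
        return local_index[url], "local"

    hashpackage = fetch_from_server(url)  # cheap: a few kilobytes of JSON
    if hashpackage is not None:
        local_index[url] = hashpackage
        return hashpackage, "hashserver"

    # Expensive path: download the package, unpack it and hash every file.
    hashpackage = index_locally(url)
    local_index[url] = hashpackage
    upload_to_server(hashpackage)  # so the next client can take the cheap path
    return hashpackage, "indexed"
```

The expensive path runs at most once per package for the whole community of users that share a hash server, which is where the speed-up comes from.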

HashServer is an optional element of the scheme, not a critical one; it only serves to speed things up and to reduce the load on the repositories. It is easily disabled (the --hashserver option without parameters). In addition, you can easily set up your own hashserver.

Incremental and differential backups, planned obsolescence

hashget makes it very simple to build a scheme of incremental and differential backups. Why not index our backup itself (with all our unique files)? One --submit command and you're done! The next backup that hashget creates will not include the files from this archive.

But this is not a very good approach, because it may turn out that when restoring we have to pull every hashget backup in the entire history (if each one contains at least one unique file). For this there is a mechanism of planned obsolescence of backups. When indexing, you can specify an expiry date for a HashPackage, --expires 2019-06-01, and after that date (from 00:00) it will no longer be used. The archive itself does not have to be deleted after that date (although hashget can conveniently show the URLs of all backups that have expired, or will expire, as of now or as of any given date).

For example, if we make a full backup on the 1st and index it with a lifetime until the end of the month, we get a differential backup scheme.

If we index the new backups in the same way, we get an incremental backup scheme.
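The expiry rule itself is simple enough to sketch. This is my reading of the rule, not hashget's code: I assume a HashPackage stops being usable from 00:00 on its expiry date, that a missing date means "never expires", and the "expires" key and function names are hypothetical.

```python
from datetime import date


def is_usable(expires, today: date) -> bool:
    """True while a HashPackage may still be used for deduplication.

    expires is an ISO date string such as "2019-06-01", or None for no expiry.
    Assumption: the package is unusable from 00:00 on the expiry date itself.
    """
    return expires is None or today < date.fromisoformat(expires)


def usable_packages(packages: list, today: date) -> list:
    """Filter HashPackage-like dicts (hypothetical "expires" key) by expiry."""
    return [p for p in packages if is_usable(p.get("expires"), today)]


if __name__ == "__main__":
    full_backup = {"url": "backup-2019-05-01", "expires": "2019-06-01"}
    print(is_usable(full_backup["expires"], date(2019, 5, 31)))
    print(is_usable(full_backup["expires"], date(2019, 6, 1)))
```

With this rule, a backup made in June never depends on the May full backup, so restoring it requires at most the archives indexed within the current month.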

Unlike traditional schemes, hashget lets you use several base sources. A backup is reduced both by deduplicating files from previous backups (if there are any) and by public files (anything that can be downloaded).

If for some reason we do not trust the reliability of Debian's resources (https://snapshot.debian.org/), or we use another distribution, we can simply make a full backup once with all the packages, and then rely on it (by disabling the heuristics). Now, if all the servers of our distributions become unavailable to us (in a "souvenir" Internet or during a zombie apocalypse) but our backups are intact, we can restore from any short diff backup that relies only on our earlier backups.

Hashget relies only on trusted restore sources, at YOUR discretion. The ones you consider reliable are the ones that will be used.

FilePool and Glacier

The FilePool mechanism lets you avoid constantly contacting external servers to download packages, and instead use packages from a local directory or a corporate server, for example:

$ hashget -u . --pool /tmp/pool

or

$ hashget -u . --pool http://myhashdb.example.com/

To make a pool in a local directory, you only need to create the directory and drop files into it; hashget itself will find what it needs by the hashes. To make the pool accessible over HTTP, you need to create symlinks in a special way; this is done with a single command (hashget-admin --build /var/www/html/hashdb/ --pool /tmp/pool). An HTTP FilePool itself consists of static files, so any simple web server can serve it, and the load on the server is almost zero.
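"Finding what it needs by the hashes" in a local pool can be pictured like this. How hashget actually scans a pool is not described in this article, so treat the sketch as an assumption: index_pool and find_in_pool are hypothetical names, and I assume files are matched by the sha256 of their content regardless of their file name.

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    """Hash a file in chunks so large packages do not have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def index_pool(pool_dir: str) -> dict:
    """Map sha256 -> path for every regular file dropped into the pool directory."""
    index = {}
    for name in sorted(os.listdir(pool_dir)):
        path = os.path.join(pool_dir, name)
        if os.path.isfile(path):
            index[file_sha256(path)] = path
    return index


def find_in_pool(index: dict, wanted_sha256: str):
    """Return the local path of a wanted file, or None if it must be downloaded."""
    return index.get(wanted_sha256)
```

Since the lookup key is the content hash, files can be renamed or collected from anywhere before being dropped into the pool.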

Thanks to FilePool, you can use not only http(s) resources as base resources, but also, for example, Amazon Glacier.

After uploading a backup to Glacier, we get its Upload ID and use it as the URL. For example:

hashget --submit Glacier_Upload_ID --file /tmp/my-glacier-backup.tar.gz --project glacier --hashserver --expires 2019-09-01

Now new (differential) backups will be based on this backup and will be shorter. After tar-unpacking the diff backup, we can see which resources it relies on:

hashget --info /tmp/unpacked/ list

and simply use a shell script to download all these files from Glacier into the pool, then run the usual restore: hashget -u /tmp/unpacked --pool /tmp/pool

Is the game worth the candle?

In the simplest case, you will simply pay less for backups (if you store them somewhere in the cloud for money). Maybe much, much less.

But that's not the only thing. Quantity turns into quality. You can use this to get a qualitative upgrade of your backup scheme. For example, since our backups are now shorter, we can make daily backups instead of monthly ones. We can keep them not for six months, as before, but for 5 years. Previously you stored them in slow but cheap "cold" storage (Glacier); now you can store them in hot storage, from where you can always download a backup quickly and restore in minutes, not in a day.

You can increase the reliability of backup storage. If we currently keep backups in one storage facility, then by reducing their volume we can keep them in 2-3 storages and survive painlessly if one of them is damaged.

How to try it and start using it?

Go to the GitLab page https://gitlab.com/yaroslaff/hashget, install with a single command (pip3 install hashget[plugins]), and just read and follow the quick start. I think it will take 10-15 minutes to do all the simple things. Then you can try compressing your virtual machines, make hint files if needed to make the compression stronger, play with pools, a local hash database and a hash server if you're interested, and the next day see what the size of the incremental backup on top of yesterday's will be.

Source: www.hab.com
