Reduce backups by 99.5% with hashget

hashget is a free, open-source deduplicator: a utility similar to an archiver that lets you significantly reduce the size of backups and set up incremental and differential backup schemes, among other things.

This is an overview article that describes the features. The actual use of hashget (which is quite simple) is described in the project's README and wiki documentation.

Comparison

Following the rules of the genre, I will start with the hook right away - a comparison of results:

Data sample | Unpacked size | .tar.gz | hashget .tar.gz
WordPress-5.1.1 | 43 Mb | 11 Mb (26%) | 155 Kb (0.3%)
Linux kernel | 934 Mb | 161 Mb (20%) | 4.7 Mb (0.5%)
Debian 9 (LAMP) LXC VM | 724 Mb | 165 Mb (23%) | 4.1 Mb (0.5%)

Background: what an ideal, efficient backup should be

Every time I backed up a freshly created virtual machine, I was haunted by the feeling that I was doing something wrong. Why do I get a hefty backup from a system where my priceless, imperishable creative work is a one-line .html file with the text "Hello world"?

Why is there a 16 MB /usr/sbin/mysqld in my backup? Is it really my honor, in this world, to safeguard this important file, so that if I fail it will be lost to humanity? Most likely not. It is stored on highly reliable Debian servers (whose reliability and uptime cannot be compared with what I can provide), as well as in the backups (millions of them) of other admins. Do we really need to create yet another copy of this important file, on top of the millions that already exist, to improve reliability?

This is the problem hashget solves. When packing, it creates a very small backup. When unpacking, it produces a fully unpacked system, identical to what you would get from tar -c / tar -x. (In other words, this is lossless packing.)

How hashget works

hashget has the concepts of Package and HashPackage, and uses them to perform deduplication.

Package - a file (usually a .deb or .tar.gz archive) that can be reliably downloaded from the Internet and from which one or more files can be extracted.

HashPackage - a small JSON file that represents a Package, including the package URL and the hash sums (sha256) of the files in it. For example, for the 5-megabyte mariadb-server-core package, the HashPackage is only 6 kilobytes. About a thousand times smaller.

Deduplication - creating an archive without duplicate files (if the deduplicator knows where the original package can be downloaded, it removes the duplicates from the archive).
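
To make the HashPackage idea more concrete, here is a minimal Python sketch that builds a HashPackage-like record (package URL plus sha256 sums of the files inside) for a tar archive. The field names "url" and "files", the function names, and the example URL are assumptions made for illustration; the real hashget JSON layout may differ.

```python
import hashlib
import io
import json
import tarfile


def build_hashpackage(url, archive_bytes):
    """Return a HashPackage-like dict for a tar archive held in memory.

    The field names ("url", "files") are illustrative, not hashget's schema.
    """
    digests = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories, symlinks, etc. carry no content
            data = tar.extractfile(member).read()
            digests.append(hashlib.sha256(data).hexdigest())
    return {"url": url, "files": sorted(digests)}


def make_demo_archive():
    """Build a tiny .tar.gz in memory so the sketch runs without a network."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in [("a.txt", b"hello"), ("b.txt", b"world")]:
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    package = build_hashpackage("https://example.com/demo.tar.gz", make_demo_archive())
    print(json.dumps(package, indent=2))
```

The record only stores 64-character digests, which is why it stays small even when the package itself is large.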

Packing

When packing, every file in the directory being packed is scanned and its hash sum is calculated. If the sum is found in one of the known HashPackages, the file's metadata (name, hash, access rights, and so on) is saved in a special file, .hashget-restore.json, which is also included in the archive.

In the simplest case, packing looks no more complicated than tar:

hashget -zf /tmp/mybackup.tar.gz --pack /path/to/data
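
The scanning step described above can be sketched roughly as follows. This is not hashget's code: the function names and the entry layout ("file", "sha256", "mode") are hypothetical, and the set of known hashes is assumed to come from HashPackages that were indexed earlier.

```python
import hashlib
import os
import stat


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_pack(root, known_hashes):
    """Split files under root into (restore_entries, paths_to_archive).

    known_hashes is a set of sha256 hex digests taken from known HashPackages.
    Files with a known hash are recorded as metadata only; the rest must be
    stored in the archive as they are.
    """
    restore_entries, keep = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # this sketch only handles regular files
            rel = os.path.relpath(path, root)
            file_hash = sha256_of(path)
            if file_hash in known_hashes:
                mode = stat.S_IMODE(os.stat(path).st_mode)
                restore_entries.append({"file": rel, "sha256": file_hash, "mode": mode})
            else:
                keep.append(rel)
    return restore_entries, keep
```

In this sketch a locally modified file simply has a hash that is not in the known set, so it ends up in the list of files to archive.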

Unpacking

Unpacking is done in two steps. First comes the usual tar extraction:

tar -xf mybackup.tar.gz -C /path/to/data

then the restore from the network:

hashget -u /path/to/data

When restoring, hashget reads the .hashget-restore.json file, downloads the required packages, unpacks them, and extracts the required files, installing them at the required paths with the required owner/group/permissions.
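
The final step of the restore, putting files back at their paths with their permissions, might look like the sketch below. The entry layout and the sources mapping (hash to the path of an already extracted package file) are assumptions for illustration; downloading and unpacking the packages is left out.

```python
import os
import shutil


def restore_files(entries, sources, target_root):
    """Copy each listed file into place and reapply its permission bits.

    entries: dicts with "file", "sha256" and "mode" (hypothetical layout).
    sources: mapping of sha256 digest -> path of an extracted package file.
    Returns the entries that could not be restored.
    """
    missing = []
    for entry in entries:
        source = sources.get(entry["sha256"])
        if source is None:
            missing.append(entry)
            continue
        destination = os.path.join(target_root, entry["file"])
        os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
        shutil.copyfile(source, destination)
        os.chmod(destination, entry["mode"])
        # Restoring owner and group would need os.chown and usually root rights.
    return missing
```

Returning the unresolved entries instead of raising lets a caller decide whether a partially restored tree is acceptable.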

More complex things

What is described above is already enough for those who "want it like tar, but packing my Debian into 4 megabytes." Now let's look at the more complex things.

Indexing

If hashget does not have a single HashPackage at all, it simply cannot deduplicate anything.

You can create a HashPackage manually (simply: hashget --submit https://wordpress.org/wordpress-5.1.1.zip -p my), but there is a more convenient way.

To obtain the HashPackages it needs, hashget has an indexing stage (run automatically by the --pack command) and heuristics. During indexing, hashget "feeds" every file it finds to all available heuristics that are interested in it. A heuristic can then index any Package to create a HashPackage.

For example, the Debian heuristic likes the file /var/lib/dpkg/status and detects the installed Debian packages; if they are not indexed yet (no HashPackage has been created for them), it downloads and indexes them. The result is a very pleasant effect: hashget will always deduplicate Debian systems effectively, even if they have the very latest packages.
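
As an illustration of what a Debian-style heuristic has to do first, the sketch below lists installed packages from text in the dpkg status format (stanzas of "Field: value" lines separated by blank lines). The function name and the sample data are made up for the example, and this is not claimed to be how hashget itself parses the file.

```python
SAMPLE_STATUS = """\
Package: hello
Status: install ok installed
Architecture: amd64
Version: 2.10-1
Description: example package
 A continuation line of the description.

Package: oldtool
Status: deinstall ok config-files
Architecture: amd64
Version: 1.0-3
"""


def installed_packages(status_text):
    """Return (name, version, architecture) for packages marked as installed."""
    packages = []
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") or ":" not in line:
                continue  # continuation of a multi-line field, or a blank line
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields.get("Status") == "install ok installed":
            packages.append(
                (fields.get("Package"), fields.get("Version"), fields.get("Architecture"))
            )
    return packages


if __name__ == "__main__":
    for name, version, arch in installed_packages(SAMPLE_STATUS):
        print(name, version, arch)
```

From a name, version, and architecture triple a heuristic can then work out which .deb to look up or download.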

Hint files

If your network uses your own proprietary packages, or a public package that is not covered by the hashget heuristics, you can add a simple hashget-hint.json hint file like this:

{
    "project": "wordpress.org",
    "url": "https://ru.wordpress.org/wordpress-5.1.1-ru_RU.zip"
}

From then on, every time an archive is created, the package will be indexed (if it has not been already) and the package's files will be deduplicated from the archive. No programming is required; everything can be done from vim and is saved in every backup. Note that thanks to the hash sum approach, if some files from the package have been changed locally (for example, a configuration file has been edited), the changed files are stored in the archive "as is" and are not trimmed.

If some of your own packages are updated periodically but the changes are not very large, you can make hints for major versions only. For example, for version 1.0 you made a hint pointing to mypackage-1.0.tar.gz, and it is fully deduplicated; then version 1.1 was released, which differs slightly, but the hint was not updated. That is fine. Only the files that match (can be restored from) version 1.0 are deduplicated.

The heuristic that processes the hint file is a good example for understanding how heuristics work internally. It processes only hashget-hint.json files (or .hashget-hint.json, with a leading dot) and ignores all others. From this file it determines which package URL should be indexed, and hashget indexes it (if it has not done so already).
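
A hint-file heuristic of this kind can be approximated in a few lines of Python. The sketch below only collects the "url" values from hint files under a directory; the function name is hypothetical and the actual indexing step is not shown.

```python
import json
import os

# File names taken from the article; everything else here is illustrative.
HINT_NAMES = ("hashget-hint.json", ".hashget-hint.json")


def find_hint_urls(root):
    """Return the sorted, de-duplicated package URLs named by hint files under root."""
    urls = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name not in HINT_NAMES:
                continue  # the heuristic ignores every other file
            with open(os.path.join(dirpath, name), encoding="utf-8") as fh:
                hint = json.load(fh)
            if "url" in hint:
                urls.add(hint["url"])
    return sorted(urls)
```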

HashServer

Performing full indexing while creating a backup would be quite laborious: you would need to download each package, unpack it, and index it. So hashget uses a scheme with a HashServer. When an installed Debian package is detected and is not found among the local HashPackages, hashget first tries to simply download the HashPackage from the hash server. Only if that fails does hashget download and hash the package itself (and upload the result to the hashserver, so that the hashserver can provide it in the future).
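
The lookup order described here (local HashPackages, then the hash server, then indexing the package yourself) can be sketched as a small function. The callables passed in are placeholders for the real network and indexing operations, and all names are hypothetical.

```python
def get_hashpackage(package_id, local_db, fetch_from_hashserver, index_package, upload=None):
    """Resolve a HashPackage and report where it came from.

    local_db: dict of already known HashPackages, updated in place.
    fetch_from_hashserver: callable returning a HashPackage or None on a miss.
    index_package: callable that downloads and hashes the package (expensive).
    upload: optional callable used to share a freshly indexed HashPackage.
    """
    if package_id in local_db:
        return local_db[package_id], "local"

    hashpackage = fetch_from_hashserver(package_id)
    if hashpackage is not None:
        local_db[package_id] = hashpackage
        return hashpackage, "hashserver"

    hashpackage = index_package(package_id)
    local_db[package_id] = hashpackage
    if upload is not None:
        upload(package_id, hashpackage)
    return hashpackage, "indexed"
```

Passing a fetch function that always returns None corresponds to working without a hash server: every unknown package is then indexed locally.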

HashServer is an optional element of the scheme, not a critical one; it serves only to speed things up and reduce the load on the repositories. It is easily disabled (with the --hashserver option given without parameters). In addition, you can easily set up your own hashserver.

Incremental and differential backups, planned obsolescence

hashget makes it very easy to build incremental and differential backup schemes. Why not index our own backup (with all of our unique files)? One --submit command and you are done! The next backup that hashget creates will not include the files from that archive.

But this is not a very good approach, because when restoring we may have to fetch every hashget backup in the entire history (if each of them contains at least one unique file). For this there is a mechanism of planned obsolescence of backups. When indexing, you can specify an expiration date for the HashPackage with --expires 2019-06-01, and after that date (from 00:00) it will no longer be used. The archive itself does not have to be deleted after that date (although hashget can conveniently show the URLs of all backups that have expired or will expire, now or on any given date).

For example, if we make a full backup on the 1st and index it with a lifetime until the end of the month, we get a differential backup scheme.

If we index the new backups in the same way, we get an incremental backup scheme.
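
The expiry rule itself is simple enough to show in code. In the sketch below a HashPackage is a plain dict and "expires" is a hypothetical key holding an ISO date; a package stops being usable at 00:00 on its expiry date, as described above.

```python
from datetime import date


def usable_hashpackages(hashpackages, today):
    """Keep HashPackages that have no expiry date or have not expired yet."""
    usable = []
    for package in hashpackages:
        expires = package.get("expires")
        # From 00:00 on the expiry date the package is no longer used.
        if expires is None or today < date.fromisoformat(expires):
            usable.append(package)
    return usable


if __name__ == "__main__":
    packages = [
        {"url": "https://example.com/full-backup.tar.gz", "expires": "2019-06-01"},
        {"url": "https://example.com/public-package.tar.gz"},
    ]
    print(len(usable_hashpackages(packages, date(2019, 5, 31))))
    print(len(usable_hashpackages(packages, date(2019, 6, 1))))
```

With a full backup indexed until the end of the month, every backup made during that month is deduplicated against it, and the first backup of the next month is full again.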

Unlike traditional schemes, hashget lets you use several base sources. The backup is reduced both by removing files that are in previous backups (if any) and by removing public files (anything that can be downloaded).

If for some reason we do not trust the reliability of Debian resources (https://snapshot.debian.org/) or we use another distribution, we can simply make a full backup once with all the packages and then rely on it (with the heuristics disabled). Now, even if all the servers of our distribution become unavailable to us (because the Internet is cut off or during a zombie apocalypse), as long as our backups are in order we can recover from any short diff backup that depends only on our earlier backups.

hashget relies only on recovery sources that you trust, at your discretion. The ones you consider reliable are the ones that will be used.

FilePool da Glacier

The FilePool mechanism lets you avoid contacting external servers every time to download packages and use packages from a local directory or a corporate server instead, for example:

$ hashget -u . --pool /tmp/pool

or

$ hashget -u . --pool http://myhashdb.example.com/

To make a pool in a local directory, you just need to create a directory and drop files into it; hashget will find what it needs by hash. To make the pool accessible over HTTP, you need to create symlinks in a special way; this is done with one command (hashget-admin --build /var/www/html/hashdb/ --pool /tmp/pool). An HTTP FilePool consists of static files, so any simple web server can serve it, and the load on the server is almost zero.
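
Finding files in a local pool "by hash" can be pictured with the following sketch, which hashes every file in a directory and builds a lookup table. It only illustrates the idea; how hashget actually indexes or caches a pool is not covered by the article, and the function names are hypothetical.

```python
import hashlib
import os


def sha256_of(path):
    """Hash a file in chunks so large packages do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_pool_index(pool_dir):
    """Map sha256 digest -> path for every regular file directly in pool_dir."""
    index = {}
    for name in sorted(os.listdir(pool_dir)):
        path = os.path.join(pool_dir, name)
        if os.path.isfile(path):
            index[sha256_of(path)] = path
    return index


def find_in_pool(index, wanted_hash):
    """Return the pooled file with the wanted hash, or None if it must be downloaded."""
    return index.get(wanted_hash)
```

Because the lookup is by content hash, the file names inside the pool directory do not matter.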

Thanks to FilePool, you can use not only http(s) resources as base resources but also, for example, Amazon Glacier.

After uploading a backup to Glacier, we get its Upload ID and use it as the URL. For example:

hashget --submit Glacier_Upload_ID --file /tmp/my-glacier-backup.tar.gz --project glacier --hashserver --expires 2019-09-01

Now new (differential) backups will be based on this backup and will be shorter. After unpacking the diff backup with tar, we can see which resources it depends on:

hashget --info /tmp/unpacked/ list

and simply use a shell script to download all of these files from Glacier into the pool and run the usual restore: hashget -u /tmp/unpacked --pool /tmp/pool

Is the game worth the candle?

In the simplest case, you will simply pay less for backups (if you store them somewhere in the cloud for money). Perhaps much, much less.

But that is not all. Quantity turns into quality. You can use this to get a qualitative upgrade of your backup scheme. For example, since our backups are now shorter, we can make them daily rather than monthly, and keep them for 5 years rather than six months as before. Previously you stored them in slow but cheap "cold" storage (Glacier); now you can store them in hot storage, from which you can always download a backup quickly and restore it in minutes rather than in a day.

You can also increase the reliability of backup storage. If we currently keep backups in a single storage location, then by reducing their volume we can keep them in 2-3 locations and survive painlessly if one of them is damaged.

How to try it and start using it?

Go to the GitLab page https://gitlab.com/yaroslaff/hashget, install it with one command (pip3 install hashget[plugins]), and simply read and follow the quick start. I think it will take 10-15 minutes to do all the simple things. Then you can try compressing your virtual machines, write hint files where needed to make the compression stronger, play with pools, a local hash database, and a hash server if you are interested, and see the next day how large the incremental backup on top of yesterday's turns out to be.

source: www.habr.com
