Efficiently storing hundreds of millions of small files. A self-hosted solution

Dear community, this article is about efficiently storing and serving hundreds of millions of small files. At this stage, a finished solution is offered for POSIX-compatible file systems with full support for locks, including cluster locks, and seemingly even without workarounds.

So I wrote my own custom server for this purpose.
While implementing it, we managed to solve the main problem and, at the same time, saved disk space and RAM, which our cluster file system had been consuming mercilessly. In fact, this many files is harmful to any clustered file system.

The idea is this:

In simple terms, small files are uploaded through the server, saved directly into an archive, and read back from it, while large files are placed alongside. The scheme is 1 directory = 1 archive, so in total we have several million archives containing small files rather than hundreds of millions of files. All of this is fully implemented, without any scripts and without packing files into tar/zip archives.

I will try to keep it short, and I apologize in advance if the post turns out long.

It all started when I could not find a suitable server anywhere that could save data received over HTTP directly into archives, without the drawbacks inherent in conventional archives and object storage. The reason for the search was an Origin cluster of 10 servers that had grown to a large scale: 250,000,000 small files had already accumulated in it, and the growth was not going to stop.

For those who do not like reading articles, a bit of documentation may be easier:

here and here.

There is also a Docker image; for now there is only a variant with nginx inside, just in case:

docker run -d --restart=always -e host=localhost -e root=/var/storage \
-v /var/storage:/var/storage --name wzd -p 80:80 eltaline/wzd

Next:

When there are very many files, significant resources are required, and the worst part is that some of those resources are wasted. For example, with a cluster file system (in this case, MooseFS), a file always occupies at least 64 KB regardless of its actual size. That is, a file of 3, 10 or 30 KB still needs 64 KB on disk. With a quarter of a billion files, we lose from 2 to 10 terabytes. It is also impossible to keep creating new files indefinitely, since MooseFS has a limit: no more than 1 billion files with one replica of each file.
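The overhead described above can be estimated with a few lines of arithmetic. This is only a rough sketch: the function name `wasted_bytes` is made up for illustration, and it assumes the simplest possible model in which every file occupies at least 64 KiB and nothing else is counted. The real loss depends on the actual size distribution of the files, which the article does not give, so the printed numbers are not meant to reproduce the 2 to 10 terabyte estimate.

```python
KIB = 1024
TIB = 1024 ** 4


def wasted_bytes(file_count, file_size, min_alloc=64 * KIB):
    """Space lost when every file occupies at least `min_alloc` bytes.

    Simplified model (an assumption): a file smaller than `min_alloc`
    takes exactly `min_alloc` on disk, a larger file wastes nothing.
    """
    allocated = max(file_size, min_alloc)
    return file_count * (allocated - file_size)


if __name__ == "__main__":
    files = 250_000_000  # the quarter of a billion files from the article
    for size_kib in (3, 10, 30):
        lost = wasted_bytes(files, size_kib * KIB)
        print(f"{size_kib:>2} KiB files: about {lost / TIB:.1f} TiB lost")
```

The smaller the average file, the larger the share of each 64 KB allocation that is wasted, which is why packing small files into archives pays off.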

As the number of files grows, a lot of RAM is needed for metadata. Frequent metadata dumps also contribute to the wear of SSD drives.

The wZD server. Putting things in order on the disks.

The server is written in Go. First of all, I needed to reduce the number of files. How? By archiving, but in this case without compression, since my files are images that are already compressed. BoltDB came to the rescue, although its shortcomings still had to be worked around; this is reflected in the documentation.

In total, instead of a quarter of a billion files, in my case only 10 million Bolt archives remain. If I were able to change the current directory structure, it could be reduced to roughly 1 million files.

All small files are packed into Bolt archives, which automatically take the names of the directories they are located in, and all large files stay next to the archives; there is no point in packing them, and this is configurable. Small files are archived, large ones are left unchanged. The server works transparently with both.
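To make the "1 directory = 1 archive" layout more concrete, here is a minimal sketch of how such a placement decision could look. It is not taken from the wZD source: the function `placement`, the `.bolt` file extension, and the 1 MiB threshold used in the usage example are all assumptions for illustration. Only the rule itself (small files go into an archive named after their directory, large files stay next to it) comes from the article.

```python
from pathlib import PurePosixPath


def placement(root, url_path, size, fmaxsize, archive_requested=True):
    """Decide where an uploaded file would live.

    Returns ("archive", archive_path, key) for a small file, or
    ("file", file_path, None) for a file kept as a regular file.
    """
    relative = PurePosixPath(url_path.lstrip("/"))
    directory = PurePosixPath(root) / relative.parent
    if archive_requested and size <= fmaxsize:
        # Assumed naming: the archive is named after its directory,
        # with a hypothetical ".bolt" extension.
        archive = directory / (directory.name + ".bolt")
        return ("archive", str(archive), relative.name)
    # Too large (or archiving not requested): store next to the archive.
    return ("file", str(directory / relative.name), None)


if __name__ == "__main__":
    limit = 1024 * 1024  # assumed threshold, configurable in a real setup
    print(placement("/var/storage", "/test/test.jpg", 30_000, limit))
    print(placement("/var/storage", "/test/video.mp4", 50_000_000, limit))
```

With this layout the file system sees one archive per directory plus a handful of large files, instead of every small file individually.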

Architecture and features of the wZD server.

[Figure: wZD server architecture]

The server runs on Linux, BSD, Solaris and OSX. I have tested only the AMD64 architecture under Linux, but it should also work on ARM64, PPC64 and MIPS64.

Main features:

  • Multithreading;
  • Multiserver operation, providing fault tolerance and load balancing;
  • Maximum transparency for the user or developer;
  • Supported HTTP methods: GET, HEAD, PUT and DELETE;
  • Control of read and write behavior through client headers;
  • Support for flexibly configurable virtual hosts;
  • Support for CRC data integrity checks when writing and reading;
  • Semi-dynamic buffers for low memory consumption and good network performance;
  • Deferred data compaction;
  • In addition, a multithreaded archiver, wZA, is provided for migrating files without stopping the service.

Real-world experience:

I have been developing and testing the server and the archiver on live data for quite a long time. It now runs successfully on a cluster that holds 250,000,000 small files (images) located in 15,000,000 directories on separate SATA drives. The cluster of 10 servers is an Origin server installed behind a CDN network. It is served by 2 Nginx servers + 2 wZD servers.

For those who decide to use this server, it would be wise to plan the directory structure, where applicable, before putting it to use. Let me say right away that the server is not intended for cramming everything into 1 Bolt archive.

Performance testing:

The smaller the archived file, the faster GET and PUT operations on it are. Let us compare the total time an HTTP client takes to write to regular files and to Bolt archives, and likewise to read from them. The comparison covers files of 32 KB, 256 KB, 1024 KB, 4096 KB and 32768 KB.

When working with Bolt archives, the integrity of every file's data is checked using a CRC. Before writing, and again after writing, the data is read on the fly and the checksum is recalculated. This naturally introduces delays, but the main thing is data safety.
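The write-then-verify idea can be sketched in a few lines. This is an illustration only, not the wZD implementation: a plain Python dict stands in for a Bolt archive, CRC32 from `zlib` is assumed as the checksum (the article only says "CRC"), and the helper names `put_value` and `get_value` are made up.

```python
import zlib


def put_value(store, key, data):
    """Store `data` with its CRC32, then read it back and verify it."""
    checksum = zlib.crc32(data)      # checksum computed before writing
    store[key] = (data, checksum)
    stored, _ = store[key]           # read back after writing
    if zlib.crc32(stored) != checksum:
        del store[key]
        raise ValueError(f"CRC mismatch after writing {key!r}")
    return checksum


def get_value(store, key):
    """Return the stored data, refusing to serve it if the CRC differs."""
    data, checksum = store[key]
    if zlib.crc32(data) != checksum:
        raise ValueError(f"CRC mismatch while reading {key!r}")
    return data


if __name__ == "__main__":
    archive = {}                     # stand-in for one Bolt archive
    put_value(archive, "test.jpg", b"image bytes")
    print(get_value(archive, "test.jpg"))
```

The extra read and recalculation is the source of the added latency mentioned above; in exchange, a corrupted value is detected instead of being silently served.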

I ran the performance tests on SSD drives, since tests on SATA drives do not show a clear difference.

Graphs based on the test results:

[Figure: read and write timings for small files, regular files versus Bolt archives]

As you can see, for small files the difference in read and write time between archived and non-archived files is small.

We get a quite different picture when testing reads and writes of 32 MB files:

[Figure: read and write timings for 32 MB files]

The difference in read time is within 5-25 ms. With writes, things are worse: the difference is about 150 ms. But there is no need to upload large files into archives in the first place; there is simply no point, and they can live separately from the archives.

* Technically, you can also use this server for tasks that require NoSQL.

Basic methods of working with the wZD server:

Uploading a regular file:

curl -X PUT --data-binary @test.jpg http://localhost/test/test.jpg

Uploading a file to a Bolt archive (provided the server parameter fmaxsize, which sets the maximum size of a file that can be placed in an archive, is not exceeded; if it is exceeded, the file is uploaded as usual, next to the archive):

curl -X PUT -H "Archive: 1" --data-binary @test.jpg http://localhost/test/test.jpg

Downloading a file (if files with the same name exist both on disk and in the archive, the non-archived file takes priority by default):

curl -o test.jpg http://localhost/test/test.jpg

Downloading a file from a Bolt archive (forced):

curl -o test.jpg -H "FromArchive: 1" http://localhost/test/test.jpg
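The same four requests can also be issued from a script. Below is a minimal sketch using only Python's standard library. The header names `Archive` and `FromArchive` and the URL layout come from the curl examples above; `BASE_URL` and the helper names `build_put`, `build_get` and `send` are assumptions for illustration.

```python
import urllib.request

BASE_URL = "http://localhost"  # assumption: wZD listening on port 80


def build_put(path, data, archive=False):
    """Build a PUT request; `archive=True` adds the "Archive: 1" header."""
    headers = {"Archive": "1"} if archive else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, method="PUT", headers=headers
    )


def build_get(path, from_archive=False):
    """Build a GET request; `from_archive=True` adds "FromArchive: 1"."""
    # urllib stores the header name as "Fromarchive"; HTTP header names
    # are case-insensitive, so the meaning is unchanged.
    headers = {"FromArchive": "1"} if from_archive else {}
    return urllib.request.Request(
        BASE_URL + path, method="GET", headers=headers
    )


def send(request, timeout=10):
    """Send a prepared request and return (status, body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()


if __name__ == "__main__":
    with open("test.jpg", "rb") as f:
        payload = f.read()
    print(send(build_put("/test/test.jpg", payload, archive=True))[0])
    status, body = send(build_get("/test/test.jpg", from_archive=True))
    print(status, len(body))
```

Building the request separately from sending it keeps the header logic easy to inspect without a running server.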

Descriptions of the other methods are in the documentation.

wZD documentation
wZA documentation

The server currently supports only the HTTP protocol; it does not work with HTTPS yet. The POST method is not supported either (it has not yet been decided whether it is needed).

Anyone who digs into the source code will find "butterscotch" there (the Iris web framework). Not everyone likes it, but I did not tie the main code to the web framework's functions, apart from the interrupt handler, so in the future I can quickly rewrite it for almost any engine.

To do:

  • Developing my own replicator and distributor + geo, for possible use in large systems without cluster file systems (everything for grown-ups)
  • The ability to fully recover metadata if it is completely lost (when the distributor is used)
  • A native protocol that allows persistent network connections, plus drivers for different programming languages
  • Advanced ways of using the NoSQL component
  • Compression of various types (gzip, zstd, snappy) for files or values inside Bolt archives and for regular files
  • Encryption of various types for files or values inside Bolt archives and for regular files
  • Deferred server-side video conversion, including on GPU

That is all from me. I hope this server will be useful to someone. The license is BSD-3 with dual copyright, since without the company I work for, the server would not have been written. I am the sole developer. I would be grateful for any bug reports and feature requests.

source: www.habr.com
