High-Speed Fail-Safe Compression (Continued)

This is the second article on high-speed data compression. The first article described a compressor that runs at 10 GB/s per processor core (minimum compression, RTT-Min).

This compressor has already been implemented in forensic duplication equipment, where it provides high-speed compression of storage media dumps and strengthens the cryptography; it can also be used to compress virtual machine images and RAM swap files when they are saved to high-speed SSD drives.

The first article also announced the development of a compression algorithm for backup copies of HDD and SSD drives (medium compression, RTT-Mid) with significantly improved compression parameters. That compressor is now complete, and it is the subject of this article.

A compressor that implements the RTT-Mid algorithm provides a compression ratio comparable to that of standard archivers such as WinRar and 7-Zip running in their high-speed modes. At the same time, it runs considerably faster than they do.

The speed of packing and unpacking data is the main parameter that determines where compression technologies can be applied. Hardly anyone would consider compressing a terabyte of data at 10-15 MB/s (which is exactly the speed of archivers in their standard compression mode), because it would take about twenty hours with the processor fully loaded.

On the other hand, the same terabyte can be copied at a speed on the order of 2-3 GB/s in roughly ten minutes.
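The arithmetic behind these two estimates is easy to check. The sketch below assumes decimal units (1 TB = 10^12 bytes) and takes the midpoint of each speed range; the function name is only illustrative.

```python
TB = 10**12
GB = 10**9
MB = 10**6


def transfer_time_hours(size_bytes: float, speed_bytes_per_sec: float) -> float:
    """Hours needed to process size_bytes at a constant speed."""
    return size_bytes / speed_bytes_per_sec / 3600


# Midpoints of the ranges quoted in the text (an assumption made for illustration).
archiver_hours = transfer_time_hours(TB, 12.5 * MB)    # 10-15 MB/s
copy_minutes = transfer_time_hours(TB, 2.5 * GB) * 60  # 2-3 GB/s

print(f"archiver: {archiver_hours:.1f} h, plain copy: {copy_minutes:.1f} min")
```

With these midpoints the archiver needs roughly a day, while the plain copy finishes in minutes, which is the gap the text describes.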

Compression of large volumes of information therefore makes sense only if it is performed at a speed no lower than the real input/output speed. For modern systems this is at least 100 MB/s.

Modern compressors can reach such speeds only in their "fast" mode. It is in this mode that we will compare the RTT-Mid algorithm with traditional compressors.

Comparative testing of the new compression algorithm

The RTT-Mid compressor ran as part of a test program. In a real production application it works much faster, uses multithreading sensibly, and is built with a "normal" compiler rather than C#.

Since the compressors used in the comparison are built on different principles, and different types of data compress differently, the test was kept objective by measuring the "average temperature in the hospital"...

A sector-by-sector dump file of a logical disk with the Windows 10 operating system was created; this is the most natural mixture of the various data structures actually found on every computer. Compressing this file makes it possible to compare the speed and compression ratio of the new algorithm with those of the most advanced compressors used in modern archivers.

Here is the dump file:

The dump file was compressed with the RTT-Mid, 7-zip, and WinRar compressors. WinRar and 7-zip were set to maximum speed.

The 7-zip compressor running:

It loads the processor to 100%, while the average read speed of the source dump is about 60 MB/s.

The WinRar compressor running:

The situation is similar: the processor load is almost 100%, and the average dump read speed is about 125 MB/s.

As in the previous case, the speed of the archiver is limited by the capabilities of the processor.

Now the test program running the RTT-Mid compressor:

The screenshot shows that the processor is loaded to 50% and is idle the rest of the time, because there is nowhere to send the compressed data. The disk that receives the output (Disk 0) is almost fully loaded. The read speed of the source data (Disk 1) fluctuates widely but averages more than 200 MB/s.

In this case the compressor's speed is limited by how fast compressed data can be written to Disk 0.

Now the compression ratio of the resulting archives:

As can be seen, the RTT-Mid compressor achieved the best compression: the archive it produced was 1.3 GB smaller than the WinRar archive and 2.1 GB smaller than the 7z archive.

Time spent creating the archive:

  • 7-zip - 26 minutes 10 seconds;
  • WinRar - 17 minutes 40 seconds;
  • RTT-Mid - 7 minutes 30 seconds.

So even an unoptimized test program using the RTT-Mid algorithm created the archive about 2.4 times faster than WinRar and about 3.5 times faster than 7-zip, and the archive turned out considerably smaller than those of its competitors...
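The speed ratios follow directly from the archive creation times listed above. The short calculation below uses only those three figures; the helper name is illustrative.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'minutes, seconds' duration into seconds."""
    return minutes * 60 + seconds


# Archive creation times reported in the test.
times = {
    "7-zip": to_seconds(26, 10),
    "WinRar": to_seconds(17, 40),
    "RTT-Mid": to_seconds(7, 30),
}

# How many times longer each archiver took compared with RTT-Mid.
speedup = {
    name: elapsed / times["RTT-Mid"]
    for name, elapsed in times.items()
    if name != "RTT-Mid"
}

for name, ratio in speedup.items():
    print(f"RTT-Mid vs {name}: {ratio:.2f}x faster")
```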

Those who do not trust screenshots can check for themselves. The test program is available via the link; download it and check.

But only on processors with AVX-2 support: without these instructions the compressor does not work. And do not test the algorithm on older AMD processors, which are slow at executing AVX instructions...

The compression method

The algorithm indexes repeated text fragments at byte granularity. This compression method has been known for a long time, but it was not used because the match search was very expensive in terms of resources and took far more time than building a dictionary. So the RTT-Mid algorithm is a classic example of going "back to the future"...

The RTT compressor uses a unique high-speed match search scanner, and this is what makes it possible to speed up compression. The scanner is home-made, it is "my precious...", and "it is rather expensive, because it is entirely handmade" (written in assembler).

The match search follows a two-stage scheme: first the presence of a "sign" of a match is checked, and only once such a "sign" has been found at a given position does the procedure for detecting the actual match begin.
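The two-stage scheme can be illustrated roughly as follows. This is only a sketch in Python, not the author's assembler scanner: the 4-byte signature, the modulus, the `find_matches` name, and the `min_len` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

SIG_BYTES = 4  # hypothetical signature width


def signature(window: bytes) -> int:
    """Cheap fingerprint of a short window; different windows may collide."""
    return int.from_bytes(window, "little") % 65521


def find_matches(data: bytes, min_len: int = 8):
    """Return (position, earlier_position, length) for repeats of at least min_len bytes."""
    seen = defaultdict(list)  # signature -> earlier positions that produced it
    matches = []
    i = 0
    while i + SIG_BYTES <= len(data):
        sig = signature(data[i:i + SIG_BYTES])
        best_len, best_src = 0, -1
        # Stage 1: only positions sharing the "sign" are candidates at all.
        for src in seen[sig]:
            # Stage 2: byte-by-byte verification of the real match length.
            n = 0
            while i + n < len(data) and data[src + n] == data[i + n]:
                n += 1
            if n > best_len:
                best_len, best_src = n, src
        seen[sig].append(i)
        if best_len >= min_len:
            matches.append((i, best_src, best_len))
            i += best_len  # continue after the matched fragment
        else:
            i += 1
    return matches


sample = b"abcdefghij" + b"XYZ" + b"abcdefghij"
print(find_matches(sample))
```

The expensive comparison in stage 2 runs only where stage 1 has already reported a candidate, which is the point of splitting the search in two.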

The match search window has an unpredictable size that depends on the degree of entropy in the data block being processed. For completely random (incompressible) data it is megabytes in size; for data with repetitions it is always larger than a megabyte.

But many modern data formats are incompressible, and running a resource-intensive scanner over them is useless and wasteful, so the scanner works in two modes. First, the regions of the source text that may contain repetitions are located; this operation also uses a probabilistic method and is very fast (4-6 GB/s). The regions with possible matches are then processed by the main scanner.

Index compression by itself is not very efficient: repeated fragments have to be replaced with indices, and the index array significantly reduces the compression ratio.

To increase the compression ratio, not only complete matches of byte strings are indexed but also partial ones, where the string contains both matching and non-matching bytes. For this purpose the index format includes a match mask field that marks the matching bytes of the two blocks. For even greater compression, indexing is used to overlay several partially matching blocks onto the current block.

All of this made it possible for the RTT-Mid compressor to reach a compression ratio comparable to that of compressors built on the dictionary method, while working much faster.
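A match mask of this kind could look as follows. The actual RTT index format is not published in the article, so the bit layout (bit i corresponds to byte i) and all function names here are assumptions chosen for illustration.

```python
def match_mask(current: bytes, reference: bytes) -> int:
    """Bit i is set when byte i is identical in the two equal-length blocks."""
    if len(current) != len(reference):
        raise ValueError("blocks must have the same length")
    mask = 0
    for i, (a, b) in enumerate(zip(current, reference)):
        if a == b:
            mask |= 1 << i
    return mask


def encode_partial(current: bytes, reference: bytes):
    """Keep only the bytes of current that differ from reference, plus the mask."""
    mask = match_mask(current, reference)
    literals = bytes(b for i, b in enumerate(current) if not (mask >> i) & 1)
    return mask, literals


def decode_partial(reference: bytes, mask: int, literals: bytes) -> bytes:
    """Rebuild the block: matching bytes come from reference, the rest from literals."""
    remaining = iter(literals)
    return bytes(
        reference[i] if (mask >> i) & 1 else next(remaining)
        for i in range(len(reference))
    )


ref = b"GET /about.html"
cur = b"GET /index.html"
mask, literals = encode_partial(cur, ref)
print(bin(mask), literals, decode_partial(ref, mask, literals) == cur)
```

In this example ten of the fifteen bytes are taken from the earlier block and only the five differing bytes have to be stored.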

Speed of the new compression algorithm

If the compressor works exclusively within cache memory (4 MB per thread is required), its speed ranges from 700 to 2000 MB/s per processor core, depending on the type of data being compressed, and depends little on the processor's clock frequency.

In a multithreaded implementation of the compressor, effective scalability is determined by the size of the third-level cache. For example, with 9 MB of cache "on board" there is no point in launching more than two compression threads; the speed will not increase. But with a 20 MB cache, five compression threads can already be run.
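The rule of thumb in these two examples is simply the L3 cache size divided by the 4 MB each thread needs. A small helper, whose name and optional `cores` parameter are illustrative assumptions, makes that explicit:

```python
from typing import Optional

CACHE_PER_THREAD_MB = 4  # per-thread cache requirement quoted in the article


def useful_threads(l3_cache_mb: int, cores: Optional[int] = None) -> int:
    """Number of compression threads the L3 cache can feed, at least one."""
    threads = max(1, l3_cache_mb // CACHE_PER_THREAD_MB)
    if cores is not None:
        threads = min(threads, cores)  # never more threads than cores
    return threads


for cache_mb in (9, 20):
    print(f"{cache_mb} MB L3 -> {useful_threads(cache_mb)} thread(s)")
```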

RAM latency also becomes a significant factor in the compressor's speed. The algorithm uses random access to RAM; some of these accesses (about 10%) miss the cache, and the compressor has to sit idle waiting for data from RAM, which reduces its speed.
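The effect of a 10% miss rate can be estimated with the usual average access time formula, hit time plus miss rate times miss penalty. Only the miss rate comes from the article; the two latency values below are placeholder assumptions, not measurements.

```python
def average_access_ns(hit_ns: float, miss_penalty_ns: float, miss_rate: float) -> float:
    """Average memory access time: hit time plus the expected miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


MISS_RATE = 0.10        # share of accesses that miss the cache (from the article)
CACHE_HIT_NS = 10.0     # assumed cache hit latency, illustrative only
RAM_PENALTY_NS = 80.0   # assumed extra latency of a RAM access, illustrative only

all_hits = average_access_ns(CACHE_HIT_NS, RAM_PENALTY_NS, 0.0)
with_misses = average_access_ns(CACHE_HIT_NS, RAM_PENALTY_NS, MISS_RATE)

print(f"{all_hits:.0f} ns -> {with_misses:.0f} ns per access")
```

Under these assumed latencies even one miss in ten nearly doubles the average access time, which is why RAM latency shows up so clearly in the compressor's speed.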

The operation of the data input/output system also has a significant effect on the compressor's speed. RAM requests from I/O block the CPU's requests for data, which further reduces the compression speed. This problem is significant for laptops and desktops; for servers it matters less, thanks to more advanced system bus access control and multi-channel RAM.

Throughout this article we talk about compression; decompression stays outside its scope because there "everything is covered in chocolate". Decompression is much faster and is limited by I/O speed. A single physical core running one thread easily delivers a decompression speed of 3-4 GB/s.

This is because decompression involves no match search, the operation that "eats up" most of the processor and cache resources during compression.

Reliability of compressed data storage

As the name of the whole class of software that uses data compression (archivers) suggests, it is designed for long-term storage of information, not for years but for centuries and millennia...

During storage, media lose some of their data. Here is an example:

This "analog" information carrier is a thousand years old. Some fragments have been lost, but on the whole the information is still "readable"...

No responsible manufacturer of modern digital storage systems or of the digital media for them guarantees complete data safety for more than 75 years.
And this is a problem, but a deferred one; our descendants will solve it...

Digital storage systems can lose data not only after 75 years. Errors in the data can appear at any time, even while it is being written. Manufacturers try to minimize these distortions with redundancy and error correction systems. Redundancy and correction systems cannot always restore lost information, and when they do, there is no guarantee that the restoration was carried out correctly.

This is also a big problem, and not a deferred one but a current one.

Modern compressors used for archiving digital data are built on various modifications of the dictionary method, and for such archives the loss of a piece of information is a fatal event; there is even an established term for this situation, a "broken" archive...

The low reliability of storing information in archives with dictionary compression comes from the structure of the compressed data. Such an archive does not contain the source text; it stores the numbers of entries in the dictionary, and the dictionary itself is dynamically modified by the text currently being compressed. If a fragment of the archive is lost or corrupted, none of the subsequent archive entries can be identified, either by content or by the length of the dictionary entry, because it is no longer clear what a given dictionary entry number corresponds to.

It is impossible to recover information from such a "broken" archive.
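This fragility is easy to reproduce with a widely available dictionary-style compressor. The sketch below uses Python's standard `zlib` module (DEFLATE, a sliding-window dictionary method) as a stand-in for the archivers discussed here, and flips a single byte in the middle of the compressed stream; the test data is made up.

```python
import zlib

# Made-up, highly repetitive source data.
original = b"".join(b"record %05d: status=OK\n" % i for i in range(2000))
packed = bytearray(zlib.compress(original))

# Damage exactly one byte in the middle of the compressed stream.
packed[len(packed) // 2] ^= 0xFF

try:
    zlib.decompress(bytes(packed))
    outcome = "decoded"
except zlib.error as exc:
    outcome = f"failed: {exc}"

print(f"{len(original)} bytes -> {len(packed)} bytes, one byte damaged: {outcome}")
```

One damaged byte is enough for the one-shot decoder to reject the stream, because everything after the damage depends on state built up from what came before it.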

The RTT algorithm is based on a more reliable method of storing compressed data. It uses an index method of accounting for repeated fragments. This approach to compression minimizes the consequences of information distortion on the storage medium, and in many cases makes it possible to correct automatically the distortions that arose while the information was stored.
This is because an archive file produced by index compression contains two fields:

  • a source text field with the repeated fragments removed;
  • an index field.

The index field, which is critical for restoring the information, is not large and can be duplicated for reliable storage. So even if a fragment of the source text or of the index array is lost, all the remaining information will be restored without problems, just as in the picture of the "analog" storage medium.
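A toy version of this two-field layout shows why damage stays local. It is not the RTT format: the fixed 8-byte fragments, the (offset, length) index entries, and the function names are simplifying assumptions, whereas the real algorithm works at byte granularity with variable-length matches.

```python
BLOCK = 8  # hypothetical fixed fragment size


def pack(data: bytes):
    """Split data into a text field without repeats and an index field."""
    text = bytearray()
    index = []   # one (offset, length) entry per fragment of the original
    seen = {}    # fragment -> offset of its single stored copy
    for pos in range(0, len(data), BLOCK):
        chunk = data[pos:pos + BLOCK]
        if chunk not in seen:
            seen[chunk] = len(text)
            text += chunk
        index.append((seen[chunk], len(chunk)))
    return bytes(text), index


def unpack(text: bytes, index) -> bytes:
    """Rebuild the original by following the index into the text field."""
    return b"".join(text[off:off + length] for off, length in index)


data = b"AAAAAAAA" + b"BBBBBBBB" + b"AAAAAAAA" + b"CCCCCCCC"
text, index = pack(data)

damaged = bytearray(text)
damaged[9] ^= 0xFF  # corrupt one byte inside the stored "B" fragment
restored = unpack(bytes(damaged), index)

bad = [i for i, (a, b) in enumerate(zip(data, restored)) if a != b]
print(f"text field: {len(text)} bytes, damaged output positions: {bad}")
```

Only the output bytes that refer to the damaged fragment change; the rest of the data decodes normally. Because the index here is a short list, keeping a second copy of it would cost little.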

Disadvantages of the algorithm

There are no advantages without disadvantages. The index compression method does not compress short repeated sequences. This is due to the limitations of the index method: an index is at least 3 bytes long and can be as long as 12 bytes. If a repeat is shorter than the index that would describe it, it is ignored, no matter how often such repeats occur in the file being compressed.
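The limitation comes down to a simple comparison of lengths. In the sketch below only the 3 and 12 byte bounds come from the article; the function names are illustrative.

```python
MIN_INDEX_BYTES = 3    # smallest index size quoted in the article
MAX_INDEX_BYTES = 12   # largest index size quoted in the article


def bytes_saved(repeat_len: int, index_len: int) -> int:
    """Net gain from replacing a repeat of repeat_len bytes with an index."""
    if not MIN_INDEX_BYTES <= index_len <= MAX_INDEX_BYTES:
        raise ValueError("index size must be between 3 and 12 bytes")
    return repeat_len - index_len


def worth_indexing(repeat_len: int, index_len: int) -> bool:
    """A repeat is worth encoding only if the index is shorter than the repeat."""
    return bytes_saved(repeat_len, index_len) > 0


for repeat_len in (2, 3, 16):
    print(repeat_len, worth_indexing(repeat_len, MIN_INDEX_BYTES))
```

A 2-byte repeat can never pay for even the smallest index, however many times it occurs, which is exactly the case the dictionary method handles better.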

The traditional dictionary compression method compresses multiple short repeats effectively and therefore achieves a higher compression ratio than index compression. True, this comes at the cost of a heavy load on the central processor: for the dictionary method to start compressing data more effectively than the index method, it has to drop its data processing speed to 10-20 MB/s on real computing installations with the CPU fully loaded.

Such low speeds are unacceptable for modern data storage systems and are of more "academic" than practical interest.

The compression ratio will be increased significantly in the next modification of the RTT algorithm (RTT-Max), which is already in development.

So, as always, to be continued...

Source: www.habr.com