High-Speed Fail-Safe Compression (Continued)

This is the second article on the topic of high-speed data compression. The first article described a compressor that runs at 10 GB/sec per processor core (minimal compression, RTT-Min).

That compressor is already used in forensic duplicator hardware for high-speed compression of storage media dumps and for strengthening cryptography. It can also be used to compress virtual machine images and RAM swap files when they are saved to high-speed SSD drives.

The first article also announced the development of a compression algorithm for backup copies of HDD and SSD drives (medium compression, RTT-Mid) with significantly improved compression parameters. That compressor is now fully ready, and this article is about it.

A compressor that implements the RTT-Mid algorithm provides a compression ratio comparable to standard archivers such as WinRar and 7-Zip running in their high-speed modes. At the same time, it runs at least an order of magnitude faster.

Packing and unpacking speed is a critical parameter that determines where a compression technology can be applied. Hardly anyone would think of compressing a terabyte of data at 10-15 megabytes per second (exactly the speed of archivers in standard compression mode), because that would take about twenty hours at full processor load.

On the other hand, the same terabyte can be copied at a speed on the order of 2-3 gigabytes per second in about ten minutes.
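The arithmetic behind these two estimates is easy to check. The sketch below assumes decimal units (1 TB = 10^12 bytes); the function name is made up for illustration.

```python
TB = 10**12  # decimal terabyte, an assumption about the units meant in the text
GB = 10**9
MB = 10**6


def processing_time_seconds(size_bytes: float, speed_bytes_per_sec: float) -> float:
    """Time needed to push size_bytes through a stage running at the given speed."""
    return size_bytes / speed_bytes_per_sec


# Archiver in standard mode: 10-15 MB/sec
slow_hours = [processing_time_seconds(TB, s * MB) / 3600 for s in (10, 15)]

# Plain copy: 2-3 GB/sec
fast_minutes = [processing_time_seconds(TB, s * GB) / 60 for s in (2, 3)]

print(f"compress at 10-15 MB/sec: {slow_hours[1]:.1f} to {slow_hours[0]:.1f} hours")
print(f"copy at 2-3 GB/sec: {fast_minutes[1]:.1f} to {fast_minutes[0]:.1f} minutes")
```

With these assumptions the compression run takes roughly 18 to 28 hours and the copy roughly 6 to 8 minutes, which matches the orders of magnitude quoted above.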

Therefore, compressing large volumes of information makes sense only if it is done at a speed no lower than the speed of real input/output. For modern systems that is at least 100 megabytes per second.

Modern compressors can deliver such speeds only in "fast" mode. It is in this mode that we will compare the RTT-Mid algorithm with traditional compressors.

Comparative testing of the new compression algorithm

The RTT-Mid compressor ran as part of a test program. In a real "production" application it works much faster: it uses multithreading sensibly and is built with a "normal" compiler rather than C#.

Since the compressors in the comparative test are built on different principles and compress different types of data differently, the test used the "average temperature across the hospital" method, that is, a single measurement over a deliberately mixed data set...

A sector-by-sector dump file of a logical disk with the Windows 10 operating system was created; this is the most natural mix of the data structures actually found on every computer. Compressing this file makes it possible to compare the speed and compression ratio of the new algorithm with the most advanced compressors used in modern archivers.

Here is the dump file:

[Screenshot: the dump file]

The dump file was compressed with the RTT-Mid, 7-zip, and WinRar compressors. WinRar and 7-zip were set to maximum speed.

The 7-zip compressor at work:

[Screenshot: system load while 7-zip is running]

It loads the processor to 100%, while the average read speed from the source dump is about 60 megabytes/sec.

The WinRar compressor at work:

[Screenshot: system load while WinRar is running]

The situation is similar: processor load is almost 100%, and the average dump read speed is about 125 megabytes/sec.

As in the previous case, the archiver's speed is limited by the capabilities of the processor.

Now the RTT-Mid compressor test program at work:

[Screenshot: system load while the RTT-Mid test program is running]

The screenshot shows that the processor is loaded to 50% and is idle the rest of the time, because there is nowhere to offload the compressed data. The disk receiving the data (Disk 0) is almost fully loaded. The data read speed (Disk 1) fluctuates widely, but on average it exceeds 200 megabytes/sec.

In this case the compressor's speed is limited by how fast compressed data can be written to Disk 0.

Now the compression ratio of the resulting archives:

[Screenshots: sizes of the three resulting archives]

As you can see, the RTT-Mid compressor did the best job of compression: the archive it produced is 1.3 gigabytes smaller than the WinRar archive and 2.1 gigabytes smaller than the 7z archive.

Time spent creating the archive:

  • 7-zip - 26 minutes 10 seconds;
  • WinRar - 17 minutes 40 seconds;
  • RTT-Mid - 7 minutes 30 seconds.

Thus, even an unoptimized test program using the RTT-Mid algorithm created the archive roughly 2.4 to 3.5 times faster, and the archive turned out noticeably smaller than those of its competitors...
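The speed-up can be recomputed directly from the three timings listed above. This is plain arithmetic on the published numbers; only the variable names are invented here.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


# Archive creation times reported in the test
timings = {
    "7-zip": to_seconds(26, 10),
    "WinRar": to_seconds(17, 40),
    "RTT-Mid": to_seconds(7, 30),
}

# How many times longer each competitor took than RTT-Mid
speedup = {
    name: t / timings["RTT-Mid"]
    for name, t in timings.items()
    if name != "RTT-Mid"
}

for name, factor in speedup.items():
    print(f"RTT-Mid vs {name}: {factor:.2f}x faster")
```

Keep in mind that the RTT-Mid run was limited by the write speed of Disk 0, not by the processor, so these factors describe this particular test setup.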

Those who do not trust the screenshots can verify them for themselves. The test program is available at the link; download it and check.

But only on processors with AVX-2 support: without these instructions the compressor does not work. Also, do not test the algorithm on older AMD processors, since they are slow at executing AVX instructions...

The compression method used

The algorithm indexes repeated text fragments with byte granularity. This compression method has been known for a long time, but it was not used because the match search was very expensive in terms of resources and took much more time than building a dictionary. So the RTT-Mid algorithm is a classic example of going "back to the future"...

The RTT compressor uses a unique high-speed match-search scanner, and it is this scanner that speeds up the compression process. The scanner is hand-made, it is "my precious...", and it "is quite expensive, because it is entirely handmade" (written in assembler).

The match-search scanner works on a two-level probabilistic scheme: first it scans for the presence of a "sign" of a match, and only after a "sign" has been detected at a given position does the procedure for finding a real match start.
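The two-level idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's assembler scanner: the "sign" here is simply the first two bytes at a position, the table keeps only the most recent position per sign, and `find_repeats` and `min_len` are hypothetical names.

```python
def find_repeats(data: bytes, min_len: int = 4) -> list[tuple[int, int, int]]:
    """Return (position, source_position, length) for each repeat found.

    Level 1: a cheap 2-byte "sign" lookup proposes a candidate position.
    Level 2: the candidate is verified byte by byte and extended.
    """
    last_seen: dict[int, int] = {}  # sign -> most recent position with that sign
    repeats = []
    pos = 0
    while pos + min_len <= len(data):
        sign = (data[pos] << 8) | data[pos + 1]  # level 1: cheap sign
        candidate = last_seen.get(sign)
        last_seen[sign] = pos
        length = 0
        if candidate is not None:
            # level 2: real comparison, only at positions that passed level 1
            while (pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
        if length >= min_len:
            repeats.append((pos, candidate, length))
            pos += length  # skip over the matched fragment
        else:
            pos += 1
    return repeats


sample = b"abcdefgh" + b"XY" + b"abcdefgh"
print(find_repeats(sample))  # [(10, 0, 8)]
```

Most positions fail the cheap level-1 test, so the expensive comparison runs rarely; that is the point of the two-level scheme.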

The match-search window has an unpredictable size that depends on the degree of entropy in the data block being processed. For completely random (incompressible) data it is megabytes in size; for data with repeats it is always larger than a megabyte.

But many modern data formats are incompressible, and running a resource-hungry scanner over them is useless and wasteful, so the scanner uses two operating modes. First, sections of the source text with possible repeats are located; this operation also uses a probabilistic method and is very fast (4-6 gigabytes/sec). The areas with possible matches are then processed by the main scanner.
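The article does not say how the fast first pass decides that a section may contain repeats. As a stand-in, the sketch below uses a byte-histogram entropy estimate, a common cheap compressibility test; the threshold and the function names are assumptions, and this is a different technique from whatever the real scanner does.

```python
import math
from collections import Counter


def byte_entropy(block: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def may_contain_repeats(block: bytes, threshold_bits: float = 7.5) -> bool:
    """Cheap pre-filter: only blocks below the threshold go to the main scanner.

    The 7.5 bits/byte threshold is an arbitrary illustrative value.
    """
    return byte_entropy(block) < threshold_bits


text_like = b"abcabc" * 100          # three symbols, low entropy
uniform = bytes(range(256)) * 4      # every byte value equally often

print(may_contain_repeats(text_like))  # True
print(may_contain_repeats(uniform))    # False
```

A histogram test like this misses repeats inside high-entropy data (for example, the same encrypted block stored twice), so it only illustrates the "cheap filter first, expensive scanner second" structure.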

Index compression is not very efficient: duplicate fragments have to be replaced with indexes, and the index array significantly reduces the compression ratio.

To increase the compression ratio, not only full matches of byte strings are indexed but also partial ones, where the string contains both matched and unmatched bytes. To do this, the index format includes a match-mask field that indicates which bytes of the two blocks coincide. For even greater compression, indexing is used to superimpose several partially matching blocks onto the current block.
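A match mask for a partial match can be illustrated as follows. This is a minimal sketch assuming two blocks of equal length and one mask bit per byte; the real index layout is not described in the article, and all names here are hypothetical.

```python
def encode_partial(reference: bytes, block: bytes) -> tuple[int, bytes]:
    """Describe block relative to reference: a bit mask plus unmatched bytes.

    Bit i of the mask is 1 when block[i] == reference[i].
    """
    if len(reference) != len(block):
        raise ValueError("blocks must have the same length")
    mask = 0
    literals = bytearray()
    for i, (r, b) in enumerate(zip(reference, block)):
        if r == b:
            mask |= 1 << i
        else:
            literals.append(b)  # only unmatched bytes are stored
    return mask, bytes(literals)


def decode_partial(reference: bytes, mask: int, literals: bytes) -> bytes:
    """Rebuild the block from the reference, the mask and the stored bytes."""
    stored = iter(literals)
    return bytes(
        r if (mask >> i) & 1 else next(stored)
        for i, r in enumerate(reference)
    )


reference = b"2023-01-15 INFO"
block = b"2023-01-16 WARN"
mask, literals = encode_partial(reference, block)
print(bin(mask), literals)  # 0b10111111111 b'6WARN'
assert decode_partial(reference, mask, literals) == block
```

Here a 15-byte block is described by a mask and 5 stored bytes; whether that pays off depends on the size of the index entry that carries the mask.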

All this made it possible for the RTT-Mid compressor to achieve a compression ratio comparable to compressors built on the dictionary method, while running much faster.

Speed of the new compression algorithm

If the compressor has exclusive use of cache memory (4 megabytes are required per thread), its speed ranges from 700 to 2000 megabytes/sec per processor core, depending on the type of data being compressed, and depends little on the processor's clock frequency.

In a multithreaded implementation of the compressor, effective scaling is determined by the size of the third-level cache. For example, with 9 megabytes of cache "on board" there is no point in launching more than two compression threads; the speed will not increase. But with a 20-megabyte cache you can already run five compression threads.
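Both examples are consistent with a simple rule: divide the L3 cache size by the 4 megabytes each thread needs and round down. The sketch below encodes that reading; the function name is hypothetical and the rule itself is an inference from the two examples, not a formula given by the author.

```python
CACHE_PER_THREAD_MB = 4  # per-thread cache requirement stated in the article


def useful_compression_threads(l3_cache_mb: int,
                               per_thread_mb: int = CACHE_PER_THREAD_MB) -> int:
    """Number of threads that can each keep their working set in L3 cache."""
    if l3_cache_mb <= 0 or per_thread_mb <= 0:
        raise ValueError("cache sizes must be positive")
    # At least one thread always runs, even if the cache is too small for it.
    return max(1, l3_cache_mb // per_thread_mb)


print(useful_compression_threads(9))   # 2
print(useful_compression_threads(20))  # 5
```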

RAM latency also becomes a significant parameter for the compressor's speed. The algorithm uses random access to RAM, and some of those accesses (about 10%) miss the cache, so the processor has to idle while waiting for data from RAM, which reduces speed.

The operation of the data input/output system also significantly affects the compressor's speed. Requests to RAM coming from I/O block the CPU's data requests, which also reduces compression speed. This problem is significant for laptops and desktops; for servers it matters less, thanks to a more advanced system bus access control unit and multi-channel RAM.

Throughout this article we talk about compression; decompression stays outside its scope because there "everything is just fine". Decompression is much faster and is limited by I/O speed. One physical core running a single thread easily provides unpacking speeds of 3-4 GB/sec.

This is because decompression involves no match search, which is what "eats up" most of the processor and cache resources during compression.

Reliability of compressed data storage

As the name of the whole class of software that uses data compression (archivers) suggests, these programs are intended for long-term storage of information: not for years, but for centuries and millennia...

During storage, storage media lose some data. Here is an example:

[Image: a thousand-year-old "analog" storage medium with some fragments lost]

This "analog" information carrier is a thousand years old; some fragments have been lost, but on the whole the information is still "readable"...

No responsible manufacturer of modern digital data storage systems, or of the digital media for them, guarantees complete data safety for more than 75 years.
And this is a problem, but a deferred one; our descendants will solve it...

Digital storage systems can lose data not only after 75 years. Errors in data can appear at any time, even while the data is being written. These distortions are reduced by using redundancy and corrected by error-correction systems. Redundancy and correction systems cannot always restore lost information, and when they do, there is no guarantee that the recovery operation completed correctly.

And this is also a big problem, not a deferred one but a current one.

Modern compressors used for archiving digital data are built on various modifications of the dictionary method, and for such archives the loss of a piece of information is a fatal event; there is even an established term for this situation: a "broken" archive...

The low reliability of storing information in dictionary-compressed archives comes from the structure of the compressed data. Such an archive does not contain the source text; it stores the numbers of dictionary entries, and the dictionary itself is modified dynamically by the text currently being compressed. If a fragment of the archive is lost or corrupted, none of the subsequent archive entries can be identified, either by content or by the length of the dictionary entry, because it is unclear what a given dictionary entry number corresponds to.

It is impossible to recover information from such a "broken" archive.

The RTT algorithm is based on a more reliable method of storing compressed data. It uses the index method of accounting for repeated fragments. This approach makes it possible to minimize the consequences of information distortion on the storage medium, and in many cases to correct automatically the distortions that arose during storage.
This is because an archive file produced by index compression contains two fields:

  • a source-text field with the repeated sections removed from it;
  • an index field.

The index field, which is critical for recovering information, is small and can be duplicated for reliable storage. Therefore, even if a fragment of the source text or of the index array is lost, all the remaining information will be restored without problems, as in the picture with the "analog" storage medium.
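A toy model shows why damage stays local in such a layout. Everything below is an assumption made for illustration: the index entry format (output position, source position, length), the CRC-protected duplicate copies, and all function names; the real RTT archive format is not published in the article.

```python
import struct
import zlib

Index = list[tuple[int, int, int]]  # (output_pos, source_pos, length)


def pack_index(index: Index) -> bytes:
    """Serialize index entries and append a CRC32 so damage can be detected."""
    body = b"".join(struct.pack("<III", *entry) for entry in index)
    return body + struct.pack("<I", zlib.crc32(body))


def load_index(copies: list[bytes]) -> Index:
    """Return the first stored copy of the index whose checksum is intact."""
    for blob in copies:
        body, (crc,) = blob[:-4], struct.unpack("<I", blob[-4:])
        if zlib.crc32(body) == crc:
            return [struct.unpack_from("<III", body, off)
                    for off in range(0, len(body), 12)]
    raise ValueError("all index copies are damaged")


def decode(literals: bytes, index: Index) -> bytes:
    """Rebuild the text from the literal field and the index field."""
    out = bytearray()
    used = 0
    for out_pos, src_pos, length in sorted(index):
        gap = out_pos - len(out)            # literal bytes before this repeat
        out += literals[used:used + gap]
        used += gap
        for i in range(length):             # copy the repeat from earlier output
            out.append(out[src_pos + i])
    out += literals[used:]
    return bytes(out)


original = b"abcdefgh" + b"XY" + b"abcdefgh"
literals = b"abcdefghXY"                    # source text with the repeat removed
index = [(10, 0, 8)]
good = pack_index(index)
damaged = bytes([good[0] ^ 0xFF]) + good[1:]  # first index copy is corrupted

restored = decode(b"abcdefgh?Y", load_index([damaged, good]))
print(restored)  # b'abcdefgh?Yabcdefgh'
```

In this model one corrupted byte in the literal field yields one wrong byte in the output (more only if a repeat copies from the damaged spot), and a corrupted index copy is simply replaced by its duplicate. Nothing after the damage becomes undecodable.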

Disadvantages of the algorithm

There are no advantages without disadvantages. The index compression method does not compress short repeated sequences. This is a limitation of the index method itself: indexes are at least 3 bytes in size and can be up to 12 bytes. If a repeat is shorter than the index that describes it, it is ignored, no matter how often such repeats occur in the file being compressed.
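The rule amounts to a simple size comparison. The sketch below uses the 3-byte and 12-byte limits from the text; the function names are hypothetical, and treating a repeat of exactly the index size as not worth encoding is an assumption.

```python
MIN_INDEX_BYTES = 3   # smallest index size stated in the article
MAX_INDEX_BYTES = 12  # largest index size stated in the article


def bytes_saved(repeat_len: int, index_size: int) -> int:
    """Net gain from replacing a repeat with an index of the given size."""
    if not MIN_INDEX_BYTES <= index_size <= MAX_INDEX_BYTES:
        raise ValueError("index size must be between 3 and 12 bytes")
    return repeat_len - index_size


def worth_indexing(repeat_len: int, index_size: int) -> bool:
    """A repeat is indexed only if the index is smaller than the repeat."""
    return bytes_saved(repeat_len, index_size) > 0


print(worth_indexing(2, 3))    # False: shorter than any possible index
print(worth_indexing(8, 3))    # True: saves 5 bytes
print(worth_indexing(8, 12))   # False: the index would be larger
```

A 2-byte repeat is never encoded however often it occurs, because even the smallest index costs 3 bytes; a dictionary coder has no such floor.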

The traditional dictionary compression method effectively compresses many short repeats and therefore achieves a higher compression ratio than index compression. True, this comes at the cost of a heavy load on the central processor: for the dictionary method to start compressing data more effectively than the index method, it has to drop its data processing speed to 10-20 megabytes per second on real computing installations at full CPU load.

Such low speeds are unacceptable for modern data storage systems and are of more "academic" than practical interest.

The degree of compression will increase significantly in the next modification of the RTT algorithm (RTT-Max), which is already under development.

So, as always, to be continued...

Source: www.habr.com
