High-Speed Fail-Safe Compression (Continued)

This article is the second on the subject of high-speed data compression. The first article described a compressor running at 10 GB/sec per processor core (minimum compression, RTT-Min).

This compressor is already used in forensic duplicators for high-speed compression of storage media dumps and for strengthening cryptography; it can also be used to compress virtual machine images and RAM swap files when saving them to high-speed SSD drives.

The first article also announced the development of a compression algorithm for backup copies of HDD and SSD drives (medium compression, RTT-Mid) with significantly improved compression parameters. That compressor is now fully ready, and this article is about it.

The compressor implementing the RTT-Mid algorithm provides a compression ratio comparable to standard archivers such as WinRar and 7-Zip running in their high-speed mode, while its speed is roughly an order of magnitude higher.

The speed of packing and unpacking data is the main parameter that determines where compression technology can be applied. Hardly anyone would think of compressing a terabyte of data at 10-15 megabytes per second (which is exactly the speed of archivers in standard compression mode), because it would take about twenty hours at full processor load.

On the other hand, the same terabyte can be copied at 2-3 gigabytes per second in about ten minutes.

Therefore, compressing large volumes of data only makes sense if it is done at a speed no lower than the speed of real input/output. For modern systems this is at least 100 megabytes per second.
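The arithmetic behind these figures can be checked with a short sketch. The speeds used here (12.5 MB/sec and 2.5 GB/sec) are simply midpoints of the ranges quoted above, and decimal units are assumed; the function name is illustrative.

```python
def processing_time_hours(size_bytes: float, speed_bytes_per_s: float) -> float:
    """Hours needed to process size_bytes at a constant speed."""
    return size_bytes / speed_bytes_per_s / 3600

MB = 10**6
GB = 10**9
TB = 10**12

# Assumed midpoints of the ranges quoted in the text.
slow_hours = processing_time_hours(TB, 12.5 * MB)   # archiver, standard mode
fast_hours = processing_time_hours(TB, 2.5 * GB)    # plain copy

print(f"compress 1 TB at 12.5 MB/sec: {slow_hours:.1f} hours")
print(f"copy 1 TB at 2.5 GB/sec: {fast_hours * 60:.1f} minutes")
```

The gap of two orders of magnitude is the point: a compressor has to keep up with the I/O system, otherwise it becomes the bottleneck.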

Modern compressors can deliver such speeds only in "fast" mode. It is in this mode that we will compare the RTT-Mid algorithm with traditional compressors.

Comparative testing of the new compression algorithm

The RTT-Mid compressor ran as part of a test program. In a real "working" application it runs much faster: it uses multithreading sensibly and is built with a "normal" compiler, not C#.

Since the compressors used in the comparison are built on different principles, and different types of data compress differently, the test was made objective by measuring the "average temperature in the hospital"...

A sector-by-sector dump file of a logical disk with the Windows 10 operating system was created; this is the most natural mix of the various data structures actually present on every computer. Compressing this file makes it possible to compare the speed and compression ratio of the new algorithm with the most advanced compressors used in modern archivers.

Here is the dump file:

[Screenshot: the dump file]

The dump file was compressed with the RTT-Mid, 7-zip and WinRar compressors. WinRar and 7-zip were set to maximum speed.

The 7-zip compressor running:

[Screenshot: 7-zip running]

It loads the processor at 100%, while the average read speed of the original dump is about 60 megabytes/sec.

The WinRar compressor running:

[Screenshot: WinRar running]

The situation is similar: the processor load is almost 100%, and the average dump read speed is about 125 megabytes/sec.

As in the previous case, the speed of the archiver is limited by the capabilities of the processor.

Now the test program with the RTT-Mid compressor is running:

[Screenshot: the RTT-Mid test program running]

The screenshot shows that the processor is loaded at 50% and is idle the rest of the time, because there is nowhere to put the compressed data. The disk receiving the data (Disk 0) is almost fully loaded. The read speed (Disk 1) varies greatly, but on average it exceeds 200 megabytes/sec.

In this case the speed of the compressor is limited by how fast the compressed data can be written to Disk 0.

Now the compression ratio of the resulting archives:

[Screenshots: sizes of the three resulting archives]

It can be seen that the RTT-Mid compressor did the best job: the archive it created was 1.3 gigabytes smaller than the WinRar archive and 2.1 gigabytes smaller than the 7z archive.

Time spent creating the archive:

  • 7-zip - 26 minutes 10 seconds;
  • WinRar - 17 minutes 40 seconds;
  • RTT-Mid - 7 minutes 30 seconds.

Thus, even a test, non-optimized program using the RTT-Mid algorithm created its archive roughly 2.4 times faster than WinRar and 3.5 times faster than 7-zip, while the archive itself turned out noticeably smaller than those of its competitors...
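The speed ratios follow directly from the timings listed above. A small sketch of that calculation (the helper and variable names are illustrative):

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'minutes, seconds' timing into seconds."""
    return minutes * 60 + seconds

# Timings taken from the list in the article.
timings = {
    "7-zip": to_seconds(26, 10),
    "WinRar": to_seconds(17, 40),
    "RTT-Mid": to_seconds(7, 30),
}

baseline = timings["RTT-Mid"]
speedups = {
    name: elapsed / baseline
    for name, elapsed in timings.items()
    if name != "RTT-Mid"
}

for name, ratio in speedups.items():
    print(f"RTT-Mid is {ratio:.2f}x faster than {name}")
```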

Those who do not trust screenshots can verify them on their own. The test program is available at the link; download it and check.

But only on processors with AVX-2 support: without these instructions the compressor does not work. And do not test the algorithm on older AMD processors, they are slow at executing AVX instructions...

Compression method used

The algorithm indexes repeated text fragments at byte granularity. This compression method has been known for a long time, but it was not used because the match search operation was very expensive in terms of resources and took far more time than building a dictionary. So the RTT-Mid algorithm is a classic example of going "back to the future"...

The RTT compressor uses a unique high-speed match search scanner, which is what makes it possible to speed up compression. It is a self-made scanner, this is "my precious...", and "it is quite expensive, because it is entirely handmade" (written in assembler).

The match search scanner follows a two-level probabilistic scheme: first, the presence of a "sign" of a match is checked, and only after a "sign" has been detected at a given position does the procedure for detecting the real match start.
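The two-level idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's assembler scanner: the "sign" here is a hypothetical 16-bit XOR fold of a 4-byte window, the table keeps just the latest position per sign, and `MIN_MATCH` is an arbitrary choice.

```python
MIN_MATCH = 4  # assumed minimum length of a useful match

def sign(window: bytes) -> int:
    """Level 1: a cheap 16-bit 'sign' of a 4-byte window (hypothetical hash)."""
    low = int.from_bytes(window[:2], "little")
    high = int.from_bytes(window[2:4], "little")
    return low ^ high

def find_matches(data: bytes):
    """Return (position, earlier_position, length) for repeated fragments."""
    last_seen = {}   # sign -> most recent position that produced it
    matches = []
    pos = 0
    while pos + MIN_MATCH <= len(data):
        s = sign(data[pos:pos + MIN_MATCH])
        candidate = last_seen.get(s)
        last_seen[s] = pos
        if candidate is not None:
            # Level 2: the sign only hints at a match, so compare real bytes.
            length = 0
            while (pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length >= MIN_MATCH:
                matches.append((pos, candidate, length))
                pos += length
                continue
        pos += 1
    return matches

sample = b"abcdefghXYZabcdefgh"
found = find_matches(sample)
print(found)
```

The cheap first level rejects most positions without touching the earlier data at all; the expensive byte comparison runs only where the sign suggests a repeat. A sign collision costs one wasted comparison but never produces a false match.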

The match search window has an unpredictable size that depends on the degree of entropy in the data block being processed. For completely random (incompressible) data it is about a megabyte; for data with repetitions it is always larger than a megabyte.

But many modern data formats are incompressible, and running a resource-intensive scanner over them is useless and wasteful, so the scanner works in two modes. First, sections of the source text with possible repetitions are located; this operation also uses a probabilistic method and is very fast (4-6 gigabytes/sec). The areas with possible matches are then processed by the main scanner.

Index compression is not very efficient on its own: repeated fragments have to be replaced with indices, and the index array noticeably reduces the compression ratio.

To increase the compression ratio, not only complete matches of byte strings are indexed but also partial ones, where the string contains both matched and unmatched bytes. To do this, the index format includes a match mask field that marks the matching bytes of the two blocks. For even greater compression, indexing is used to overlay several partially matching blocks onto the current block.
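A match mask of this kind can be illustrated as follows. The real RTT index layout is not described in the article, so the representation here (an integer bitmask plus the unmatched bytes) and all function names are assumptions.

```python
def match_mask(reference: bytes, block: bytes) -> int:
    """Bit i is set when byte i of block equals byte i of reference."""
    if len(reference) != len(block):
        raise ValueError("blocks must have the same length")
    mask = 0
    for i, (r, b) in enumerate(zip(reference, block)):
        if r == b:
            mask |= 1 << i
    return mask

def encode_partial(reference: bytes, block: bytes):
    """Describe block as a mask over reference plus the unmatched bytes."""
    mask = match_mask(reference, block)
    unmatched = bytes(b for i, b in enumerate(block) if not (mask >> i) & 1)
    return mask, unmatched

def decode_partial(reference: bytes, mask: int, unmatched: bytes) -> bytes:
    """Rebuild the block from the reference, the mask and the unmatched bytes."""
    rest = iter(unmatched)
    return bytes(r if (mask >> i) & 1 else next(rest)
                 for i, r in enumerate(reference))

reference = b"2024-01-15 OK"
block = b"2024-02-17 OK"
mask, unmatched = encode_partial(reference, block)
print(bin(mask), unmatched)
```

In this example a 13-byte block is described by a 13-bit mask and two literal bytes, which is how a partial match can still pay for itself.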

All this made it possible for the RTT-Mid compressor to reach a compression ratio comparable to compressors built on the dictionary method, while running much faster.

Speed of the new compression algorithm

If the compressor works with exclusive use of cache memory (4 megabytes are required per thread), its speed ranges from 700 to 2000 megabytes/sec per processor core, depending on the type of data being compressed, and it depends little on the operating frequency of the processor.

With a multi-threaded implementation of the compressor, effective scalability is determined by the size of the third-level cache. For example, with 9 megabytes of cache "on board" there is no point in launching more than two compression threads; the speed will not grow. But with a 20-megabyte cache you can already run five compression threads.
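Both examples follow from dividing the third-level cache by the 4 megabytes each thread needs. A minimal sketch of that rule (the function name and the optional core limit are assumptions):

```python
from typing import Optional

CACHE_PER_THREAD_MB = 4  # per-thread cache requirement quoted in the article

def useful_threads(l3_cache_mb: int, cores: Optional[int] = None) -> int:
    """Number of compression threads that still fit into the L3 cache."""
    by_cache = max(1, l3_cache_mb // CACHE_PER_THREAD_MB)
    return by_cache if cores is None else min(by_cache, cores)

print(useful_threads(9))    # 9 MB cache
print(useful_threads(20))   # 20 MB cache
```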

RAM latency also becomes a significant parameter that determines the speed of the compressor. The algorithm makes random accesses to RAM, some of which (about 10%) miss the cache, and it has to idle while waiting for data from RAM, which reduces its speed.

The operation of the data input/output system also significantly affects the speed of the compressor. Requests to RAM from I/O block requests for data from the CPU, which reduces the compression speed as well. This problem is significant for laptops and desktops; for servers it matters less thanks to a more advanced system bus access control unit and multi-channel RAM.

Everywhere in this article we talk about compression; decompression remains outside its scope, since everything there is "in chocolate" (that is, just fine). Decompression is much faster and is limited by I/O speed. One physical core in a single thread easily provides unpacking speeds of 3-4 GB/sec.

This is because decompression has no match search operation, which is what "eats up" the main resources of the processor and cache memory during compression.

Reliability of compressed data storage

As the name of the whole class of software that uses data compression (archivers) suggests, they are designed for long-term storage of information, not for years, but for centuries and millennia...

During storage, media lose part of their data. Here is an example:

[Image: an ancient "analog" information carrier with missing fragments]

This "analog" information carrier is a thousand years old. Some fragments have been lost, but on the whole the information is still "readable"...

None of the responsible manufacturers of modern digital storage systems and digital media for them guarantees complete data safety for more than 75 years.
And this is a problem, but a deferred one; our descendants will solve it...

Digital storage systems can lose data not only after 75 years. Errors in the data can appear at any time, even while it is being recorded. These distortions are kept to a minimum by using redundancy and are corrected by error correction systems. Redundancy and correction systems cannot always restore lost information, and when they do, there is no guarantee that the restoration was performed correctly.

And this is also a big problem, but not a deferred one, a current one.

Modern compressors used for archiving digital data are built on various modifications of the dictionary method, and for such archives the loss of a piece of information is a fatal event; there is even an established term for this situation, a "broken" archive...

The low reliability of storing information in archives with dictionary compression comes from the structure of the compressed data. The information in such an archive does not contain the source text; it stores the numbers of entries in a dictionary, and the dictionary itself is dynamically modified by the text currently being compressed. If a fragment of the archive is lost or corrupted, none of the subsequent archive entries can be identified, either by content or by the length of the dictionary entry, since it is unclear what each dictionary entry number corresponds to.

It is impossible to restore information from such a "broken" archive.

The RTT algorithm is based on a more reliable method of storing compressed data. It uses the index method of accounting for repeated fragments. This approach to compression minimizes the consequences of information corruption on the storage medium, and in many cases automatically corrects errors that arose during storage.
This is because, with index compression, the archive file contains two fields:

  • a source text field with the repeated sections removed from it;
  • an index field.

The index field, which is critical for recovering the information, is small and can be duplicated for reliable storage. Therefore, even if a fragment of the source text or of the index array is lost, all the remaining information will be restored without problems, as in the picture with the "analog" information carrier.
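The two-field layout and its tolerance to damage can be sketched with a toy format. Everything specific here is an assumption made for illustration: the `(position, source, length)` index entries, JSON as the index serialization, and a CRC32 used to tell a good index copy from a damaged one. The real RTT archive format is not described in the article.

```python
import json
import zlib

def encode(data: bytes, index):
    """Split data into a literal field and an index field.

    index holds sorted, non-overlapping (pos, src, length) entries meaning
    'length bytes at pos repeat the bytes at src', with src < pos.
    """
    literals = bytearray()
    cursor = 0
    for pos, _src, length in index:
        literals += data[cursor:pos]   # keep only the non-repeated text
        cursor = pos + length
    literals += data[cursor:]
    return bytes(literals), list(index)

def decode(literals: bytes, index) -> bytes:
    """Rebuild the data by interleaving literals with indexed copies."""
    out = bytearray()
    taken = 0
    for pos, src, length in index:
        gap = pos - len(out)
        out += literals[taken:taken + gap]
        taken += gap
        for k in range(length):        # byte by byte, so overlaps work
            out.append(out[src + k])
    out += literals[taken:]
    return bytes(out)

def pack_index(index) -> bytes:
    """Serialize the index and prefix it with a CRC32 of the payload."""
    payload = json.dumps(index).encode()
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def unpack_index(copies):
    """Return the first stored index copy whose checksum is intact."""
    for blob in copies:
        if zlib.crc32(blob[4:]).to_bytes(4, "big") == blob[:4]:
            return [tuple(entry) for entry in json.loads(blob[4:])]
    raise ValueError("every index copy is damaged")

data = b"abcdefghXYZabcdefgh"
literals, index = encode(data, [(11, 0, 8)])

# Damage one byte of the literal field: only that byte is lost on decode.
damaged = literals[:9] + b"?" + literals[10:]
print(decode(damaged, index))
```

Because the literal field is plain text rather than a stream of dictionary references, damage stays local: a corrupted byte affects itself and any copies made from it, and everything else decodes as before.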

Disadvantages of the algorithm

There are no advantages without disadvantages. The index compression method does not compress short repeating sequences. This is a limitation of the index method itself. Indices are at least 3 bytes in size and can be as large as 12 bytes. If a repetition is smaller than the index that would describe it, it is not taken into account, no matter how often such repetitions occur in the file being compressed.
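The rule above is a simple cost comparison. In this sketch the 3 and 12 byte bounds come from the article, while the function names and the treatment of a tie (a repeat exactly as long as its index is skipped, since nothing is gained) are assumptions.

```python
MIN_INDEX_BYTES = 3    # smallest index size quoted in the article
MAX_INDEX_BYTES = 12   # largest index size quoted in the article

def bytes_saved(repeat_length: int, index_size: int) -> int:
    """Bytes gained by replacing one repeated fragment with an index."""
    if not MIN_INDEX_BYTES <= index_size <= MAX_INDEX_BYTES:
        raise ValueError("index size outside the 3..12 byte range")
    return repeat_length - index_size

def worth_indexing(repeat_length: int, index_size: int) -> bool:
    """A repeat is indexed only when the index is smaller than the repeat."""
    return bytes_saved(repeat_length, index_size) > 0

print(worth_indexing(2, 3))    # a 2-byte repeat can never pay for an index
print(worth_indexing(16, 3))
print(worth_indexing(8, 12))
```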

The traditional dictionary compression method handles multiple short repetitions efficiently and therefore achieves a higher compression ratio than index compression. True, this comes at the cost of a high load on the central processor: for the dictionary method to start compressing data more efficiently than the index method, it has to drop its processing speed to 10-20 megabytes per second on real computing installations at full CPU load.

Such low speeds are unacceptable for modern data storage systems and are of more "academic" than practical interest.

The degree of compression will be significantly increased in the next modification of the RTT algorithm (RTT-Max), which is already in development.

So, as always, to be continued...

Source: www.habr.com
