About a strange way to save disk space

A user wants to write a new chunk of data to their hard drive but does not have enough free space for it. They do not want to delete anything either, since "everything is very important and needed." So what are we supposed to do with them?

This user is far from the only one with this problem. Terabytes of information sit on our hard drives, and that amount shows no tendency to shrink. But how unique is it? In the end, every file is just a set of bits of a certain length, and most likely the new one is not that different from what is already stored.

Clearly, searching the hard drive for pieces of information that are already stored is, if not a hopeless task, then at least an inefficient one. On the other hand, if the difference is small, you could always adjust things a little...

TL;DR: a second attempt to describe a strange method of optimizing data storage using JPEG files, this time in a more understandable form.

About bits and differences

If you take two completely random pieces of data, then on average half of the bits they contain will coincide. Indeed, among the possible combinations for each pair of bits ("00, 01, 10, 11"), exactly half have equal values. It really is that simple.
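
A quick way to check that claim is to compare two random bit strings. This is only a small Python sketch for illustration (the function name is made up and nothing here is part of the tool described later):

```python
import random


def matching_fraction(n_bits: int, seed: int = 0) -> float:
    """Return the fraction of bit positions where two random n_bits-long strings agree."""
    rng = random.Random(seed)
    a = rng.getrandbits(n_bits)
    b = rng.getrandbits(n_bits)
    # XOR leaves a 1 exactly where the two strings differ.
    differing = bin(a ^ b).count("1")
    return 1 - differing / n_bits


if __name__ == "__main__":
    print(matching_fraction(1_000_000))
```

For a million bits the result should land very close to one half, which is exactly the "free" overlap the rest of the article tries to exploit.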

But of course, if we simply take two files and fit one to the other, we lose one of them. If we store the changes, we merely reinvent delta encoding, which gets along perfectly well without us, although it is not usually used for this purpose. We could try to embed a smaller sequence into a larger one, but even then we risk losing critical pieces of data if we apply it carelessly to everything.

So between what and what can the difference be eliminated? The new file the user is writing is just a sequence of bits, and there is nothing we can do with it on its own. That means we need to find bits on the hard drive that can be changed without having to store the difference, bits whose loss can be survived without serious consequences. And it makes sense to change not just some file on the file system as a whole, but the less sensitive information inside it. But which information, and how?

Fitting methods

Lossy-compressed files come to the rescue. All those JPEGs, MP3s and the like, even though they are already lossy compressed, contain plenty of bits that can be changed safely. Advanced techniques can modify their components imperceptibly at various stages of encoding. Wait. Advanced techniques... imperceptible modification... one set of bits inside another... this is almost steganography!

Indeed, embedding one piece of information in another resembles its methods like nothing else does. The fact that the changes are imperceptible to human senses is appealing too. Where the paths diverge is secrecy: our task comes down to the user putting additional information on their own hard drive, so secrecy would only hurt them. They would just forget about it.

So although we can use these methods, some modifications are needed. I will describe and demonstrate them using one of the existing methods and a common file format.

About jackals

If we are going to squeeze something, it should be the most squeezed thing in the world. We are, of course, talking about JPEG files. Not only is there a ton of tools and existing methods for embedding data in them, JPEG is also the most popular image format on the planet.

However, to avoid going into the dog-breeding business, we need to limit our field of activity within files of this format. Nobody likes the monochrome squares that appear from excessive compression, so we have to restrict ourselves to working with an already compressed file and avoid re-encoding it. More specifically, we work with the integer coefficients that remain after the operations responsible for data loss, DCT and quantization, as the JPEG encoding diagram shows (credit to the wiki of the Bauman National Library).
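
To make "the integer coefficients that remain after DCT and quantization" more concrete, here is a rough Python sketch of those two steps for a single 8x8 block. It is an illustration under stated assumptions, not how the tool works: a real encoder uses a per-frequency quantization table rather than the single hypothetical step `q` used here, and the approach in this article reads coefficients from an already compressed file instead of recomputing them.

```python
import math

N = 8  # JPEG processes the image in 8x8 blocks


def dct2(block):
    """Plain (slow) 2D DCT-II of an 8x8 block of level-shifted samples."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coefs, q=16):
    """Divide by a quantization step and round: this is where data is lost.

    A single step q for all frequencies is an assumption made for brevity.
    """
    return [[round(value / q) for value in row] for row in coefs]


if __name__ == "__main__":
    # A gentle horizontal gradient, already shifted by -128 as JPEG does.
    block = [[4 * y - 14 for y in range(N)] for _ in range(N)]
    for row in quantize(dct2(block)):
        print(row)
```

The small integers printed at the end are the kind of values that the embedding methods discussed below modify; everything before them has already thrown information away.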

There are many possible ways to optimize JPEG files. There is lossless optimization (jpegtran), and there is "lossless" optimization that in practice does introduce losses, but neither concerns us here. After all, if a user is ready to embed one piece of information into another to gain free disk space, then either they optimized their images long ago, or they refuse to do so at all for fear of losing quality.

F5

A whole family of algorithms fits these conditions; you can get acquainted with them in this good presentation. The most advanced of them is the F5 algorithm by Andreas Westfeld. It works with the coefficients of the luminance component, on the grounds that small changes to them are hard for the human eye to notice. In addition, it uses an embedding technique based on matrix encoding, which requires fewer changes to embed the same amount of information the larger the container is.
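
The idea behind matrix encoding can be sketched in a few lines of Python. In F5's (1, n, k) scheme, k message bits are hidden in n = 2^k - 1 carrier bits while changing at most one of them. The sketch below is a simplification: the plain list `bits` stands in for the bits carried by usable coefficients, and the function names are made up for illustration.

```python
def syndrome(bits):
    """XOR together the 1-based positions of all carrier bits that are set."""
    h = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            h ^= position
    return h


def embed(bits, message, k):
    """Hide a k-bit message in 2**k - 1 carrier bits, flipping at most one bit."""
    n = (1 << k) - 1
    if len(bits) != n or not 0 <= message < (1 << k):
        raise ValueError("need exactly 2**k - 1 carrier bits and a k-bit message")
    out = list(bits)
    s = syndrome(out) ^ message
    if s:
        # Flipping position s changes the syndrome by exactly s,
        # which turns it into the message.
        out[s - 1] ^= 1
    return out


def extract(bits):
    """The receiver only needs to recompute the syndrome."""
    return syndrome(bits)


if __name__ == "__main__":
    carrier = [1, 0, 0, 1, 1, 0, 1]
    stego = embed(carrier, 0b101, k=3)
    print(stego, extract(stego))
```

With k = 3, three message bits cost at most one change among seven carrier bits. A bigger container allows a bigger k, so the number of changes per embedded bit goes down.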

The changes themselves boil down to decreasing the absolute value of a coefficient by one under certain conditions (that is, not always), and this is what makes F5 usable for optimizing data storage on a hard drive. The point is that a coefficient changed this way will most likely occupy fewer bits after Huffman coding, because of the statistical distribution of values in JPEG, and the new zeros will give a gain when they are encoded with RLE.
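
Here is a simplified Python sketch of that "decrease the absolute value by one" step, without matrix encoding and without the other details of a real implementation (for example, it does not treat DC coefficients specially). It assumes the usual F5 convention that a positive coefficient carries its least significant bit and a negative one carries the inverted bit; the function names are hypothetical.

```python
def carried_bit(coef):
    """Bit carried by a nonzero coefficient under the F5 convention."""
    lsb = abs(coef) % 2
    return lsb if coef > 0 else 1 - lsb


def embed_bits(coefs, bits):
    """Embed bits one per nonzero coefficient, only ever moving values toward zero."""
    out = list(coefs)
    i = 0
    for bit in bits:
        while True:
            # Zero coefficients carry nothing and are skipped.
            while i < len(out) and out[i] == 0:
                i += 1
            if i == len(out):
                raise ValueError("not enough capacity")
            if carried_bit(out[i]) != bit:
                out[i] += -1 if out[i] > 0 else 1
            if out[i] != 0:
                i += 1
                break
            # "Shrinkage": the coefficient became zero, so the reader will
            # skip it and the same bit must be embedded again further on.
    return out


def extract_bits(coefs, count):
    """Read back the first count bits from the nonzero coefficients."""
    return [carried_bit(c) for c in coefs if c != 0][:count]


if __name__ == "__main__":
    original = [5, 0, -1, 2, 1, -3, 0, 4, 1, -2]
    packed = embed_bits(original, [0, 1, 1, 0])
    print(packed, extract_bits(packed, 4))
```

Every modified value ends up closer to zero, and some become zero outright. That is exactly why the file tends to get smaller rather than larger after embedding.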

The necessary modifications come down to removing the part responsible for secrecy (the password-based permutation), which saves resources and execution time, and adding a mechanism for working with many files instead of one at a time. The reader is unlikely to be interested in the details of these changes, so let's move on to the implementation.

High tech

To demonstrate how this approach works, I implemented the method in pure C and carried out a number of optimizations for both execution speed and memory use (you cannot imagine how much these images weigh uncompressed, even before DCT). Cross-platform support is achieved with a combination of the libraries libjpeg, pcre and tinydir, and many thanks to them. All of this is built with 'make', so Windows users who want to try it will have to install some Cygwin, or deal with Visual Studio and the libraries on their own.

The implementation is available as a console utility and as a library. Those interested can find more details on using the latter in the readme in the GitHub repository, linked at the end of the post.

How to use it?

Carefully. The images used for packing are selected by a regular-expression search inside the given root directory. Once packing is finished, the files can be moved, renamed and copied at will within that directory, you can change the file system and operating system, and so on. However, you must be extremely careful not to change their actual content in any way. Losing even a single bit may make it impossible to recover the information.
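
The selection step can be pictured roughly like this. It is a Python sketch with a made-up function name, not the tool's code: the tool itself is written in C and uses PCRE, and whether it matches the pattern against the file name or the whole path is an assumption here (this sketch uses the file name).

```python
import os
import re


def select_images(root, pattern):
    """Collect files under root whose name matches the regular expression."""
    rx = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if rx.search(name):
                matches.append(os.path.join(dirpath, name))
    # Sort so that the order does not depend on directory listing order.
    return sorted(matches)
```

Calling something like `select_images("dogs/", r".*jpg")` would return every matching file at any depth below the root, which fits the rule above that files may be moved around inside the root directory but not out of it.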

When it finishes, the utility leaves behind a special archive file containing all the information needed for unpacking, including data about the images that were used. By itself it weighs about a couple of kilobytes and has no significant effect on the occupied disk space.

You can analyze the potential capacity with the '-a' flag: './f5ar -a [search folder] [Perl-compatible regular expression]'. Packing is done with './f5ar -p [search folder] [Perl-compatible regular expression] [file to pack] [archive name]', and unpacking with './f5ar -u [archive file] [name of restored file]'.

Demonstration

To show how effective the method is, I downloaded a collection of 225 completely free photos of dogs from the Unsplash service and dug up in my documents a large 45-megabyte PDF of the second volume of Knuth's The Art of Computer Programming.

The sequence is quite simple:

$ du -sh knuth.pdf dogs/
44M knuth.pdf
633M dogs/

$ ./f5ar -p dogs/ .*jpg knuth.pdf dogs.f5ar
Reading compressing file... ok
Initializing the archive... ok
Analysing library capacity... done in 17.0s
Detected somewhat guaranteed capacity of 48439359 bytes
Detected possible capacity of upto 102618787 bytes
Compressing... done in 39.4s
Saving the archive... ok

$ ./f5ar -u dogs/dogs.f5ar knuth_unpacked.pdf
Initializing the archive... ok
Reading the archive file... ok
Filling the archive with files... done in 1.4s
Decompressing... done in 21.0s
Writing extracted data... ok

$ sha1sum knuth.pdf knuth_unpacked.pdf
5bd1f496d2e45e382f33959eae5ab15da12cd666 knuth.pdf
5bd1f496d2e45e382f33959eae5ab15da12cd666 knuth_unpacked.pdf

$ du -sh dogs/
551M dogs/

Screenshots for the fans

The unpacked file can, and should, still open and read normally.

As you can see, from the original 633 + 44 == 677 megabytes of data on the hard drive (the sizes reported by du above), we arrived at a more pleasant 551. Such a radical difference is explained by that same decrease in coefficient values, which affects their subsequent lossless compression: decreasing a value by just one can easily "cut" a couple of bytes from the final file. However, this is still data loss, albeit an extremely small one, and you will have to put up with it.

Fortunately, the changes are completely invisible to the eye. Under the spoiler (since habrastorage cannot handle large files), the reader can judge the difference both by eye and by intensity, obtained by subtracting the values of the modified component from the original: the original, the version with information inside, and the difference (the dimmer the color, the smaller the difference in the block).

In lieu of a conclusion

Given all these difficulties, buying a hard drive or uploading everything to the cloud may look like a much simpler solution. But even though we live in such wonderful times right now, there is no guarantee that tomorrow you will still be able to go online and upload all your extra data somewhere, or walk into a store and buy yourself yet another thousand-terabyte hard drive. The drives already lying around at home, on the other hand, you can always use.

-> GitHub

source: www.habr.com