Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha

Ukufumana ukuxhomekeka okusebenzayo kwidatha kusetyenziswa kwiindawo ezahlukeneyo zokuhlalutya idatha: ulawulo lwedatha, ukucocwa kwedatha, ubunjineli be-database reverse kunye nokuhlola idatha. Sele sipapashe malunga nabaxhomekeke ngokwabo inqaku UAnastasia Birillo kunye noNikita Bobrov. Ngeli xesha, u-Anastasia, ophumelele kwiZiko leNzululwazi yeKhompyutha kulo nyaka, wabelana ngophuhliso lwalo msebenzi njengenxalenye yomsebenzi wophando owawukhusela kwiziko.

Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha

Ukukhetha umsebenzi

Ngelixa ndifunda kwiziko le-CS, ndaqala ukufunda ngoovimba beenkcukacha nzulu, oko kukuthi, ukukhangela ukuxhomekeka kokusebenza kunye nokwahluka. Esi sihloko sasinxulumene nesihloko somsebenzi wam eyunivesithi, ngoko ngelixa ndisebenza kwikhosi, ndaqala ukufunda amanqaku amalunga nokuxhomekeka okuhlukeneyo kwiinkcukacha. Ndibhale uphononongo lwale ndawo-enye yam yokuqala amanqaku ngesiNgesi kwaye ingeniswe kwinkomfa ye-SEIM-2017. Ndavuya kakhulu xa ndafumanisa ukuba wamkelwa emva kwayo yonke into, kwaye ndagqiba ekubeni ndihlolisise nzulu kwisihloko. Ingcamango ngokwayo ayintsha - yaqala ukusetyenziswa emva kweminyaka engama-90, kodwa ngoku isetyenziswa kwiindawo ezininzi.

Ngethuba lesemester yam yesibini kwiziko, ndaqalisa iprojekthi yophando yokuphucula i-algorithms yokufumana ukuxhomekeka okusebenzayo. Wasebenza kuyo kunye nomfundi ophumelele kwiYunivesithi yaseSt. Petersburg State uNikita Bobrov kwiJetBrains Research.

Ukuntsonkotha kobunzima bokukhangela ukuxhomekeka kokusebenza

Ingxaki ephambili kukuntsonkotha kokubala. Inani lokuxhomekeka okunokwenzeka okuncinci kunye nokungancinci lilinganiselwe ngasentla ngexabiso Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukachaphi Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha - inani leempawu zetheyibhile. Ixesha lokusebenza le-algorithms alixhomekanga kuphela kwinani leempawu, kodwa nakwinani lemiqolo. Ngeminyaka yoo-90s, i-algorithms yokukhangela umthetho we-Federal kwi-desktop yesiqhelo yePC inokuqhuba iiseti zedatha eziqulathe ukuya kuthi ga kwiimpawu ze-20 kunye namashumi amawaka emigca ukuya kwiiyure ezininzi. Ii-algorithms zangoku ezisebenza kwiiprosesa ezingundoqo ezininzi zifumanisa ukuxhomekeka kwiiseti zedatha eziquka amakhulu eempawu (ukuya kuma-200) kunye namakhulu amawaka emigca malunga nexesha elinye. Nangona kunjalo, oku akwanelanga: ixesha elinjalo alamkelekanga kwizicelo ezininzi zehlabathi lokwenyani. Ke ngoko, siye savelisa iindlela zokukhawulezisa ii-algorithms ezikhoyo.

Izikim ze-caching ze-partition intersections

Kwinxalenye yokuqala yomsebenzi, siphuhlise izikimu ze-caching zeklasi ye-algorithms esebenzisa indlela yokwahlulahlula. Isahlulo sophawu yiseti yoluhlu, apho uluhlu ngalunye luqulathe amanani elayini anamaxabiso afanayo ophawu olunikiweyo. Uluhlu ngalunye olunjalo lubizwa ngokuba liqela. Ii-algorithms ezininzi zanamhlanje zisebenzisa izahlulelo ukumisela ukuba ngaba ukuxhomekeka kubanjwe okanye akunjalo, oko kukuthi, babambelela kwi-lemma: Ukuxhomekeka. Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha ibanjwe ukuba Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha. Apha Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha isahlulo sichongiwe kwaye ingqikelelo yobungakanani besahlulelo isetyenzisiwe - inani lamaqela kulo. Ii-algorithms ezisebenzisa izahlulo, xa ukuxhomekeka kuphulwa, yongeza iimpawu ezongezelelweyo kwicala lasekhohlo lokuxhomekeka, uze uphinde uphinde ubale, wenze umsebenzi wokunqumla kwezahlulo. Lo msebenzi ubizwa ngokuba yincutshe kumanqaku. Kodwa siye saqaphela ukuba izahlulo zokuxhomekeka eziya kugcinwa kuphela emva kwemijikelo embalwa yobungcali inokuphinda isetyenziswe ngokusebenzayo, nto leyo enokunciphisa kakhulu ixesha lokuhamba kwe-algorithms, kuba umsebenzi wokuhlangana ubiza imali eninzi.

Ke ngoko, sicebise i-heuristic esekwe kwi-Shannon Entropy kunye ne-Ginny Ukungaqiniseki, kunye ne-metric yethu, esiyibiza ngokuba yi-Reverse Entropy. Luhlengahlengiso oluncinci lwe-Shannon Entropy kwaye luyanda njengoko ukohluka kweseti yedatha kunyuka. I-heuristic ecetywayo yile ilandelayo:

Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha

kuyinto Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha - iqondo lokungafani kwesahlulelo esisanda kubalwa Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha, kwaye Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha yi-median ye-degrees of uniqueness yeempawu ezizimeleyo. Zontathu iimetrics ezichazwe ngasentla zavavanywa njengemetric eyodwa. Unokuqaphela kwakhona ukuba kukho izilungisi ezimbini kwi-heuristic. Eyokuqala ibonisa ukuba ulwahlulo lwangoku lusondele kangakanani kwisitshixo esingundoqo kwaye ikuvumela ukuba ugcine kwindawo enkulu ezo zahlulelo zikude nesitshixo esinokwenzeka. Isilungisi sesibini sikuvumela ukuba ubeke iliso kwindawo yokuhlala kwaye ngaloo ndlela ikhuthaza ukongeza izahlulelo ezininzi kwindawo yokugcina indawo ukuba isithuba esisimahla siyafumaneka. Isisombululo esiyimpumelelo sale ngxaki sasivumela ukuba sikhawuleze i-algorithm ye-PYRO nge-10-40%, kuxhomekeke kwidathasethi. Kuyafaneleka ukuba uqaphele ukuba i-algorithm ye-PYRO yeyona nto iphumelele kakhulu kule ndawo.

Kulo mzobo ungezantsi unokubona iziphumo zokusebenzisa i-heuristic ecetywayo xa kuthelekiswa nendlela esisiseko yogcino lwemali ezinkozo. I-X i-axis yi-logarithmic.

Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha

Enye indlela yokugcina izahlulo

Emva koko sacebisa enye indlela yokugcina izahlulo. Izahlulo yiseti yamaqela, ngalinye ligcina amanani ee-tuples ezinamaxabiso afanayo kwiimpawu ezithile. La maqela angaqulatha ulandelelwano olude lwamanani e-tuple, umzekelo ukuba idatha kwitafile iyalelwa. Ke ngoko, sicebise iskimu sokucinezela ukugcina izahlulo, ezizezi, ukugcinwa kwexesha elithile kwamaxabiso kumaqela ezahlulo:

$$display$$pi(X) = {{underbrace{1, 2, 3, 4, 5}_{Ithuba lokuqala}, underbrace{7, 8}_{Ikhefu leSibini}, 10}}\ downarrow{ Uxinzelelo} \ pi(X) = {{ibrace engaphantsi{$, 1, 5}_{Ithuba lokuQala~}, i-underbrace{7, 8}_{Isithuba sesibini~}, 10}}$$display$$

Le ndlela yakwazi ukunciphisa ukusetyenziswa kwememori ngexesha lokusebenza kwe-algorithm ye-TANE ukusuka kwi-1 ukuya kwi-25%. I-algorithm ye-TANE yi-algorithm yakudala yokukhangela imithetho ye-federal isebenzisa izahlulo ngexesha lomsebenzi wayo. Njengenxalenye yesenzo, i-algorithm ye-TANE yakhethwa, kuba kwakulula kakhulu ukuphumeza ukugcinwa kwekhefu kuyo kunokuba, umzekelo, kwi-PYRO ukwenzela ukuvavanya ukuba indlela ecetywayo isebenza. Iziphumo ezifunyenweyo zinikwe kulo mzobo ungezantsi. I-X i-axis yi-logarithmic.

Fumana ngokufanelekileyo ukuxhomekeka okusebenzayo koovimba beenkcukacha

INkomfa ye-ADBIS-2019

Ngokusekwe kwiziphumo zophando, ngoSeptemba 2019 ndapapasha inqaku I-Caching ye-Smart yokuFumana okuSebenzayo kokuxhomekeka koMsebenzi kwiNkomfa yaseYurophu ye-23 kwiNkcazo kwiNkcazo kunye neeNkqubo zoLwazi (ADBIS-2019). Ngethuba lokubonisa, umsebenzi waqatshelwa nguBernhard Thalheim, umntu obalulekileyo kwindawo yogcino-lwazi. Iziphumo zophando zenze isiseko se-dissertation yam kwi-master degree kwimathematika kunye noomatshini kwiYunivesithi yaseSt. Petersburg State University, apho zombini iindlela ezicetywayo (i-caching kunye noxinzelelo) zaphunyezwa kuzo zombini i-algorithms: i-TANE kunye ne-PYRO. Ngaphezu koko, iziphumo zibonise ukuba iindlela ezicetywayo zendalo yonke, ekubeni kuzo zombini ii-algorithms, kunye neendlela zombini, ukunciphisa okubonakalayo kokusetyenziswa kwememori kwabonwa, kunye nokunciphisa kakhulu ixesha lokusebenza kwe-algorithms.

umthombo: www.habr.com

Yongeza izimvo