Ukufumana ukuxhomekeka okusebenzayo kwidatha kusetyenziswa kwiindawo ezahlukeneyo zokuhlalutya idatha: ulawulo lwedatha, ukucocwa kwedatha, ubunjineli be-database reverse kunye nokuhlola idatha. Sele sipapashe malunga nabaxhomekeke ngokwabo UAnastasia Birillo kunye noNikita Bobrov. Ngeli xesha, u-Anastasia, ophumelele kwiZiko leNzululwazi yeKhompyutha kulo nyaka, wabelana ngophuhliso lwalo msebenzi njengenxalenye yomsebenzi wophando owawukhusela kwiziko.

Ukukhetha umsebenzi
Ngelixa ndifunda kwiziko le-CS, ndaqala ukufunda ngoovimba beenkcukacha nzulu, oko kukuthi, ukukhangela ukuxhomekeka kokusebenza kunye nokwahluka. Esi sihloko sasinxulumene nesihloko somsebenzi wam eyunivesithi, ngoko ngelixa ndisebenza kwikhosi, ndaqala ukufunda amanqaku amalunga nokuxhomekeka okuhlukeneyo kwiinkcukacha. Ndibhale uphononongo lwale ndawo-enye yam yokuqala ngesiNgesi kwaye ingeniswe kwinkomfa ye-SEIM-2017. Ndavuya kakhulu xa ndafumanisa ukuba wamkelwa emva kwayo yonke into, kwaye ndagqiba ekubeni ndihlolisise nzulu kwisihloko. Ingcamango ngokwayo ayintsha - yaqala ukusetyenziswa emva kweminyaka engama-90, kodwa ngoku isetyenziswa kwiindawo ezininzi.
Ngethuba lesemester yam yesibini kwiziko, ndaqalisa iprojekthi yophando yokuphucula i-algorithms yokufumana ukuxhomekeka okusebenzayo. Wasebenza kuyo kunye nomfundi ophumelele kwiYunivesithi yaseSt. Petersburg State uNikita Bobrov kwiJetBrains Research.
Ukuntsonkotha kobunzima bokukhangela ukuxhomekeka kokusebenza
Ingxaki ephambili kukuntsonkotha kokubala. Inani lokuxhomekeka okunokwenzeka okuncinci kunye nokungancinci lilinganiselwe ngasentla ngexabiso
phi
- inani leempawu zetheyibhile. Ixesha lokusebenza le-algorithms alixhomekanga kuphela kwinani leempawu, kodwa nakwinani lemiqolo. Ngeminyaka yoo-90s, i-algorithms yokukhangela umthetho we-Federal kwi-desktop yesiqhelo yePC inokuqhuba iiseti zedatha eziqulathe ukuya kuthi ga kwiimpawu ze-20 kunye namashumi amawaka emigca ukuya kwiiyure ezininzi. Ii-algorithms zangoku ezisebenza kwiiprosesa ezingundoqo ezininzi zifumanisa ukuxhomekeka kwiiseti zedatha eziquka amakhulu eempawu (ukuya kuma-200) kunye namakhulu amawaka emigca malunga nexesha elinye. Nangona kunjalo, oku akwanelanga: ixesha elinjalo alamkelekanga kwizicelo ezininzi zehlabathi lokwenyani. Ke ngoko, siye savelisa iindlela zokukhawulezisa ii-algorithms ezikhoyo.
Izikim ze-caching ze-partition intersections
Kwinxalenye yokuqala yomsebenzi, siphuhlise izikimu ze-caching zeklasi ye-algorithms esebenzisa indlela yokwahlulahlula. Isahlulo sophawu yiseti yoluhlu, apho uluhlu ngalunye luqulathe amanani elayini anamaxabiso afanayo ophawu olunikiweyo. Uluhlu ngalunye olunjalo lubizwa ngokuba liqela. Ii-algorithms ezininzi zanamhlanje zisebenzisa izahlulelo ukumisela ukuba ngaba ukuxhomekeka kubanjwe okanye akunjalo, oko kukuthi, babambelela kwi-lemma: Ukuxhomekeka.
ibanjwe ukuba
. Apha
isahlulo sichongiwe kwaye ingqikelelo yobungakanani besahlulelo isetyenzisiwe - inani lamaqela kulo. Ii-algorithms ezisebenzisa izahlulo, xa ukuxhomekeka kuphulwa, yongeza iimpawu ezongezelelweyo kwicala lasekhohlo lokuxhomekeka, uze uphinde uphinde ubale, wenze umsebenzi wokunqumla kwezahlulo. Lo msebenzi ubizwa ngokuba yincutshe kumanqaku. Kodwa siye saqaphela ukuba izahlulo zokuxhomekeka eziya kugcinwa kuphela emva kwemijikelo embalwa yobungcali inokuphinda isetyenziswe ngokusebenzayo, nto leyo enokunciphisa kakhulu ixesha lokuhamba kwe-algorithms, kuba umsebenzi wokuhlangana ubiza imali eninzi.
Ke ngoko, sicebise i-heuristic esekwe kwi-Shannon Entropy kunye ne-Ginny Ukungaqiniseki, kunye ne-metric yethu, esiyibiza ngokuba yi-Reverse Entropy. Luhlengahlengiso oluncinci lwe-Shannon Entropy kwaye luyanda njengoko ukohluka kweseti yedatha kunyuka. I-heuristic ecetywayo yile ilandelayo:

kuyinto
- iqondo lokungafani kwesahlulelo esisanda kubalwa
, kwaye
yi-median ye-degrees of uniqueness yeempawu ezizimeleyo. Zontathu iimetrics ezichazwe ngasentla zavavanywa njengemetric eyodwa. Unokuqaphela kwakhona ukuba kukho izilungisi ezimbini kwi-heuristic. Eyokuqala ibonisa ukuba ulwahlulo lwangoku lusondele kangakanani kwisitshixo esingundoqo kwaye ikuvumela ukuba ugcine kwindawo enkulu ezo zahlulelo zikude nesitshixo esinokwenzeka. Isilungisi sesibini sikuvumela ukuba ubeke iliso kwindawo yokuhlala kwaye ngaloo ndlela ikhuthaza ukongeza izahlulelo ezininzi kwindawo yokugcina indawo ukuba isithuba esisimahla siyafumaneka. Isisombululo esiyimpumelelo sale ngxaki sasivumela ukuba sikhawuleze i-algorithm ye-PYRO nge-10-40%, kuxhomekeke kwidathasethi. Kuyafaneleka ukuba uqaphele ukuba i-algorithm ye-PYRO yeyona nto iphumelele kakhulu kule ndawo.
Kulo mzobo ungezantsi unokubona iziphumo zokusebenzisa i-heuristic ecetywayo xa kuthelekiswa nendlela esisiseko yogcino lwemali ezinkozo. I-X i-axis yi-logarithmic.

Enye indlela yokugcina izahlulo
Emva koko sacebisa enye indlela yokugcina izahlulo. Izahlulo yiseti yamaqela, ngalinye ligcina amanani ee-tuples ezinamaxabiso afanayo kwiimpawu ezithile. La maqela angaqulatha ulandelelwano olude lwamanani e-tuple, umzekelo ukuba idatha kwitafile iyalelwa. Ke ngoko, sicebise iskimu sokucinezela ukugcina izahlulo, ezizezi, ukugcinwa kwexesha elithile kwamaxabiso kumaqela ezahlulo:
$$display$$pi(X) = {{underbrace{1, 2, 3, 4, 5}_{Ithuba lokuqala}, underbrace{7, 8}_{Ikhefu leSibini}, 10}}\ downarrow{ Uxinzelelo} \ pi(X) = {{ibrace engaphantsi{$, 1, 5}_{Ithuba lokuQala~}, i-underbrace{7, 8}_{Isithuba sesibini~}, 10}}$$display$$
Le ndlela yakwazi ukunciphisa ukusetyenziswa kwememori ngexesha lokusebenza kwe-algorithm ye-TANE ukusuka kwi-1 ukuya kwi-25%. I-algorithm ye-TANE yi-algorithm yakudala yokukhangela imithetho ye-federal isebenzisa izahlulo ngexesha lomsebenzi wayo. Njengenxalenye yesenzo, i-algorithm ye-TANE yakhethwa, kuba kwakulula kakhulu ukuphumeza ukugcinwa kwekhefu kuyo kunokuba, umzekelo, kwi-PYRO ukwenzela ukuvavanya ukuba indlela ecetywayo isebenza. Iziphumo ezifunyenweyo zinikwe kulo mzobo ungezantsi. I-X i-axis yi-logarithmic.

INkomfa ye-ADBIS-2019
Ngokusekwe kwiziphumo zophando, ngoSeptemba 2019 ndapapasha inqaku kwiNkomfa yaseYurophu ye-23 kwiNkcazo kwiNkcazo kunye neeNkqubo zoLwazi (ADBIS-2019). Ngethuba lokubonisa, umsebenzi waqatshelwa nguBernhard Thalheim, umntu obalulekileyo kwindawo yogcino-lwazi. Iziphumo zophando zenze isiseko se-dissertation yam kwi-master degree kwimathematika kunye noomatshini kwiYunivesithi yaseSt. Petersburg State University, apho zombini iindlela ezicetywayo (i-caching kunye noxinzelelo) zaphunyezwa kuzo zombini i-algorithms: i-TANE kunye ne-PYRO. Ngaphezu koko, iziphumo zibonise ukuba iindlela ezicetywayo zendalo yonke, ekubeni kuzo zombini ii-algorithms, kunye neendlela zombini, ukunciphisa okubonakalayo kokusetyenziswa kwememori kwabonwa, kunye nokunciphisa kakhulu ixesha lokusebenza kwe-algorithms.
umthombo: www.habr.com
