Efficiently finding functional dependencies in databases

Functional dependency discovery is used in several areas of data analysis: database management, data cleaning, database reverse engineering, and data exploration. We have already published an article about the dependencies themselves, written by Anastasia Birillo and Nikita Bobrov. This time Anastasia, who graduated from the Computer Science Center this year, shares how that work developed as part of the research project she defended at the center.


Choosing a topic

While studying at the CS Center, I began to study databases in depth, specifically the search for functional and differential dependencies. The topic was related to my university coursework, so while working on the coursework I started reading papers about the various kinds of dependencies in databases. I wrote a survey of the area, one of my first papers in English, and submitted it to the SEIM-2017 conference. I was very happy when I learned it had been accepted, and I decided to dig deeper into the topic. The concept itself is not new: it was first used back in the 1990s, but it is still applied in many areas today.

During my second semester at the center, I started a research project on improving algorithms for finding functional dependencies. I worked on it at JetBrains Research together with Nikita Bobrov, a student at St. Petersburg State University.

Computational complexity of finding functional dependencies

The main problem is computational complexity. The number of possible minimal, non-trivial dependencies is bounded above by a value that grows exponentially with the number of table attributes. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 1990s, functional dependency (FD) discovery algorithms running on an ordinary desktop PC could process data sets with up to 20 attributes and tens of thousands of rows in a few hours. Modern algorithms running on multi-core processors find dependencies in data sets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. That is still not enough: such running times are unacceptable for most real-world applications. We therefore developed approaches to speed up existing algorithms.
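To get a feel for how quickly the search space grows, here is a minimal sketch that enumerates every non-trivial candidate dependency X -> A for a small schema. It counts the naive candidate space (one right-hand attribute, any subset of the remaining attributes on the left), which is not necessarily the exact bound the text refers to. The function names are hypothetical and chosen for illustration.

```python
from itertools import combinations


def enumerate_candidates(attrs):
    """Yield every non-trivial candidate (X, A): A is a single attribute,
    X is any subset (possibly empty) of the remaining attributes."""
    for a in attrs:
        others = [b for b in attrs if b != a]
        for k in range(len(others) + 1):
            for lhs in combinations(others, k):
                yield lhs, a


def candidate_count(n):
    """Closed form for the same count: n choices of A, 2^(n-1) choices of X."""
    return n * 2 ** (n - 1)


if __name__ == "__main__":
    print(sum(1 for _ in enumerate_candidates("ABC")))  # brute-force count
    for n in (5, 10, 20):
        print(n, candidate_count(n))
```

Even at 20 attributes the naive space already holds more than ten million candidates, which is why practical algorithms prune it aggressively rather than checking every candidate.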

Caching schemes for partition intersections

In the first part of the work, we developed caching schemes for the class of algorithms that use the partition intersection method. The partition of an attribute is a set of lists, where each list contains the numbers of the rows that share the same value for that attribute. Each such list is called a cluster. Many modern algorithms use partitions to decide whether a dependency holds, relying on the following lemma: the dependency $X \rightarrow A$ holds if $|\pi(X)| = |\pi(X \cup \{A\})|$. Here $\pi(X)$ denotes the partition over $X$, and the size of a partition is the number of clusters in it. When a dependency is violated, partition-based algorithms add further attributes to its left-hand side and then re-check it, which requires intersecting partitions. In the literature this operation is called specialization. We noticed that the partitions of dependencies that only start to hold after several rounds of specialization can be actively reused, and since the intersection operation is expensive, reusing them can significantly reduce the running time of the algorithms.
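The partition-based check described above can be sketched in a few lines. This is an illustrative sketch under simple assumptions (rows are tuples, attributes are column indices, partitions keep single-row clusters), not the implementation used in TANE or PYRO; the names `partition`, `intersect`, and `holds` are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row numbers by their values on `attrs` (a tuple of column indices)."""
    clusters = defaultdict(list)
    for i, row in enumerate(rows):
        clusters[tuple(row[a] for a in attrs)].append(i)
    return sorted(clusters.values())


def intersect(p1, p2):
    """Partition intersection: two rows stay in one cluster only if
    they share a cluster in both p1 and p2."""
    cluster_of = {}
    for cid, cluster in enumerate(p2):
        for i in cluster:
            cluster_of[i] = cid
    result = defaultdict(list)
    for cid1, cluster in enumerate(p1):
        for i in cluster:
            result[(cid1, cluster_of[i])].append(i)
    return sorted(result.values())


def holds(rows, lhs, rhs):
    """Check X -> A by comparing the number of clusters in pi(X) and pi(X u {A})."""
    pi_x = partition(rows, lhs)
    pi_xa = intersect(pi_x, partition(rows, (rhs,)))
    return len(pi_x) == len(pi_xa)


# Hypothetical toy table: (name, course, grade)
rows = [
    ("Alice", "Math", "A"),
    ("Bob", "Math", "B"),
    ("Alice", "Physics", "A"),
    ("Carol", "Physics", "B"),
]

if __name__ == "__main__":
    print(holds(rows, (0,), 2))  # name -> grade
    print(holds(rows, (1,), 2))  # course -> grade
```

In this toy table every name maps to a single grade, so the intersection with the grade partition does not split any cluster and the dependency holds. Each course has two different grades, so the intersection splits both clusters and the dependency is violated. The `intersect` step is the expensive operation whose results are worth caching.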

We therefore proposed a heuristic based on Shannon entropy and Gini impurity, as well as on our own metric, which we called Reverse Entropy. It is a slight modification of Shannon entropy and grows as the uniqueness of the data set grows. The proposed heuristic looks as follows:

[Formula: the proposed caching heuristic]

The heuristic relies on two quantities: the degree of uniqueness of the newly computed partition and the median of the degrees of uniqueness of the individual attributes. All three metrics described above were tried as the uniqueness metric. The heuristic also contains two modifiers. The first indicates how close the current partition is to a primary key, so that partitions far from a potential key are cached more readily. The second tracks cache occupancy and thereby encourages adding more partitions to the cache while free space is available. Solving this problem allowed us to speed up the PYRO algorithm by 10-40%, depending on the dataset. It is worth noting that PYRO is the most successful algorithm in this area.
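The two standard uniqueness metrics mentioned above can be computed directly from cluster sizes. The sketch below assumes a partition is a list of clusters covering all rows and treats the share of rows in each cluster as a probability. Reverse Entropy is not shown because its definition is not given here, and the function names are hypothetical.

```python
import math


def cluster_probabilities(partition, n_rows):
    """Share of rows that falls into each cluster."""
    return [len(cluster) / n_rows for cluster in partition]


def shannon_entropy(partition, n_rows):
    """Shannon entropy in bits: higher means the values are more unique."""
    return sum(p * math.log2(1 / p) for p in cluster_probabilities(partition, n_rows))


def gini_impurity(partition, n_rows):
    """Gini impurity: probability that two random rows land in different clusters."""
    return 1.0 - sum(p * p for p in cluster_probabilities(partition, n_rows))


if __name__ == "__main__":
    key_like = [[0], [1], [2], [3]]   # every row unique, as for a key
    two_values = [[0, 1], [2, 3]]     # two equally frequent values
    constant = [[0, 1, 2, 3]]         # a single value in every row
    for name, part in [("key-like", key_like), ("two values", two_values), ("constant", constant)]:
        print(name, shannon_entropy(part, 4), gini_impurity(part, 4))
```

Both metrics are zero for a constant column and largest for a key-like column, which is the property a caching heuristic needs in order to tell partitions that are close to a key from those that are far from one.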

The figure below shows the results of applying the proposed heuristic compared with the baseline caching approach based on a coin flip. The X axis is logarithmic.

[Figure: the proposed caching heuristic compared with the baseline coin-flip caching approach; logarithmic X axis]

An alternative way of storing partitions

We then proposed an alternative way of storing partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that share the same values for certain attributes. These clusters may contain long runs of consecutive tuple numbers, for example when the data in the table is ordered. We therefore proposed a compression scheme for storing partitions, namely interval storage of the values in partition clusters:

$$
\begin{gathered}
\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{\text{First interval}}, \underbrace{7, 8}_{\text{Second interval}}, 10\}\} \\
\downarrow \text{Compression} \\
\pi(X) = \{\{\underbrace{\$, 1, 5}_{\text{First interval}}, \underbrace{7, 8}_{\text{Second interval}}, 10\}\}
\end{gathered}
$$
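The interval encoding in the formula above can be sketched as follows. The sketch assumes clusters are sorted lists of integer row numbers, uses `"$"` as the interval marker as in the formula, and only compresses runs of at least three consecutive numbers. That threshold is an assumption inferred from the example, where the run 7, 8 is left as is. The function names are hypothetical.

```python
MARKER = "$"  # assumed sentinel that introduces a (start, end) interval


def compress_cluster(cluster, min_run=3):
    """Replace each run of at least `min_run` consecutive row numbers
    with MARKER, start, end; shorter runs are copied unchanged."""
    out = []
    i = 0
    while i < len(cluster):
        j = i
        while j + 1 < len(cluster) and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.extend([MARKER, cluster[i], cluster[j]])
        else:
            out.extend(cluster[i:j + 1])
        i = j + 1
    return out


def decompress_cluster(encoded):
    """Expand MARKER, start, end triples back into explicit row numbers."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == MARKER:
            out.extend(range(encoded[i + 1], encoded[i + 2] + 1))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out


if __name__ == "__main__":
    cluster = [1, 2, 3, 4, 5, 7, 8, 10]
    packed = compress_cluster(cluster)
    print(packed)
    print(decompress_cluster(packed) == cluster)
```

A run of two numbers would take three slots once encoded, so compressing it would cost memory instead of saving it. The saving grows with the length of the runs, which is why the scheme pays off mostly on ordered data.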

This approach reduced memory consumption during the operation of the TANE algorithm by 1 to 25%. TANE is a classic FD discovery algorithm that uses partitions in the course of its work. For this part of the project we chose TANE because interval storage was much easier to implement in it than, for example, in PYRO, which made it a convenient way to check whether the proposed approach works. The results are shown in the figure below. The X axis is logarithmic.

[Figure: memory consumption of TANE with interval storage of partitions; logarithmic X axis]

ADBIS-2019 conference

Based on the results of this research, in September 2019 I published the paper Smart Caching for Functional Dependency Discovery at the 23rd European Conference on Advances in Databases and Information Systems (ADBIS-2019). During the presentation, the work was noted by Bernhard Thalheim, a prominent figure in the database field. The results formed the basis of my master's thesis at the Faculty of Mathematics and Mechanics of St. Petersburg State University, in which both proposed approaches (caching and compression) were implemented in both algorithms, TANE and PYRO. The results also showed that the proposed approaches are universal: for both algorithms and with both approaches we observed a significant reduction in memory consumption as well as a significant reduction in running time.

Source: www.habr.com
