Efficiently finding functional dependencies in databases

Finding functional dependencies in data is used in various areas of data analysis: database management, data cleaning, database reverse engineering, and data exploration. We have already published an article about the dependencies themselves, written by Anastasia Birillo and Nikita Bobrov. This time Anastasia, who graduated from the Computer Science Center this year, shares how that work developed into the research project she defended at the center.


Choosing a task

While studying at the CS Center, I began to study databases in depth, in particular the search for functional and difference dependencies. The topic was related to my university coursework, so while working on it I started reading articles about the various kinds of dependencies in databases. I wrote a survey of this area, one of my first articles in English, and submitted it to the SEIM-2017 conference. I was very glad when it was accepted, and I decided to dig deeper into the topic. The concept itself is not new (it has been in use since the 1990s), but it is still applied in many fields today.

In my second semester at the center, I started a research project on improving algorithms for discovering functional dependencies. I worked on it at JetBrains Research together with Nikita Bobrov, a graduate student at St. Petersburg State University.

Computational complexity of searching for functional dependencies

The main problem is computational complexity. The number of possible minimal, non-trivial dependencies is bounded from above by $\frac{N}{2} \cdot 2^N$, where $N$ is the number of table attributes. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 1990s, functional dependency (FD) discovery algorithms running on an ordinary desktop PC could process datasets with up to 20 attributes and tens of thousands of rows in up to several hours. Modern algorithms running on multi-core processors find dependencies in datasets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. This is still not enough: such running times are unacceptable for most real-world applications. We therefore developed approaches to speed up existing algorithms.
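To get a feel for how quickly the search space grows, here is a small sketch that evaluates the upper bound for a few attribute counts. It assumes the bound has the form N/2 · 2^N, as commonly cited in the FD discovery literature; the function name `fd_upper_bound` is purely illustrative.

```python
def fd_upper_bound(n_attrs):
    """Upper bound N/2 * 2^N, written as N * 2^(N-1) to stay in integers.

    Assumes n_attrs >= 1.
    """
    return n_attrs * 2 ** (n_attrs - 1)


if __name__ == "__main__":
    # Attribute counts mentioned in the text: 20 (1990s) and 200 (today).
    for n in (5, 20, 200):
        print(n, fd_upper_bound(n))
```

Even at 20 attributes the bound is already above ten million candidate dependencies, which is why exhaustive checking does not scale.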

Caching schemes for partition intersection

In the first part of the work, we developed caching schemes for the class of algorithms that use the partition intersection method. The partition of an attribute is a set of lists, where each list contains the numbers of the rows that share the same value of that attribute. Each such list is called a cluster. Many modern algorithms use partitions to decide whether a dependency holds, relying on the following lemma: the dependency $X \rightarrow A$ holds if $|\pi(X)| = |\pi(X \cup A)|$. Here $\pi(X)$ denotes a partition, and the lemma uses the notion of partition size, that is, the number of clusters in the partition. When a dependency is violated, partition-based algorithms add further attributes to its left-hand side and then re-check it by performing a partition intersection. In the literature this operation is called specialization. We noticed, however, that the partitions of dependencies that start to hold only after several rounds of specialization can be reused, which can significantly reduce the running time of the algorithms, because the intersection operation is expensive.
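The lemma and the intersection step can be sketched in a few lines of Python. This is a minimal illustration, not the code of TANE or PYRO: it keeps full partitions (including single-row clusters), represents rows as dictionaries, and all function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices by their values on `attrs`; each group is a cluster."""
    clusters = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        clusters[key].append(idx)
    return list(clusters.values())


def intersect(p1, p2):
    """Partition intersection: rows stay together only if both partitions agree."""
    cluster_of = {}
    for cid, cluster in enumerate(p1):
        for idx in cluster:
            cluster_of[idx] = cid
    result = defaultdict(list)
    for cid2, cluster in enumerate(p2):
        for idx in cluster:
            result[(cluster_of[idx], cid2)].append(idx)
    return list(result.values())


def holds(rows, lhs, rhs):
    """Check lhs -> rhs via the lemma: the partition sizes must be equal."""
    pi_x = partition(rows, lhs)
    pi_xa = intersect(pi_x, partition(rows, [rhs]))
    return len(pi_x) == len(pi_xa)


rows = [
    {"city": "Paris", "country": "FR", "name": "Ann"},
    {"city": "Lyon", "country": "FR", "name": "Bob"},
    {"city": "Paris", "country": "FR", "name": "Bob"},
    {"city": "Rome", "country": "IT", "name": "Cid"},
]

if __name__ == "__main__":
    print(holds(rows, ["city"], "country"))  # city determines country
    print(holds(rows, ["country"], "city"))  # country does not determine city
```

Every call to `intersect` walks all rows of both partitions, which is the cost that caching already computed partitions is meant to avoid.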

We therefore proposed a heuristic based on Shannon entropy and Gini uncertainty, as well as on our own metric, which we called Inverse Entropy. It is a slight modification of Shannon entropy and grows as the uniqueness of the dataset grows. The proposed heuristic is as follows:

[Formula: the proposed caching heuristic]

In this formula, one term is the degree of uniqueness of the most recently computed partition, and the other is the median of the degrees of uniqueness of the individual attributes. All three metrics described above were tried as the uniqueness metric. Note also that the heuristic contains two modifiers. The first indicates how close the current partition is to a primary key and makes it more likely that partitions far from a candidate key are cached. The second tracks cache occupancy and thereby encourages adding more partitions to the cache while free space is available. Solving this problem allowed us to speed up the PYRO algorithm by 10-40%, depending on the dataset. It is worth noting that PYRO is the most successful algorithm in this area.
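As an illustration of what a uniqueness metric over a partition can look like, the sketch below computes the textbook Shannon entropy and Gini impurity from cluster sizes. These are the standard definitions only: how the heuristic normalizes or combines them, and the definition of Inverse Entropy, are not given in this text, so they are not reproduced here. The function names are hypothetical.

```python
import math


def shannon_entropy(cluster_sizes):
    """Shannon entropy (in bits) of the distribution of rows over clusters."""
    total = sum(cluster_sizes)
    return -sum((s / total) * math.log2(s / total) for s in cluster_sizes)


def gini_impurity(cluster_sizes):
    """Gini impurity: chance that two randomly drawn rows fall in different clusters."""
    total = sum(cluster_sizes)
    return 1.0 - sum((s / total) ** 2 for s in cluster_sizes)


if __name__ == "__main__":
    # One big cluster (no uniqueness) versus four single-row clusters (a key).
    for sizes in ([4], [2, 2], [1, 1, 1, 1]):
        print(sizes, shannon_entropy(sizes), gini_impurity(sizes))
```

Both values are zero when all rows share one cluster and grow as the rows spread over more clusters, which is the behavior a uniqueness metric needs.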

The figure below shows the results of applying the proposed heuristic compared with a baseline caching approach based on a coin flip. The x-axis is logarithmic.

[Figure: results of the proposed heuristic compared with the coin-flip caching baseline]

An alternative way to store partitions

We then proposed an alternative way to store partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that share the same values of certain attributes. These clusters can contain long runs of consecutive tuple numbers, for example when the data in the table is ordered. We therefore proposed a compression scheme for storing partitions, namely interval storage of the values in partition clusters:

$$display$$\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{First~interval}, \underbrace{7, 8}_{Second~interval}, 10\}\}\\ \downarrow{Compression}\\ \pi(X) = \{\{\underbrace{\$, 1, 5}_{First~interval}, \underbrace{7, 8}_{Second~interval}, 10\}\}$$display$$
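The interval idea can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the work: the cluster is assumed to be a sorted list of tuple numbers, the marker is the `$` symbol from the example above, and the minimum run length of 3 (`min_run`) is a guess that matches the example, where the pair 7, 8 is left uncompressed.

```python
MARKER = "$"  # interval marker, as in the example above


def compress(cluster, min_run=3):
    """Replace runs of >= min_run consecutive numbers with MARKER, start, end."""
    out = []
    i = 0
    while i < len(cluster):
        j = i
        # Extend j to the end of the current run of consecutive numbers.
        while j + 1 < len(cluster) and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.extend([MARKER, cluster[i], cluster[j]])
        else:
            out.extend(cluster[i:j + 1])
        i = j + 1
    return out


def decompress(packed):
    """Expand MARKER, start, end triples back into the full list of numbers."""
    out = []
    i = 0
    while i < len(packed):
        if packed[i] == MARKER:
            out.extend(range(packed[i + 1], packed[i + 2] + 1))
            i += 3
        else:
            out.append(packed[i])
            i += 1
    return out


if __name__ == "__main__":
    cluster = [1, 2, 3, 4, 5, 7, 8, 10]
    packed = compress(cluster)
    print(packed)
    print(decompress(packed) == cluster)
```

A run shorter than three numbers would take as much or more space in the marker form, which is why such runs are stored as they are in this sketch.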

This approach reduced memory consumption during the execution of the TANE algorithm by 1 to 25%. TANE is a classic FD discovery algorithm that uses partitions during its operation. For this part of the work we chose TANE because implementing interval storage in it was much easier than, for example, in PYRO, and that was enough to assess whether the proposed approach is viable. The results are shown in the figure below. The x-axis is logarithmic.

[Figure: memory consumption of TANE with interval storage of partitions]

The ADBIS-2019 conference

Based on the results of this research, in September 2019 I published the article Smart Caching for Functional Dependency Discovery at the 23rd European Conference on Advances in Databases and Information Systems (ADBIS-2019). At the conference, the work was noted by Bernhard Thalheim, a prominent figure in the database field. The results formed the basis of my master's thesis at the Faculty of Mathematics and Mechanics of St. Petersburg State University, in which both proposed approaches (caching and compression) were implemented in both algorithms, TANE and PYRO. The results showed that the proposed approaches are universal: both algorithms showed a significant reduction in memory consumption as well as a significant reduction in running time.

Source: www.habr.com
