Efficient discovery of functional dependencies in databases

Discovering dependencies in data is used in several areas of data analysis: database management, data cleaning, database reverse engineering and data exploration. We have already published an article about the dependencies themselves by Anastasia Birillo and Nikita Bobrov. This time Anastasia, who graduated from the Computer Science Center this year, describes how that work developed as part of the research project she defended at the center.


Choosing the task

While studying at the CS Center, I began to study databases in depth, specifically the discovery of functional and differential dependencies. The topic was related to my coursework at the university, so while working on the coursework I started reading papers about the various kinds of dependencies in databases. I wrote a survey of the area, one of my first papers in English, and submitted it to the SEIM-2017 conference. I was very happy when I learned that it had been accepted, and I decided to go deeper into the topic. The concept itself is not new: it has been in use since the 1990s, but it is still applied in many areas today.

In my second semester at the center, I started a research project on improving algorithms for discovering functional dependencies. I worked on it together with Nikita Bobrov, a postgraduate student at St. Petersburg State University, at JetBrains Research.

Computational complexity of functional dependency discovery

The main problem is computational complexity. The number of possible minimal, non-trivial dependencies is bounded above by a function of the number of attributes in the table. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 1990s, functional dependency discovery algorithms running on an ordinary desktop PC could process data sets with up to 20 attributes and tens of thousands of rows in up to several hours. Modern algorithms running on multi-core processors discover dependencies in data sets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. However, this is not enough: such running times are unacceptable for most real-world applications. We therefore developed approaches to speed up the existing algorithms.

Caching schemes for partition intersections

In the first part of the work, we developed caching schemes for the class of algorithms that use the partition intersection method. A partition for an attribute is a set of lists, where each list contains the numbers of the rows that share the same value for that attribute. Each such list is called a cluster. Many modern algorithms use partitions to decide whether a dependency holds, that is, they rely on the following lemma: the dependency $X \rightarrow A$ holds if $|\pi(X)| = |\pi(X \cup \{A\})|$. Here $\pi(X)$ denotes a partition, and the size of a partition is the number of clusters in it. When a dependency is violated, partition-based algorithms add further attributes to the left-hand side of the dependency and then recompute it, performing a partition intersection. In the literature this operation is called specialization. We noticed that partitions for dependencies that are only retained after several rounds of specialization can be actively reused, which can substantially reduce the running time of the algorithms, since the intersection operation is expensive.
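The partition check described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the table is a list of dictionaries, partitions keep every cluster (including single-row ones), and the names `partition`, `intersect` and `holds` are hypothetical helpers, not functions from TANE or PYRO.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row numbers by their values on `attrs`; each group is a cluster."""
    clusters = defaultdict(list)
    for i, row in enumerate(rows):
        clusters[tuple(row[a] for a in attrs)].append(i)
    return list(clusters.values())

def intersect(p1, p2):
    """Intersect two partitions: rows stay together only if both agree."""
    cluster_of = {}
    for cid, cluster in enumerate(p2):
        for i in cluster:
            cluster_of[i] = cid
    result = []
    for cluster in p1:
        # Split each cluster of p1 by the cluster its rows belong to in p2.
        split = defaultdict(list)
        for i in cluster:
            split[cluster_of[i]].append(i)
        result.extend(split.values())
    return result

def holds(rows, lhs, rhs):
    """X -> A holds if adding A to X does not increase the number of clusters."""
    p_lhs = partition(rows, lhs)
    p_both = intersect(p_lhs, partition(rows, [rhs]))
    return len(p_lhs) == len(p_both)

# Toy table, invented for illustration only.
rows = [
    {"city": "Paris", "country": "France", "zip": "75001"},
    {"city": "Paris", "country": "France", "zip": "75002"},
    {"city": "Lyon", "country": "France", "zip": "69001"},
    {"city": "Berlin", "country": "Germany", "zip": "10115"},
]

print(holds(rows, ["city"], "country"))  # city -> country
print(holds(rows, ["country"], "city"))  # country -> city
```

In this sketch every call to `holds` recomputes the partition of the left-hand side and intersects it again, which is exactly the repeated work that caching intermediate partitions is meant to avoid.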

We therefore proposed a heuristic based on Shannon entropy and Gini impurity, as well as on our own metric, which we called Reverse Entropy. It is a slight modification of Shannon entropy and grows as the uniqueness of the data set grows. The proposed heuristic is as follows:

[Formula: the proposed caching heuristic]

The heuristic uses two quantities: the degree of uniqueness of the most recently computed partition, and the median of the degrees of uniqueness of the individual attributes. All three metrics described above were tried as the uniqueness metric. Note also that the heuristic contains two modifiers. The first indicates how close the current partition is to a primary key and makes it possible to cache, to a greater extent, the partitions that are far from a potential key. The second modifier tracks cache occupancy and thereby encourages adding more partitions to the cache while free space is available. Solving this problem allowed us to speed up the PYRO algorithm by 10-40%, depending on the data set. It is worth noting that PYRO is the most successful algorithm in this area.
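Two of the uniqueness metrics named above, Shannon entropy and Gini impurity, are standard and can be computed from the value counts of an attribute. The sketch below shows only those two textbook definitions; Reverse Entropy and the caching heuristic itself are not reproduced, since their formulas are not given in the text. The function names are illustrative.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the value distribution of an attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gini_impurity(values):
    """Gini impurity: chance that two random draws have different values."""
    counts = Counter(values)
    total = len(values)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Invented columns: one with few distinct values, one with all values distinct.
low_uniqueness = ["a", "a", "b", "b"]
high_uniqueness = ["a", "b", "c", "d"]

print(shannon_entropy(low_uniqueness), shannon_entropy(high_uniqueness))
print(gini_impurity(low_uniqueness), gini_impurity(high_uniqueness))
```

Both values grow as the column gets closer to having all distinct values, which is the property that makes them usable as a measure of how close a partition is to a key.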

The figure below shows the results of applying the proposed heuristic compared with the baseline caching approach. The X axis is logarithmic.

[Figure: results of the proposed caching heuristic compared with the baseline caching approach; logarithmic X axis]

An alternative way of storing partitions

Next, we proposed an alternative way of storing partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that have identical values for certain attributes. These clusters may contain long runs of consecutive tuple numbers, for example when the data in the table is ordered. We therefore proposed a compression scheme for storing partitions, namely interval storage of the values in partition clusters:

$$display$$\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{First~interval}, \underbrace{7, 8}_{Second~interval}, 10\}\} \\ \downarrow{Compression} \\ \pi(X) = \{\{\underbrace{\$, 1, 5}_{First~interval}, \underbrace{7, 8}_{Second~interval}, 10\}\}$$display$$
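The interval encoding in the formula can be illustrated with a short Python sketch. It assumes that each cluster is a sorted list of row numbers, that `$` marks the start of an interval followed by its first and last row number, and that only runs of at least `min_run` consecutive numbers are compressed. The threshold and the function names are assumptions made for this example, not details taken from the original implementation.

```python
MARKER = "$"  # interval marker, as in the formula above

def compress_cluster(cluster, min_run=3):
    """Replace runs of consecutive row numbers with [MARKER, first, last]."""
    out = []
    i, n = 0, len(cluster)
    while i < n:
        j = i
        # Extend j to the end of the run of consecutive numbers starting at i.
        while j + 1 < n and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.extend([MARKER, cluster[i], cluster[j]])
        else:
            out.extend(cluster[i:j + 1])  # short runs are stored as they are
        i = j + 1
    return out

def decompress_cluster(encoded):
    """Expand [MARKER, first, last] triples back into row numbers."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == MARKER:
            out.extend(range(encoded[i + 1], encoded[i + 2] + 1))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out

cluster = [1, 2, 3, 4, 5, 7, 8, 10]
encoded = compress_cluster(cluster)
print(encoded)
print(decompress_cluster(encoded) == cluster)
```

An encoded interval always takes three entries, so a run of three breaks even and only longer runs save memory; this is why the pair 7, 8 in the formula stays uncompressed.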

This approach reduced memory consumption during the execution of the TANE algorithm by 1 to 25%. TANE is a classic algorithm for discovering functional dependencies, and it uses partitions in the course of its work. TANE was chosen for the practical work because interval storage was much easier to implement in it than, for example, in PYRO, which made it a convenient way to assess whether the proposed approach works. The results are shown in the figure below. The X axis is logarithmic.

[Figure: memory consumption of the TANE algorithm with interval storage of partitions; logarithmic X axis]

ADBIS-2019 conference

Based on the results of this research, in September 2019 I presented the paper Smart Caching for Efficient Functional Dependency Discovery at the 23rd European Conference on Advances in Databases and Information Systems (ADBIS-2019). During the presentation, the work was noted by Bernhard Thalheim, a prominent figure in the database field. The research results formed the basis of my master's thesis at the Faculty of Mathematics and Mechanics of St. Petersburg State University, in which both proposed approaches (caching and compression) were implemented in both algorithms, TANE and PYRO. In addition, the results showed that the proposed approaches are universal: for both algorithms and with both approaches we observed a significant reduction in memory consumption as well as a significant reduction in running time.

Source: www.habr.com
