Efficiently finding functional dependencies in databases

Finding functional dependencies in data is used in various areas of data analysis: database management, data cleaning, database reverse engineering, and data exploration. We have already published an article about the dependencies themselves by Anastasia Birillo and Nikita Bobrov. This time Anastasia, who graduated from the Computer Science Center this year, shares how that work developed as part of the research project she defended at the center.


Choosing the task

While studying at the CS center, I began to study databases in depth, namely the search for functional and difference dependencies. This topic was related to my university coursework, so while working on the coursework I started reading papers about the various kinds of dependencies in databases. I wrote a survey of this area, one of my first papers in English, and submitted it to the SEIM-2017 conference. I was very happy when I learned that it had been accepted after all, and I decided to dig deeper into the topic. The concept itself is not new: it has been in use since the 90s, but even now it is applied in many areas.

During my second semester at the center, I started a research project on improving algorithms for finding functional dependencies. I worked on it together with St. Petersburg State University student Nikita Bobrov at JetBrains Research.

Computational complexity of finding functional dependencies

The main problem is computational complexity. The number of possible minimal and non-trivial dependencies is bounded above by a value that depends on the number of table attributes [formula image not preserved]. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 90s, functional dependency (FD) search algorithms on an ordinary desktop PC could process data sets with up to 20 attributes and tens of thousands of rows in several hours. Modern algorithms running on multi-core processors detect dependencies in data sets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. However, this is not enough: such running times are unacceptable for most real-world applications. Therefore, we developed approaches to speed up existing algorithms.

Caching schemes for partition intersection

In the first part of the work, we developed caching schemes for the class of algorithms that use the partition intersection method. A partition for an attribute is a set of lists, where each list contains the numbers of the rows that share the same value of that attribute. Each such list is called a cluster. Many modern algorithms use partitions to determine whether a dependency holds or not; namely, they rely on the lemma: the dependency X → A holds if |π(X)| = |π(X ∪ {A})|. Here π(X) denotes a partition, and the lemma uses the notion of partition size, which is the number of clusters in it. When a dependency is violated, algorithms that use partitions add more attributes to the left-hand side of the dependency and then recompute it, performing the partition intersection operation. This operation is called specialization in the literature. We noticed, however, that partitions for dependencies that will only hold after several rounds of specialization can be actively reused, which can significantly reduce the running time of the algorithms, since the intersection operation is expensive.
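To make the lemma concrete, here is a minimal Python sketch of partitions, partition intersection, and the size-based check. The function names (`partition`, `intersect`, `holds`) and the toy table are illustrative assumptions, not code from TANE, PYRO, or our implementation.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row numbers by their values on `attrs`; each group is a cluster."""
    clusters = defaultdict(list)
    for row_number, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        clusters[key].append(row_number)
    return sorted(clusters.values())


def intersect(p1, p2):
    """Partition intersection: rows stay in one cluster only if they
    share a cluster in both input partitions."""
    cluster_of = {}
    for cluster_id, cluster in enumerate(p2):
        for row_number in cluster:
            cluster_of[row_number] = cluster_id
    result = defaultdict(list)
    for cluster_id, cluster in enumerate(p1):
        for row_number in cluster:
            result[(cluster_id, cluster_of[row_number])].append(row_number)
    return sorted(result.values())


def holds(rows, lhs, rhs):
    """Check X -> A by comparing |pi(X)| with |pi(X u {A})|."""
    pi_x = partition(rows, lhs)
    pi_xa = intersect(pi_x, partition(rows, [rhs]))
    return len(pi_x) == len(pi_xa)


# Hypothetical toy table used only for illustration.
rows = [
    {"city": "Paris", "country": "FR", "name": "Ann"},
    {"city": "Lyon", "country": "FR", "name": "Bob"},
    {"city": "Paris", "country": "FR", "name": "Cid"},
    {"city": "Rome", "country": "IT", "name": "Ann"},
]

print(holds(rows, ["city"], "country"))          # city -> country
print(holds(rows, ["country"], "city"))          # violated dependency
print(holds(rows, ["country", "name"], "city"))  # after one specialization step
```

In this sketch every call to `holds` recomputes π(X) from scratch. Storing partitions that are likely to be needed again avoids exactly that repeated work.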

Therefore, we proposed a heuristic based on Shannon Entropy and Gini Impurity, as well as on our own metric, which we called Reverse Entropy. It is a slight modification of Shannon Entropy and grows as the uniqueness of the data set grows. The proposed heuristic is as follows:

[Formula: the proposed caching heuristic; image not preserved]

The heuristic uses the degree of uniqueness of the most recently computed partition and the median of the degrees of uniqueness of the individual attributes. All three metrics described above were tried as the uniqueness metric. The heuristic also contains two modifiers. The first indicates how close the current partition is to a primary key and makes it possible to cache, to a greater extent, those partitions that are far from a potential key. The second modifier tracks cache occupancy and thereby encourages adding more partitions to the cache when free space is available. Solving this problem successfully allowed us to speed up the PYRO algorithm by 10-40%, depending on the dataset. It is worth noting that PYRO is the most successful algorithm in this area.
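As a rough illustration of two of the uniqueness metrics mentioned above, the sketch below computes Shannon entropy and Gini impurity over a partition. It assumes that each cluster's share of the rows is treated as a probability, which is my reading rather than something stated here. Reverse Entropy is not defined in this text, so it is left out.

```python
import math


def cluster_probabilities(clusters):
    """Share of all rows that falls into each cluster (assumed weighting)."""
    total = sum(len(cluster) for cluster in clusters)
    return [len(cluster) / total for cluster in clusters]


def shannon_entropy(clusters):
    """Shannon entropy in bits; 0 when all rows sit in one cluster."""
    return sum(-p * math.log2(p) for p in cluster_probabilities(clusters))


def gini_impurity(clusters):
    """Gini impurity; approaches 1 as clusters become many and small."""
    return 1.0 - sum(p * p for p in cluster_probabilities(clusters))


one_cluster = [[0, 1, 2, 3]]        # every row has the same value
two_clusters = [[0, 1], [2, 3]]
all_unique = [[0], [1], [2], [3]]   # the attribute set behaves like a key

for clusters in (one_cluster, two_clusters, all_unique):
    print(shannon_entropy(clusters), gini_impurity(clusters))
```

Both values grow as the partition gets closer to one row per cluster, which is why either can serve as a measure of how close an attribute set is to a key.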

The figure below shows the results of applying the proposed heuristic compared with a basic coin-flip caching approach. The X axis is logarithmic.

[Figure: results of the proposed heuristic compared with coin-flip caching; the X axis is logarithmic]
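For readers unfamiliar with the baseline, a coin-flip cache can be sketched as follows. This is only one plausible reading of "coin-flip caching": each newly computed partition is kept with a fixed probability while there is room. The class name `CoinFlipCache`, its parameters, and the capacity rule are hypothetical.

```python
import random


class CoinFlipCache:
    """Baseline sketch: keep a new partition with a fixed probability."""

    def __init__(self, capacity, keep_probability=0.5, seed=0):
        self.capacity = capacity
        self.keep_probability = keep_probability
        self._rng = random.Random(seed)  # seeded for reproducible runs
        self._store = {}

    def offer(self, attrs, partition):
        """Flip a coin for a freshly computed partition; return True if stored."""
        if len(self._store) >= self.capacity:
            return False
        if self._rng.random() < self.keep_probability:
            self._store[frozenset(attrs)] = partition
            return True
        return False

    def get(self, attrs):
        """Return the cached partition for this attribute set, or None."""
        return self._store.get(frozenset(attrs))


cache = CoinFlipCache(capacity=2, keep_probability=1.0)
cache.offer(["city"], [[0, 2], [1], [3]])
print(cache.get(["city"]))
```

Unlike the heuristic, this baseline ignores both how unique the partition is and how full the cache is until it hits the capacity limit.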

An alternative way to store partitions

We then proposed an alternative way to store partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that have identical values on certain attributes. These clusters may contain long runs of consecutive tuple numbers, for example when the data in the table is ordered. Therefore, we proposed a compression scheme for storing partitions, namely interval storage of the values in partition clusters:

$$display$$\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{\text{First interval}}, \underbrace{7, 8}_{\text{Second interval}}, 10\}\} \\ \downarrow{\text{Compression}} \\ \pi(X) = \{\{\underbrace{\$, 1, 5}_{\text{First interval}}, \underbrace{7, 8}_{\text{Second interval}}, 10\}\}$$display$$

This method reduced memory consumption during the operation of the TANE algorithm by 1 to 25%. TANE is a classic algorithm for finding functional dependencies, and it uses partitions in the course of its work. TANE was chosen for this experiment because it was much easier to implement interval storage in it than, for example, in PYRO, in order to check whether the proposed approach works. The results are shown in the figure below. The X axis is logarithmic.

[Figure: memory consumption of TANE with interval storage of partitions; the X axis is logarithmic]
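The interval storage idea from this section can be sketched in a few lines of Python. The `$` marker follows the formula above; the threshold `min_run=3` is an assumption inferred from the example (the run 1..5 is compressed, the run 7, 8 is not), and the sketch expects each cluster to be sorted in ascending order.

```python
MARKER = "$"  # announces that the next two values are an interval [start, end]


def compress(cluster, min_run=3):
    """Replace runs of consecutive tuple numbers with MARKER, start, end."""
    out = []
    i = 0
    while i < len(cluster):
        j = i
        # Extend j to the end of the run of consecutive numbers starting at i.
        while j + 1 < len(cluster) and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.extend([MARKER, cluster[i], cluster[j]])
        else:
            out.extend(cluster[i:j + 1])  # short runs are stored as-is
        i = j + 1
    return out


def decompress(encoded):
    """Expand every MARKER, start, end triple back into tuple numbers."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == MARKER:
            out.extend(range(encoded[i + 1], encoded[i + 2] + 1))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out


cluster = [1, 2, 3, 4, 5, 7, 8, 10]
encoded = compress(cluster)
print(encoded)
print(decompress(encoded) == cluster)
```

The saving depends on how many long runs the clusters contain, which would be consistent with the wide 1 to 25% range reported for TANE.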

ADBIS-2019 conference

Based on the results of the research, in September 2019 I published the paper Smart Caching for Efficient Functional Dependency Discovery at the 23rd European Conference on Advances in Databases and Information Systems (ADBIS-2019). During the presentation, the work was noted by Bernhard Thalheim, a significant figure in the database field. The research results formed the basis of my master's dissertation in mathematics and mechanics at St. [...] Moreover, the results showed that the proposed approaches are universal: on both algorithms, with both approaches, we observed a significant reduction in memory consumption as well as a significant reduction in the running time of the algorithms.

Source: www.habr.com
