Efficiently finding functional dependencies in databases

Discovering functional dependencies in data is useful in many areas of data analysis: database management, data cleaning, database reverse engineering and data exploration. We have already published an article by Anastasia Birillo and Nikita Bobrov about the dependencies themselves. This time Anastasia, who graduated from the Computer Science Center this year, describes how this work developed as part of the research project she defended at the center.


Choosing a topic

While studying at the Computer Science Center, I began to study databases in depth, namely the discovery of functional and differential dependencies. The topic was related to my coursework at the university, so while working on it I started reading papers about the various kinds of dependencies in databases. I wrote a survey of this area, one of my first papers in English, and submitted it to the SEIM-2017 conference. I was very happy to learn that it had been accepted, and I decided to dig deeper into the topic. The concept itself is not new: it has been in use since the 90s, but it is still applied in many areas today.

In my second semester at the center, I started a research project on improving algorithms for discovering functional dependencies. I worked on it together with Nikita Bobrov, a graduate student at St. Petersburg State University and JetBrains Research.

Computational complexity of functional dependency discovery

The main problem is computational complexity. The number of possible minimal non-trivial dependencies is bounded above by a quantity that is exponential in the number of attributes of the table. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 90s, FD discovery algorithms running on an ordinary desktop PC could process datasets with up to 20 attributes and tens of thousands of rows in several hours. Modern algorithms running on multi-core processors discover dependencies in datasets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. That is still not enough: such running times are unacceptable for most real-world applications. We therefore developed approaches for speeding up existing algorithms.
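To get a feel for the scale, the sketch below counts the raw search space a naive algorithm would face: every candidate X → A with A not in X. This is the size of the candidate lattice, not the tighter bound on minimal dependencies mentioned above, and the function names are illustrative only.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def candidate_fd_count(m: int) -> int:
    """Count candidates X -> A with A not in X over m attributes."""
    # Each of the m attributes can be the right-hand side A, and any subset
    # of the remaining m - 1 attributes can be the left-hand side X.
    return m * 2 ** (m - 1)


def enumerate_candidates(attrs: Sequence[str]) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Explicitly list every candidate (X, A); only feasible for tiny m."""
    for a in attrs:
        rest = [b for b in attrs if b != a]
        for k in range(len(rest) + 1):
            for lhs in combinations(rest, k):
                yield lhs, a


if __name__ == "__main__":
    # Brute-force enumeration agrees with the closed form on a toy schema.
    assert len(list(enumerate_candidates("ABC"))) == candidate_fd_count(3)
    for m in (20, 200):
        print(f"m = {m}: {candidate_fd_count(m)} candidate dependencies")
```

For m = 20 this already gives 10,485,760 candidates, and for m = 200 roughly 1.6 × 10^62, each of which would need a check over all rows. This is why practical algorithms prune the lattice and reuse intermediate results instead of testing candidates one by one.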

Caching strategies for partition intersection

In the first part of the project, we developed caching schemes for the class of algorithms that use partition intersection. A partition for an attribute is a list of lists, where each inner list contains the numbers of the rows that share the same value of that attribute. Each such list is called a cluster. Many modern algorithms use partitions to decide whether a dependency holds, relying on the following lemma: the dependency $X \rightarrow A$ holds if $|\pi(X)| = |\pi(X \cup \{A\})|$. Here $\pi(X)$ denotes the partition, and $|\pi(X)|$ is its size, that is, the number of clusters in it. When a dependency is violated, partition-based algorithms add more attributes to its left-hand side and recheck it, which requires intersecting partitions. In the literature this operation is called specialization. We noticed, however, that the partitions of dependencies that hold only after several rounds of specialization can be actively reused, which can substantially reduce the running time of the algorithms, since intersection is an expensive operation.
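A minimal sketch of the partition machinery described above, assuming full partitions (singleton clusters are kept; TANE-style implementations usually strip them and adjust the size test accordingly). The names `partition`, `intersect` and `fd_holds` are illustrative and not taken from PYRO or TANE. Specialization of X by an attribute B corresponds to `intersect(pi_X, pi_B)`, and it is exactly these intermediate results that a cache would keep.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Partition = List[List[int]]  # list of clusters; each cluster holds row numbers


def partition(column: Sequence[Hashable]) -> Partition:
    """Group row numbers by value: one cluster per distinct value."""
    clusters: Dict[Hashable, List[int]] = defaultdict(list)
    for row, value in enumerate(column):
        clusters[value].append(row)
    return list(clusters.values())


def intersect(p: Partition, q: Partition) -> Partition:
    """Partition product: two rows share a cluster iff they do in both p and q."""
    # Remember which cluster of q each row belongs to.
    cluster_of: Dict[int, int] = {}
    for idx, cluster in enumerate(q):
        for row in cluster:
            cluster_of[row] = idx
    result: Partition = []
    for cluster in p:
        # Split every cluster of p according to the q-cluster of its rows.
        groups: Dict[int, List[int]] = defaultdict(list)
        for row in cluster:
            groups[cluster_of[row]].append(row)
        result.extend(groups.values())
    return result


def fd_holds(pi_x: Partition, pi_xa: Partition) -> bool:
    """X -> A holds iff |pi(X)| == |pi(X u {A})| (full partitions assumed)."""
    return len(pi_x) == len(pi_xa)


if __name__ == "__main__":
    a = partition([1, 1, 2, 2, 3])
    b = partition(["x", "x", "y", "z", "w"])
    ab = intersect(a, b)    # pi({A, B})
    print(ab)               # [[0, 1], [2], [3], [4]]
    print(fd_holds(a, ab))  # False: rows 2 and 3 agree on A but not on B
    print(fd_holds(b, ab))  # True: B -> A
```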

We therefore proposed a heuristic based on Shannon entropy, Gini impurity and our own metric, which we called Reverse Entropy. It is a slight modification of Shannon entropy and grows as the uniqueness of the dataset increases. The proposed heuristic is as follows:

[Image: formula of the proposed caching heuristic]

The heuristic compares the uniqueness degree of the newly computed partition with the average uniqueness degree of the individual attributes. All three metrics described above were tested as the uniqueness measure. Note also that the heuristic contains two modifiers. The first reflects how close the current partition is to a primary key and favors caching partitions that are far from a potential key. The second tracks cache occupancy and thus encourages adding more partitions to the cache while free space remains. This approach let us speed up the PYRO algorithm by 10 to 40%, depending on the dataset. Note that PYRO is the most successful algorithm in this area.
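The uniqueness measures can be sketched on top of partitions as follows. This is a hedged illustration using the standard definitions of Shannon entropy and Gini impurity over cluster sizes; Reverse Entropy, the authors' own modification, is not reproduced, and neither is the caching formula with its two modifiers. `average_uniqueness` is a hypothetical helper for the per-attribute average the heuristic compares against.

```python
from collections import defaultdict
from math import log2
from typing import Callable, Dict, Hashable, List, Sequence

Partition = List[List[int]]


def partition(column: Sequence[Hashable]) -> Partition:
    """Group row numbers by value: one cluster per distinct value."""
    clusters: Dict[Hashable, List[int]] = defaultdict(list)
    for row, value in enumerate(column):
        clusters[value].append(row)
    return list(clusters.values())


def shannon_entropy(p: Partition) -> float:
    """H = -sum(|c|/n * log2(|c|/n)); grows as clusters get smaller and more numerous."""
    n = sum(len(c) for c in p)
    return -sum(len(c) / n * log2(len(c) / n) for c in p)


def gini_impurity(p: Partition) -> float:
    """G = 1 - sum((|c|/n)^2); 0 for a constant column, 1 - 1/n for a key."""
    n = sum(len(c) for c in p)
    return 1.0 - sum((len(c) / n) ** 2 for c in p)


def average_uniqueness(columns: Sequence[Sequence[Hashable]],
                       metric: Callable[[Partition], float]) -> float:
    """Hypothetical helper: mean uniqueness of single-attribute partitions."""
    return sum(metric(partition(col)) for col in columns) / len(columns)


if __name__ == "__main__":
    cols = [[1, 1, 2, 2], [1, 2, 3, 4]]
    print(shannon_entropy(partition(cols[0])))        # 1.0
    print(gini_impurity(partition(cols[1])))          # 0.75
    print(average_uniqueness(cols, shannon_entropy))  # 1.5
```

Both measures are higher for partitions closer to a key, which is the property the heuristic uses when deciding which intersections are worth keeping.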

The figure below shows the results of the proposed heuristic compared with a naive coin-flip caching approach. The X axis is logarithmic.

[Figure: results of the proposed heuristic compared with coin-flip caching]

Another way to store partitions

We then proposed a different way of storing partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that have identical values for some attributes. These clusters can contain long runs of consecutive tuple numbers, for example when the data in the table is sorted. We therefore proposed a compression scheme for partitions, namely storing the values in partition clusters as intervals:

$$\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{\text{first interval}}, \underbrace{7, 8}_{\text{second interval}}, 10\}\} \xrightarrow{\text{compression}} \pi(X) = \{\{\underbrace{\$, 1, 5}_{\text{first interval}}, \underbrace{7, 8}_{\text{second interval}}, 10\}\}$$
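A minimal sketch of interval storage for a single cluster, assuming clusters hold row numbers in ascending order. The marker `INTERVAL = -1` is a hypothetical stand-in for the `$` symbol in the example above, and the `min_run = 4` threshold is an assumption: a run is replaced by the three-token form only when that actually saves space, which reproduces the example leaving {7, 8} uncompressed.

```python
from typing import List

INTERVAL = -1  # hypothetical marker for "$"; valid row numbers are >= 0


def compress(cluster: List[int], min_run: int = 4) -> List[int]:
    """Replace runs of consecutive row numbers by [INTERVAL, start, end]."""
    out: List[int] = []
    i = 0
    while i < len(cluster):
        j = i
        # Extend the run while the next row number is exactly one larger.
        while j + 1 < len(cluster) and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.extend([INTERVAL, cluster[i], cluster[j]])
        else:
            out.extend(cluster[i:j + 1])
        i = j + 1
    return out


def decompress(encoded: List[int]) -> List[int]:
    """Expand every [INTERVAL, start, end] triple back into explicit row numbers."""
    out: List[int] = []
    i = 0
    while i < len(encoded):
        if encoded[i] == INTERVAL:
            start, end = encoded[i + 1], encoded[i + 2]
            out.extend(range(start, end + 1))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out


if __name__ == "__main__":
    print(compress([1, 2, 3, 4, 5, 7, 8, 10]))  # [-1, 1, 5, 7, 8, 10]
```

The saving grows with run length: a cluster covering rows 0 to 99 shrinks from 100 numbers to 3, which is why sorted tables benefit most.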

This approach reduced memory consumption of the TANE algorithm by 1 to 25%. TANE is a classic FD discovery algorithm that uses partitions during its operation. We chose TANE for this part of the work because interval storage is simpler to implement in it than in, for example, PYRO, which made it easier to check whether the proposed approach works. The results are shown in the figure below. The X axis is logarithmic.

[Figure: memory consumption of TANE with interval storage of partitions]

ADBIS-2019 conference

Based on the results of this research, in September 2019 I published the paper "Smart Caching for Efficient Functional Dependency Discovery" at the 23rd European Conference on Advances in Databases and Information Systems (ADBIS-2019). During the presentation, the work was noted by Bernhard Thalheim, a prominent figure in the database field. The research results formed the basis of my master's thesis at the Faculty of Mathematics and Mechanics of St. Petersburg State University. The thesis results also showed that the proposed approaches are general: on all the algorithms tested, both approaches yielded a significant reduction in memory consumption as well as a significant reduction in running time.

source: www.habr.com
