Functional dependency discovery is used in various areas of data analysis: database management, data cleaning, database reverse engineering, and data exploration. We have already written about the dependencies themselves
Choosing a task
While studying at the CS center, I began to study databases in depth, namely the search for functional and differential dependencies. This topic was related to my coursework at the university, so while working on the coursework I started reading papers about the various kinds of dependencies in databases. I wrote a survey of this area, one of my first
During my second semester at the center, I started a research project on improving algorithms for functional dependency discovery. I worked on it at JetBrains Research together with Nikita Bobrov, a graduate student at St. Petersburg State University.
Computational complexity of searching for functional dependencies
The main problem is computational complexity. The number of possible minimal, non-trivial dependencies is bounded above by a value that is exponential in the number of table attributes. The running time of the algorithms depends not only on the number of attributes but also on the number of rows. In the 1990s, functional dependency discovery algorithms running on an ordinary PC could process data sets with up to 20 attributes and tens of thousands of rows in several hours. Modern algorithms running on multi-core processors find dependencies in data sets with hundreds of attributes (up to 200) and hundreds of thousands of rows in roughly the same time. However, this is not enough: such running times are unacceptable for most real-world applications. We therefore developed approaches for speeding up existing algorithms.
Caching schemes for partition intersection
In the first part of the work, we developed caching schemes for the class of algorithms that use the partition intersection method. A partition for an attribute is a set of lists, where each list contains the numbers of the rows that have the same value for that attribute. Each such list is called a cluster. Many modern algorithms use partitions to determine whether a dependency holds, relying on the following lemma: the dependency $X \to A$ holds if $|\pi_X| = |\pi_{X \cup \{A\}}|$. Here $\pi_X$ denotes the partition for the attribute set $X$, and the size of a partition is the number of clusters in it. When a dependency is violated, partition-based algorithms add further attributes to its left-hand side and then recompute it by intersecting partitions. In the literature this operation is called specialization. We noticed that partitions for dependencies that would hold only after a few rounds of specialization can be actively reused, which can significantly reduce the running time of the algorithms, since the intersection operation is expensive.
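To make the partition-based check concrete, here is a minimal sketch in Python. It is not the code used in the project: the function names (`partition`, `intersect`, `holds`) and the sample table are illustrative assumptions. It builds partitions as lists of clusters of row numbers, intersects two partitions, and reports a dependency as holding when adding the right-hand attribute does not increase the number of clusters.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row numbers by their values on attrs; each group is one cluster."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def intersect(p1, p2):
    """Partition intersection: two rows stay together only if they share
    a cluster in both p1 and p2."""
    cluster_of = {}
    for cid, cluster in enumerate(p1):
        for r in cluster:
            cluster_of[r] = cid
    groups = defaultdict(list)
    for cid2, cluster in enumerate(p2):
        for r in cluster:
            groups[(cluster_of[r], cid2)].append(r)
    return list(groups.values())

def holds(rows, lhs, rhs):
    """lhs -> rhs holds if intersecting with the rhs partition
    leaves the number of clusters unchanged."""
    pi_x = partition(rows, lhs)
    pi_xa = intersect(pi_x, partition(rows, [rhs]))
    return len(pi_x) == len(pi_xa)

# Hypothetical sample table, used only for illustration.
rows = [
    {"city": "Oslo", "zip": "0150", "country": "NO"},
    {"city": "Oslo", "zip": "0151", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "NO"},
    {"city": "Lund", "zip": "22100", "country": "SE"},
]

print(holds(rows, ["city"], "country"))   # city determines country here
print(holds(rows, ["country"], "city"))   # country does not determine city
```

The `intersect` call is the expensive step that a cache tries to avoid repeating. This sketch keeps full partitions for readability, whereas TANE works with stripped partitions that drop single-row clusters.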
Therefore, we proposed a heuristic based on Shannon entropy, Gini impurity, and our own metric, which we called Reverse Entropy. Reverse Entropy is a slight modification of Shannon entropy and grows as the uniqueness of the data set grows. The proposed heuristic works as follows.
It uses two quantities: the degree of uniqueness of the most recently computed partition, and the median of the uniqueness degrees of the individual attributes. All three metrics described above were tried as the uniqueness metric. The heuristic also contains two modifiers. The first indicates how close the current partition is to a primary key, so that partitions far from a potential key are cached more readily. The second tracks cache occupancy and thereby encourages adding more partitions to the cache while free space is available. Solving this problem allowed us to speed up the PYRO algorithm by 10-40%, depending on the data set. It is worth noting that PYRO is the most successful algorithm in this area.
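As a rough illustration of what a uniqueness metric over a partition can look like, the sketch below computes Shannon entropy and Gini impurity from cluster sizes. Treating the relative cluster sizes as a probability distribution is an assumption made for this example, and the function names are hypothetical. Reverse Entropy and the caching heuristic itself are not reproduced, because their exact definitions are not given in the text.

```python
import math

def cluster_probabilities(clusters):
    """Relative cluster sizes, treated as a probability distribution (assumption)."""
    total = sum(len(c) for c in clusters)
    return [len(c) / total for c in clusters]

def shannon_entropy(clusters):
    """Shannon entropy in bits of the cluster-size distribution."""
    return -sum(p * math.log2(p) for p in cluster_probabilities(clusters))

def gini_impurity(clusters):
    """Gini impurity of the cluster-size distribution."""
    return 1.0 - sum(p * p for p in cluster_probabilities(clusters))

key_like = [[0], [1], [2], [3]]   # every row is unique on the attribute
constant = [[0, 1, 2, 3]]         # every row has the same value

print(shannon_entropy(key_like), gini_impurity(key_like))
print(shannon_entropy(constant), gini_impurity(constant))
```

Under this assumption both metrics are at their lowest for a constant column and increase as the partition approaches a key, which is the kind of signal a caching decision can use.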
The figure below shows the results of applying the proposed heuristic compared with the basic coin-flip caching approach. The X axis is logarithmic.
An alternative way to store partitions
We then proposed an alternative way to store partitions. A partition is a set of clusters, each of which stores the numbers of the tuples that have identical values for certain attributes. These clusters can contain long runs of consecutive tuple numbers, for example when the data in the table is ordered. We therefore proposed a compression scheme for storing partitions, namely interval storage of the values in the clusters of a partition:
$$\pi(X) = \{\{\underbrace{1, 2, 3, 4, 5}_{\text{first interval}}, \underbrace{7, 8}_{\text{second interval}}, 10\}\} \;\xrightarrow{\text{compression}}\; \pi(X) = \{\{\underbrace{\$, 1, 5}_{\text{first interval}}, \underbrace{7, 8}_{\text{second interval}}, 10\}\}$$
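The interval idea can be sketched in a few lines of Python. This is an illustration and not the implementation used with TANE: the marker value `MARK`, the function names, and the rule that only runs of three or more consecutive numbers are compressed are assumptions inferred from the example, where the run 1..5 becomes `$, 1, 5` while `7, 8` is stored unchanged.

```python
MARK = "$"  # hypothetical marker that announces a (start, end) interval

def compress(cluster):
    """Encode a sorted cluster of row numbers, replacing each run of three
    or more consecutive numbers with MARK, start, end."""
    out = []
    i = 0
    while i < len(cluster):
        j = i
        # Extend j to the end of the current run of consecutive numbers.
        while j + 1 < len(cluster) and cluster[j + 1] == cluster[j] + 1:
            j += 1
        if j - i + 1 >= 3:
            out += [MARK, cluster[i], cluster[j]]
        else:
            out += cluster[i:j + 1]   # short runs are cheaper stored as-is
        i = j + 1
    return out

def decompress(encoded):
    """Expand MARK, start, end triples back into individual row numbers."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == MARK:
            out.extend(range(encoded[i + 1], encoded[i + 2] + 1))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return out

cluster = [1, 2, 3, 4, 5, 7, 8, 10]
encoded = compress(cluster)
print(encoded)
print(decompress(encoded) == cluster)
```

A run of two numbers is left alone because the marker plus two endpoints would take three slots instead of two, so the saving only appears on longer runs such as those found in ordered tables.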
This approach reduced memory consumption during a run of the TANE algorithm by 1 to 25%. TANE is a classic algorithm for functional dependency discovery, and it uses partitions in its work. TANE was chosen for this part of the project because interval storage is much easier to implement in it than, for example, in PYRO, which made it a convenient way to evaluate whether the proposed approach works. The results are presented in the figure below. The X axis is logarithmic.
ADBIS-2019 conference
Based on the results of this research, in September 2019 I published a paper
source: www.habr.com