Yadda Databases Aiki (Sashe na 1)

Hello, Habr! Ina gabatar muku da fassarar labarin
"Yaya dangantakar bayanai ke aiki".

Lokacin da ya zo ga bayanan bayanai na dangantaka ba zan iya taimakawa ba sai dai tunanin wani abu ya ɓace. Ana amfani da su a ko'ina. Akwai mabambantan bayanai da yawa da ake samu, daga ƙarami da amfani SQLite zuwa Teradata mai ƙarfi. Amma akwai ƴan labarai kaɗan waɗanda ke bayyana yadda rumbun adana bayanai ke aiki. Kuna iya nemo kanku ta amfani da "howdoesarelationaldatabasework" don ganin sakamakon kaɗan kaɗan. Bugu da ƙari, waɗannan labaran gajeru ne. Idan kuna neman sabbin fasahohin buzzy (BigData, NoSQL ko JavaScript), zaku sami ƙarin labarai masu zurfi waɗanda ke bayanin yadda suke aiki.

Shin bayanan da suka danganci bayanai sun tsufa kuma suna da ban sha'awa don bayyana su a wajen darussan jami'a, takaddun bincike da littattafai?

Yadda Databases Aiki (Sashe na 1)

A matsayina na mai haɓakawa, na ƙi amfani da abin da ban fahimta ba. Kuma idan an yi amfani da bayanan bayanan sama da shekaru 40, dole ne a sami dalili. A cikin shekaru da yawa, Na yi amfani da ɗarurruwan sa'o'i don fahimtar waɗannan baƙaƙen akwatunan baƙar fata waɗanda nake amfani da su kowace rana. Bayanai na dangantaka ban sha'awa sosai saboda su bisa ga dabaru masu amfani da sake amfani da su. Idan kuna sha'awar fahimtar bayanan bayanai, amma ba ku taɓa samun lokaci ko sha'awar shiga cikin wannan faffadan batu ba, ya kamata ku ji daɗin wannan labarin.

Ko da yake taken wannan labarin ya fito fili. makasudin wannan labarin ba shine fahimtar yadda ake amfani da bayanan ba. Don haka, ya kamata ka riga ka san yadda ake rubuta buƙatun haɗi mai sauƙi da tambayoyin asali RAW; in ba haka ba za ku iya fahimtar wannan labarin. Wannan shine kawai abin da kuke buƙatar sani, zan bayyana sauran.

Zan fara da wasu mahimman abubuwan kimiyyar kwamfuta, irin su rikitaccen lokaci na algorithms (BigO). Na san wasunku suna ƙin wannan ra'ayi, amma idan ba tare da shi ba ba za ku iya fahimtar abubuwan da ke cikin ma'ajin bayanai ba. Tunda wannan babban batu ne, Zan maida hankali akai abin da nake ganin yana da mahimmanci: yadda tsarin bayanai ke aiwatarwa SQL bincike. Zan gabatar kawai asali database Conceptsdon haka a ƙarshen labarin kuna da ra'ayin abin da ke faruwa a ƙarƙashin hular.

Tun da wannan labari ne mai tsawo da fasaha wanda ya ƙunshi yawancin algorithms da tsarin bayanai, ɗauki lokacin ku don karanta ta. Wasu ra'ayoyi na iya zama da wahala a fahimta; za ku iya tsallake su kuma har yanzu kuna samun ra'ayi na gaba ɗaya.

Ga masu ilimi a cikinku, wannan labarin ya kasu kashi 3:

  • Bayyani na ƙananan matakai da manyan abubuwan haɗin bayanan bayanai
  • Bayanin Tsarin Haɓaka Tambaya
  • Bayanin Ma'amala da Gudanar da Pool Pool

Komawa ga Basics

Shekaru da suka gabata (a cikin galaxy mai nisa, nisa ...), masu haɓakawa dole ne su san ainihin adadin ayyukan da suke yi. Sun san algorithms da tsarin bayanan su da zuciya saboda ba za su iya yin asarar CPU da ƙwaƙwalwar ajiyar kwamfutocin su ba.

A cikin wannan ɓangaren, zan tunatar da ku wasu daga cikin waɗannan ra'ayoyin saboda suna da mahimmanci don fahimtar bayanan bayanai. Zan kuma gabatar da ra'ayi bayanan bayanai.

O(1) vs O(n2)

A zamanin yau, yawancin masu haɓakawa ba su damu da wahalar lokaci na algorithms ba ... kuma sun yi daidai!

Amma lokacin da kuke ma'amala da bayanai da yawa (bana magana dubbai) ko kuma idan kuna gwagwarmaya a cikin millise seconds, yana da mahimmanci don fahimtar wannan ra'ayi. Kuma kamar yadda zaku iya tunanin, bayanan bayanai dole ne su magance yanayin biyu! Ba zan ba ku ƙarin lokaci fiye da buƙata don samun ainihin abin ba. Wannan zai taimaka mana mu fahimci manufar inganta ƙimar kuɗi daga baya (kudin tushen ingantawa).

Tunani

Halin lokaci na algorithm ana amfani da shi don ganin tsawon lokacin da algorithm zai ɗauka don kammalawa ga adadin da aka bayar. Don kwatanta wannan sarƙaƙƙiya, muna amfani da babban bayanin lissafin O. Ana amfani da wannan bayanin tare da aikin da ke bayyana yawan ayyukan da algorithm ke buƙata don adadin abubuwan da aka bayar.

Misali, idan na ce "wannan algorithm yana da hadaddun O(some_function())", yana nufin algorithm yana buƙatar wasu ayyuka (a_certain_amount_of_data) don aiwatar da takamaiman adadin bayanai.

Kamar wancan Ba adadin bayanai bane ke da mahimmanci**, in ba haka ba ** yadda adadin ayyuka ke ƙaruwa tare da ƙara ƙarar bayanai. Rikicin lokaci baya samar da ainihin adadin ayyuka, amma hanya ce mai kyau don kimanta lokacin aiwatarwa.

Yadda Databases Aiki (Sashe na 1)

A cikin wannan jadawali za ku iya ganin adadin ayyuka tare da adadin bayanan shigarwa don nau'ikan hadaddun lokacin algorithm daban-daban. Na yi amfani da ma'aunin logarithmic don nuna su. A takaice dai, adadin bayanai da sauri yana ƙaruwa daga biliyan 1 zuwa biliyan 1. Za mu iya ganin cewa:

  • O(1) ko rikitaccen abu ya kasance dawwama (in ba haka ba ba za'a kira shi akai-akai).
  • O(shiga(n)) ya rage koda da biliyoyin bayanai.
  • Mafi munin wahala - O(n2), inda adadin ayyuka ke girma cikin sauri.
  • Sauran rikice-rikice biyu suna ƙaruwa daidai da sauri.

misalai

Tare da ƙaramin adadin bayanai, bambanci tsakanin O (1) da O (n2) ba shi da komai. Misali, bari mu ce kuna da algorithm wanda ke buƙatar aiwatar da abubuwa 2000.

  • Algorithm din O(1) zai kashe muku aiki 1
  • O(log(n)) algorithm zai kashe muku ayyuka 7
  • Algorithm din O(n) zai kashe muku ayyuka 2
  • Algorithm din O(n*log(n)) zai kashe muku ayyuka 14
  • Algorithm din O(n2) zai kashe muku ayyuka 4

Bambanci tsakanin O(1) da O(n2) yana da girma (ayyukan ayyuka miliyan 4) amma zaku rasa iyakar 2 ms, kawai lokacin da za ku lumshe idanu. Lallai, na'urori na zamani na iya sarrafawa daruruwan miliyoyin ayyuka a sakan daya. Wannan shine dalilin da ya sa aiki da haɓakawa ba batun bane a yawancin ayyukan IT.

Kamar yadda na ce, har yanzu yana da mahimmanci a san wannan ra'ayi yayin aiki tare da adadi mai yawa na bayanai. Idan wannan lokacin algorithm dole ne ya aiwatar da abubuwa 1 (wanda ba haka ba ne don bayanan bayanai):

  • Algorithm din O(1) zai kashe muku aiki 1
  • O(log(n)) algorithm zai kashe muku ayyuka 14
  • Algorithm din O(n) zai kashe muku ayyuka 1
  • O(n*log(n)) algorithm zai kashe muku ayyuka 14
  • Algorithm din O(n2) zai kashe muku ayyuka 1

Ban yi lissafi ba, amma zan ce tare da O (n2) algorithm kuna da lokacin shan kofi (ko da biyu!). Idan kun ƙara wani 0 zuwa ƙarar bayanai, za ku sami lokacin yin hutu.

Mu zurfafa

Don tunani:

  • Kyakkyawan duba tebur na zanta ya sami wani abu a cikin O(1).
  • Neman madaidaicin bishiyar yana samar da sakamako a cikin O(log(n)).
  • Neman tsararru yana samar da sakamako a cikin O(n).
  • Mafi kyawun rarrabuwa algorithms suna da hadaddun O(n*log(n)).
  • Algorithm ɗin mara kyau yana da rikitarwa O(n2).

Lura: A cikin waɗannan sassa za mu ga waɗannan algorithms da tsarin bayanai.

Akwai nau'ikan nau'ikan rikitarwa na lokacin algorithm:

  • matsakaicin yanayin yanayin
  • mafi kyawun yanayin yanayin
  • kuma mafi munin yanayi

Rikicin lokaci sau da yawa shine mafi munin yanayi.

Ina magana ne kawai game da wahalar lokaci na algorithm, amma rikitarwa kuma ya shafi:

  • amfani da ƙwaƙwalwar ajiya na algorithm
  • faifai I/O amfani algorithm

Tabbas, akwai rikitarwa fiye da n2, misali:

  • n4: wannan mugun! Wasu daga cikin algorithms da aka ambata suna da wannan rikitarwa.
  • 3n: wannan ma yafi muni! Ɗaya daga cikin algorithms da za mu gani a tsakiyar wannan labarin yana da wannan rikitarwa (kuma ana amfani dashi a cikin ɗakunan bayanai da yawa).
  • factorial n: Ba za ku taɓa samun sakamakonku ba koda da ƙaramin adadin bayanai.
  • nn: Idan kun ci karo da wannan sarkakiyar, ya kamata ku tambayi kanku shin da gaske wannan fannin aikinku ne...

Lura: Ban ba ku ainihin ma'anar babban sunan O ba, ra'ayi kawai. Kuna iya karanta wannan labarin a Wikipedia don ainihin ma'anar (asymptotic).

Haɗa Tsari

Me kuke yi lokacin da kuke buƙatar tsara tarin? Menene? Kuna kiran nau'in () aikin ... Ok, amsa mai kyau ... Amma don bayanan bayanai, dole ne ku fahimci yadda wannan nau'in () aikin yake aiki.

Akwai algorithms masu kyau da yawa, don haka zan mayar da hankali kan mafi mahimmanci: hade iri. Wataƙila ba za ku fahimci dalilin da yasa rarraba bayanai ke da amfani a yanzu ba, amma ya kamata ku bayan ɓangaren haɓakar tambaya. Bugu da ƙari, fahimtar nau'in haɗin kai zai taimaka mana daga baya fahimtar aikin haɗin yanar gizon gama gari da ake kira tafi shiga (haɗin kai).

Haɗa

Kamar yawancin algorithms masu amfani, hade da jerin nau'ikan girman: hada 2 daban-daban na girman kai na N / 2 cikin n-kashi jere. Ana kiran wannan aiki tare.

Bari mu ga abin da wannan ke nufi da misali mai sauƙi:

Yadda Databases Aiki (Sashe na 1)

Wannan adadi yana nuna cewa don gina tsararrun tsararrun abubuwa 8 na ƙarshe, kuna buƙatar maimaita sau ɗaya kawai akan tsararrun abubuwa 2 4. Tun da an riga an jera dukkan jigogi guda 4:

  • 1) kuna kwatanta abubuwa biyu na yanzu a cikin tsararraki biyu (a farkon halin yanzu = farko)
  • 2) sai a dauki mafi karami a sanya shi cikin jerin abubuwa guda 8
  • 3) kuma matsa zuwa kashi na gaba a cikin tsararru inda kuka ɗauki mafi ƙarancin kashi
  • kuma maimaita 1,2,3 har sai kun isa ƙarshen kashi na ɗaya daga cikin tsararrun.
  • Sa'an nan kuma ku ɗauki sauran abubuwan da suka rage na sauran tsararrun don sanya su a cikin tsari guda 8.

Wannan yana aiki saboda an jera nau'ikan abubuwa 4 guda biyu don haka ba dole ba ne ku "koma" a cikin waɗannan tsararrun.

Yanzu da muka fahimci dabarar, ga pseudocode dina don haɗawa:

array mergeSort(array a)
   if(length(a)==1)
      return a[0];
   end if

   //recursive calls
   [left_array right_array] := split_into_2_equally_sized_arrays(a);
   array new_left_array := mergeSort(left_array);
   array new_right_array := mergeSort(right_array);

   //merging the 2 small ordered arrays into a big one
   array result := merge(new_left_array,new_right_array);
   return result;

Haɗin kai yana karya matsala zuwa ƙananan matsaloli sannan a sami sakamakon ƙananan matsalolin don samun sakamakon ainihin matsalar (bayanin kula: wannan nau'in algorithm ana kiransa rabawa da nasara). Idan ba ku fahimci wannan algorithm ba, kada ku damu; Ban gane shi ba a karon farko da na gan shi. Idan zai iya taimaka muku, Ina ganin wannan algorithm azaman algorithm mai matakai biyu:

  • Lokaci na rarraba, inda aka raba jeri zuwa ƙananan tsararru
  • Lokacin rarrabuwa shine inda ake haɗa ƙananan jeri-jeri (ta amfani da ƙungiyar) don samar da babban tsari.

Matakin rabuwa

Yadda Databases Aiki (Sashe na 1)

A cikin matakin rarraba, an raba tsararru zuwa tsararraki na yanki cikin matakai 3. Adadin matakan matakai shine log(N) (tun N=8, log(N) = 3).

Ta yaya zan san wannan?

Ni baiwa ce! A cikin kalma - lissafi. Manufar ita ce kowane mataki yana raba girman asalin tsararru ta 2. Adadin matakan shine adadin lokutan da zaku iya raba asalin tsararru zuwa biyu. Wannan shine ainihin ma'anar logarithm (tushe 2).

Lokacin tsarawa

Yadda Databases Aiki (Sashe na 1)

A lokacin rarrabuwar kawuna, zaku fara da tsararraki na haɗin kai (bangare-ɗaya). Yayin kowane mataki kuna amfani da ayyukan haɗin gwiwa da yawa kuma jimlar kuɗin shine N = 8 ayyuka:

  • A mataki na farko kana da 4 merges cewa kudin 2 ayyuka kowane
  • A mataki na biyu kuna da haɗin gwiwa 2 wanda farashin ayyuka 4 kowanne
  • A mataki na uku kuna da haɗin gwiwa 1 wanda ke biyan ayyuka 8

Tunda akwai matakan log(N), jimlar kudin N * log(N) aiki.

Amfanin nau'in haɗakarwa

Me yasa wannan algorithm ke da ƙarfi sosai?

Domin:

  • Kuna iya canza shi don rage sawun ƙwaƙwalwar ajiya don kada ku ƙirƙiri sabbin jeri amma gyara tsararrun shigarwa kai tsaye.

Lura: ana kiran irin wannan nau'in algorithm in-wuri (Rarraba ba tare da ƙarin ƙwaƙwalwar ajiya ba).

  • Kuna iya canza shi don amfani da sararin faifai da ƙaramin adadin ƙwaƙwalwar ajiya a lokaci guda ba tare da jawo babban faifai I/O sama ba. Manufar ita ce a loda cikin ƙwaƙwalwar ajiya kawai waɗanda sassan da ake sarrafa su a halin yanzu. Wannan yana da mahimmanci lokacin da kake buƙatar ware tebur mai yawan gigabyte tare da buffer megabyte 100 kawai.

Lura: ana kiran irin wannan nau'in algorithm irin na waje.

  • Kuna iya canza shi don aiki akan matakai / zaren / sabobin.

Misali, nau'in haɗakarwa da aka rarraba shine ɗayan mahimman abubuwan haɗin gwiwa Hadoop (wanda shine tsari a cikin manyan bayanai).

  • Wannan algorithm na iya juya gubar zuwa zinari (da gaske!).

Ana amfani da wannan rarrabuwa algorithm a mafi yawan (idan ba duka ba) rumbun adana bayanai, amma ba shine kaɗai ba. Idan kuna son ƙarin sani, kuna iya karanta wannan aikin bincike, wanda ke tattauna fa'idodi da rashin amfani na gama-gari na rarrabuwar bayanai.

Array, Tree da Hash Tebur

Yanzu da muka fahimci ra'ayin wahalar lokaci da rarrabawa, ya kamata in gaya muku game da tsarin bayanai guda 3. Wannan yana da mahimmanci saboda su su ne ginshiƙi na zamani bayanai. Zan kuma gabatar da ra'ayi bayanan bayanai.

Array

Tsari mai girma biyu shine mafi sauƙin tsarin bayanai. Ana iya tunanin tebur a matsayin tsararru. Misali:

Yadda Databases Aiki (Sashe na 1)

Wannan jeri mai girma 2 tebur ne mai layuka da ginshiƙai:

  • Kowane layi yana wakiltar mahalli
  • ginshiƙai suna adana kaddarorin da ke bayyana mahallin.
  • Kowane shafi yana adana bayanai na takamaiman nau'in ( lamba, kirtani, kwanan wata...).

Wannan ya dace don adanawa da hangen nesa bayanai, duk da haka, lokacin da kuke buƙatar nemo takamaiman ƙima, wannan bai dace ba.

Misali, idan kuna son nemo duk samarin da ke aiki a Burtaniya, kuna buƙatar duba kowane layi don sanin ko wannan layin na Burtaniya ne. Zai kashe ku N ma'amaloliinda N - adadin layi, wanda ba shi da kyau, amma za a iya samun hanya mafi sauri? Yanzu lokaci ya yi da za mu saba da bishiyoyi.

Lura: Yawancin ma'ajin bayanai na zamani suna ba da tsawaita jeri don adana tebur yadda ya kamata: tara-tsara da tebur-tsara. Amma wannan baya canza matsalar da sauri gano takamaiman yanayi a cikin rukuni na ginshiƙai.

Database itace da index

Bishiyar neman binaryar bishiyar binaryar bishiyar ce tare da dukiya ta musamman, maɓalli a kowane kumburi dole ne:

  • fiye da duk maɓallan da aka adana a cikin bishiyar hagu
  • kasa da duk maɓallan da aka adana a cikin maɓalli na dama

Bari mu ga abin da wannan ke nufi a gani

Idea

Yadda Databases Aiki (Sashe na 1)

Wannan bishiyar tana da N = abubuwa 15. Bari mu ce ina neman 208:

  • Na fara a tushen wanda maɓalli yake 136. Tun 136<208, Ina duban dama subtree na node 136.
  • 398>208 don haka ina kallon gefen hagu na node 398
  • 250>208 don haka ina kallon gefen hagu na node 250
  • 200<208, saboda haka ina kallon madaidaicin subtree na node 200. Amma 200 ba shi da hakki. darajar ba ta wanzu (saboda idan akwai, zai kasance a cikin madaidaiciyar bishiyar 200).

Yanzu bari a ce ina neman 40

  • Na fara a tushen wanda maɓalli yake 136. Tun 136> 40, na dubi hagu subtree na node 136.
  • 80> 40, don haka ina kallon gefen hagu na node 80
  • 40= 40, kumburi akwai. Na dawo da ID na jere a cikin kumburin (ba a nuna a hoton ba) kuma duba cikin tebur don ID na jere da aka bayar.
  • Sanin ID ɗin layi yana ba ni damar sanin ainihin inda bayanan ke cikin tebur, don haka zan iya dawo da su nan take.

A ƙarshe, duka binciken biyu zai kashe ni adadin matakan da ke cikin bishiyar. Idan ka karanta ɓangaren game da nau'in haɗawa a hankali, ya kamata ka ga cewa akwai matakan log(N). Ya bayyana, rajistan kudin nema (N), ba sharri!

Mu koma kan matsalarmu

Amma wannan abu ne a zahiri, don haka mu koma ga matsalarmu. Maimakon lamba mai sauƙi, yi tunanin zaren da ke wakiltar ƙasar wani a cikin teburin da ya gabata. Bari mu ce kuna da bishiyar da ta ƙunshi filin “ƙasa” (shafi na 3) na tebur:

  • Idan kuna son sanin wanda ke aiki a Burtaniya
  • kuna kallon bishiyar don samun kumburin da ke wakiltar Burtaniya
  • A cikin "UKnode" za ku sami wurin da bayanan ma'aikatan Burtaniya suke.

Wannan binciken zai kashe ayyukan log(N) maimakon ayyukan N idan kun yi amfani da tsararrun kai tsaye. Abinda kuka gabatar shine bayanan bayanai.

Kuna iya gina bishiyar index don kowane rukuni na filayen (kirtani, lamba, layi 2, lamba da kirtani, kwanan wata ...) muddin kuna da aikin kwatanta maɓallai (watau ƙungiyoyin filin) ​​don haka zaku iya saitawa. oda tsakanin makullin (wanda shine yanayin kowane nau'in asali a cikin bayanan).

B+TreeIndex

Yayin da wannan bishiyar ke aiki da kyau don samun takamaiman ƙima, akwai babbar matsala lokacin da kuke buƙata sami abubuwa da yawa tsakanin dabi'u biyu. Wannan zai kashe O(N) saboda dole ne ku kalli kowane kulli a cikin bishiyar kuma ku bincika ko yana tsakanin waɗannan dabi'u biyu (misali tare da umarnin bishiyar). Haka kuma, wannan aiki ba faifai I/O abokantaka bane tunda dole ne ka karanta dukkan bishiyar. Muna buƙatar nemo hanyar aiwatarwa da inganci roƙon iyaka. Don magance wannan matsala, rumbun adana bayanai na zamani suna amfani da gyare-gyaren bishiyar da ta gabata mai suna B+Tree. A cikin bishiyar B+:

  • kawai mafi ƙanƙanta nodes (ganye) adana bayanai (wurin layuka a cikin tebur mai alaƙa)
  • sauran nodes suna nan don yin kwatance zuwa madaidaicin kumburi a lokacin bincike.

Yadda Databases Aiki (Sashe na 1)

Kamar yadda kake gani, akwai ƙarin nodes a nan (sau biyu). Lallai, kuna da ƙarin nodes, "nodes yanke shawara", waɗanda za su taimaka muku nemo madaidaicin kumburi (wanda ke adana wurin layuka a cikin tebur mai alaƙa). Amma rikitaccen bincike har yanzu O(log(N)) (akwai ƙarin matakin guda ɗaya kawai). Babban bambancin shi ne nodes a ƙananan matakin suna da alaƙa da magadansu.

Tare da wannan B + Itace, idan kuna neman ƙimar tsakanin 40 da 100:

  • Kuna buƙatar kawai neman 40 (ko ƙimar mafi kusa bayan 40 idan 40 ba ya wanzu) kamar yadda kuka yi da itacen baya.
  • Sannan tara magada 40 ta hanyar amfani da hanyoyin haɗin kai kai tsaye har sai kun kai 100.

Bari mu ce kun sami M magaji kuma bishiyar tana da N nodes. Nemo takamaiman kundi na tsadar log(N) kamar itacen baya. Amma da zarar kun sami wannan kullin, zaku sami M magaji a cikin Ayyukan M tare da nassoshi ga magajin su. Wannan binciken yana biyan M+log(N) kawai ayyuka idan aka kwatanta da ayyukan N akan bishiyar da ta gabata. Bugu da ƙari, ba dole ba ne ka karanta cikakken bishiyar (M+log(N) nodes kawai), wanda ke nufin ƙarancin amfani da faifai. Idan M ƙarami ne (misali layuka 200) kuma N babba (layi 1), za a sami babban bambanci.

Amma akwai sababbin matsaloli a nan (sake!). Idan ka ƙara ko share jere a cikin bayanan (saboda haka a cikin alamar B+Tree mai alaƙa):

  • Dole ne ku kiyaye tsari tsakanin nodes a cikin bishiyar B+, in ba haka ba ba za ku iya samun nodes a cikin bishiyar da ba a ware ba.
  • Dole ne ku kiyaye mafi ƙarancin adadin matakan matakan a cikin B+Tree, in ba haka ba O(log(N)) wahalar lokaci ya zama O(N).

A wasu kalmomi, B+Tree dole ne ya kasance mai yin odar kai da daidaitawa. An yi sa'a, wannan yana yiwuwa tare da gogewa mai wayo da saka ayyuka. Amma wannan yana zuwa akan farashi: shigarwa da gogewa a cikin farashin bishiyar B+ O(log(N)). Shi ya sa wasunku suka ji haka yin amfani da fihirisa da yawa ba kyakkyawan ra'ayi ba ne. Hakika, kuna rage saurin sakawa/sabuntawa/ goge jere a cikin tebursaboda ma'ajin bayanai yana buƙatar sabunta ma'auni na tebur ta amfani da aiki mai tsada na O(log(N)) don kowace maƙasudi. Bugu da ƙari, ƙara fihirisa yana nufin ƙarin nauyin aiki don manajan ciniki (za a bayyana a karshen labarin).

Don ƙarin cikakkun bayanai, zaku iya ganin labarin Wikipedia akan B+Tree. Idan kuna son misalin aiwatar da Bishiyar B+ a cikin bayanan bayanai, duba wannan labarin и wannan labarin daga babban mai haɓaka MySQL. Dukansu suna mai da hankali kan yadda InnoDB (injin MySQL) ke sarrafa fihirisa.

Lura: Wani mai karatu ya gaya mani cewa, saboda ƙananan matakan ingantawa, ya kamata itacen B+ ya kasance daidai.

Hashtable

Tsarin bayanan mu na ƙarshe shine tebur zanta. Wannan yana da amfani sosai lokacin da kuke son bincika ƙima cikin sauri. Bugu da ƙari, fahimtar tebur ɗin zanta zai taimaka mana daga baya fahimtar tsarin haɗin yanar gizon gama gari wanda ake kira hash join ( shiga tsakani). Wannan tsarin bayanai kuma ana amfani da shi ta wurin adana bayanai don adana wasu abubuwa na ciki (misali. tebur kulle ko buffer pool, za mu ga waɗannan ra'ayoyin biyu daga baya).

Teburin zanta shine tsarin bayanai wanda ke saurin gano wani abu ta maɓalli. Don gina tebur ɗin hash kuna buƙatar ayyana:

  • alama don abubuwan ku
  • aikin hash don makullin. Hashes maɓalli da aka lissafta suna ba da wurin abubuwan abubuwan (wanda ake kira sassa ).
  • aiki don kwatanta maɓalli. Da zarar kun sami sashin daidai, dole ne ku nemo abin da kuke nema a cikin sashin ta amfani da wannan kwatancen.

Misali mai sauƙi

Bari mu dauki misali bayyananne:

Yadda Databases Aiki (Sashe na 1)

Wannan tebur na zanta yana da sassa 10. Saboda ni malalaci ne, sai na zana kashi 5 kawai, amma nasan kana da wayo, don haka zan baka damar hoton sauran 5 da kanka. Na yi amfani da aikin hash modulo 10 na maɓalli. A wasu kalmomi, Ina adana lambobi na ƙarshe kawai na maɓallin element ɗin don nemo sashinsa:

  • idan lamba ta ƙarshe ta kasance 0, ɓangaren ya faɗi zuwa kashi 0,
  • idan lamba ta ƙarshe ta kasance 1, ɓangaren ya faɗi zuwa kashi 1,
  • idan lamba ta ƙarshe ta kasance 2, sinadarin ya faɗi cikin yanki 2,
  • ...

Aikin kwatanta da na yi amfani da shi shine kawai daidaito tsakanin lamba biyu.

Bari mu ce kuna son samun kashi 78:

  • Teburin hash yana ƙididdige lambar zanta don 78, wanda shine 8.
  • Teburin zanta yana kallon sashi na 8, kuma kashi na farko da ya samo shine 78.
  • Ta mayar maka da abu na 78
  • Ayyukan nema 2 ne kacal (ɗayan don ƙididdige ƙimar hash, ɗayan kuma don duba abubuwan da ke cikin ɓangaren).

Yanzu bari mu ce kuna son samun kashi 59:

  • Teburin hash yana ƙididdige lambar zanta don 59, wanda shine 9.
  • Teburin hash yana bincike a kashi na 9, kashi na farko da aka samo shine 99. Tun da 99!=59, element 99 bashi da inganci.
  • Ta hanyar amfani da dabaru iri ɗaya, ana ɗaukar kashi na biyu (9), na uku (79), ..., na ƙarshe (29).
  • Ba a samo wani abu ba.
  • Binciken ya kashe ayyuka 7.

Kyakkyawan aikin zanta

Kamar yadda kuke gani, dangane da ƙimar da kuke nema, farashin ba ɗaya bane!

Idan yanzu na canza aikin hash modulo 1 na maɓalli (wato, ɗaukar lambobi 000 na ƙarshe), bincike na biyu kawai yana biyan aiki 000 tunda babu abubuwa a cikin kashi 6. Babban ƙalubalen shine samun aikin hash mai kyau wanda zai haifar da guga mai ɗauke da ƙananan adadin abubuwa.

A cikin misali na, gano kyakkyawan aikin hash yana da sauƙi. Amma wannan misali ne mai sauƙi, gano aikin hash mai kyau ya fi wahala idan maɓalli shine:

  • kirtani (misali - sunan karshe)
  • Layuka 2 (misali - sunan ƙarshe da sunan farko)
  • Layi 2 da kwanan wata (misali - sunan ƙarshe, sunan farko da ranar haihuwa)
  • ...

Tare da kyakkyawan aikin zanta, duba tebur ɗin hash farashin O(1).

Array vs tebur hash

Me yasa ba za a yi amfani da tsararru ba?

Hmm tambaya mai kyau.

  • Teburin zanta na iya zama an loda wani bangare cikin ƙwaƙwalwar ajiya, kuma sauran sassan na iya zama a kan faifai.
  • Tare da tsararru dole ne ka yi amfani da sarari mai jujjuyawa a ƙwaƙwalwar ajiya. Idan kana loda babban teburi yana da matukar wahala a sami isasshen sarari ci gaba.
  • Don tebur zanta, zaku iya zaɓar maɓallin da kuke so (misali, ƙasa da sunan ƙarshe na mutum).

Don ƙarin bayani, kuna iya karanta labarin game da JavaHashMap, wanda shine ingantaccen aiwatar da tebur na zanta; ba kwa buƙatar fahimtar Java don fahimtar ra'ayoyin da ke cikin wannan labarin.

source: www.habr.com

Add a comment