Features of designing a data model for NoSQL

Introduction

"You have to run as fast as you can just to stay in place,
and to get somewhere, you have to run at least twice as fast!"
(c) Alice in Wonderland

Some time ago I was asked to give a lecture to our company's analysts on designing data models, because when we sit on projects for a long time (sometimes for several years) we lose sight of what is happening around us in the IT world. In our company (it just so happens) many projects do not use NoSQL databases (at least for now), so in my lecture I paid special attention to them, using HBase as the example, and tried to pitch the material at people who have never worked with them. In particular, I illustrated some features of data model design with an example I read several years ago in the article "Introduction to HBase Schema Design" by Amandeep Khurana. When going through the examples, I compared several options for solving the same problem in order to convey the main ideas to the audience more clearly.

Recently, "for want of anything better to do," I asked myself (a long May weekend in quarantine is especially conducive to this): how well would the theoretical calculations match practice? That, in fact, is how the idea for this article was born. A developer who has already worked with NoSQL for a few days may not learn anything new from it (and can therefore skip half the article right away). But for analysts who have not yet worked closely with NoSQL, I think it will be useful for getting a basic understanding of what is involved in designing data models for HBase.

Analysis of the example

In my opinion, before you start using NoSQL databases, you need to think carefully and weigh the pros and cons. Quite often the problem can be solved with traditional relational DBMSs, so it is better not to use NoSQL without significant reasons. If you do decide to use a NoSQL database, bear in mind that the design approaches here are somewhat different. Some of them may feel unusual to those who have previously dealt only with relational DBMSs (according to my observations). In the "relational" world, we usually start by modeling the problem domain, and only then, if necessary, denormalize the model. In NoSQL we should take the expected scenarios for working with the data into account right away and denormalize the data from the start. There are a number of other differences as well, which are discussed below.

Let us consider the following "synthetic" problem, which we will keep working with:

We need to design a storage structure for the friend lists of users of some abstract social network. To keep things simple, we will assume that all connections are directed (as on Instagram, not LinkedIn). The structure should make it possible to do the following efficiently:

  • Answer the question of whether user A follows user B (read pattern)
  • Add/remove a connection when user A subscribes to/unsubscribes from user B (data modification pattern)

Of course, there are many ways to solve the problem. In a regular relational database, we would most likely just create a table of relationships (possibly typed if, for example, we need to store the user group this "friend" belongs to: family, work, etc.), and add indexes/partitioning to speed up access. The final table would most likely look something like this:

user_id | friend_id
Vasya   | Petya
Vasya   | Olya

From here on, for clarity and readability, I will show names instead of IDs.

In the case of HBase, we know that:

  • an efficient lookup that does not lead to a full table scan is possible only by key
    • in fact, that is why writing the SQL queries familiar to many against such databases is a bad idea; technically, of course, you can send an SQL query with joins and other logic to HBase from, say, Impala, but how efficient will it be...

Therefore, we are forced to use the user ID as the key. And my first thought on the question of "where and how do we store the friends' IDs?" would probably be to store them in columns. This most obvious and "naive" option would look like this (let's call it Option 1 (default), for later reference):

RowKey | Columns
Vasya  | 1: Petya, 2: Olya, 3: Dasha
Petya  | 1: Masha, 2: Vasya

Here, each row corresponds to one network user. The columns are named 1, 2, ... according to the number of friends, and the friends' IDs are stored in the columns. It is important to note that each row will have a different number of columns. In the example above, one row has three columns (1, 2 and 3) and the other has only two (1 and 2). Here we have already used two HBase properties that relational databases do not have:

  • the ability to change the set of columns dynamically (add a friend -> add a column, remove a friend -> delete a column)
  • different rows may have different sets of columns

Let's check our structure against the requirements of the task:

  • Reading data: to find out whether Vasya follows Olya, we need to read the entire row by the key RowKey = "Vasya" and iterate over the column values until we "meet" Olya among them. Or iterate over the values of all columns, "not meet" Olya, and return the answer False;
  • Changing data: adding a friend: for this task we also need to read the entire row by the key RowKey = "Vasya" in order to count the total number of his friends. We need this total to determine the number of the column into which the new friend's ID must be written.
  • Changing data: deleting a friend:
    • We need to read the entire row by the key RowKey = "Vasya" and iterate over the columns to find the one that holds the friend to be deleted;
    • Then, after deleting the friend, we need to "shift" all the data by one column so that no "gaps" appear in the numbering.
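The read and add steps for Option 1 can be sketched in a few lines. This is only an illustration under my own assumptions: a plain Python dict stands in for one HBase row, and the names `is_friend` and `add_friend` are hypothetical, not taken from the original experiment.

```python
# Hypothetical in-memory stand-in for one HBase row under Option 1:
# column name (1, 2, 3, ...) -> friend ID.
Row = dict[int, str]


def is_friend(row: Row, friend: str) -> bool:
    """Check a subscription by scanning the column values one by one: O(n)."""
    for value in row.values():
        if value == friend:
            return True
    return False


def add_friend(row: Row, friend: str) -> None:
    """Count the existing columns, then write the friend into the next one.

    len() is cheap on a Python dict; against HBase the whole row would
    have to be fetched just to learn how many columns it has.
    """
    next_column = len(row) + 1
    row[next_column] = friend


vasya = {1: "Petya", 2: "Olya", 3: "Dasha"}
add_friend(vasya, "Masha")
print(is_friend(vasya, "Olya"), is_friend(vasya, "Kolya"), vasya[4])
```

Both functions depend on the full contents of the row, which is exactly what the complexity estimate below penalizes.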

Now let's estimate, using O-notation, how efficient these algorithms will be; we will have to implement them on the side of a "conditional application". Let us denote the size of our hypothetical social network by n. Then the maximum number of friends one user can have is (n-1). We can neglect this (-1) for our purposes, since it is insignificant in O-notation.

  • Reading data: in the worst case the whole row must be read and all of its columns iterated over. This means the upper bound on the cost is approximately O(n)
  • Changing data: adding a friend: to determine the number of friends, you need to iterate over all columns of the row and then insert a new column => O(n)
  • Changing data: deleting a friend:
    • Similar to adding: in the worst case you need to go through all the columns => O(n)
    • After removing the column, we need to "shift" the rest. If you implement this "head-on", then in the worst case you will need up to (n-1) operations. But here and later in the practical part we will use a different approach that implements a "pseudo-shift" in a fixed number of operations, that is, in constant time regardless of n. This constant time (O(2), to be exact) can be neglected compared to O(n). The approach is illustrated in the figure below: we simply copy the data from the "last" column into the one we want to delete, and then delete the last column:
      [Figure: the "pseudo-shift": the last column is copied into the slot being deleted, then the last column is removed]
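A minimal sketch of the "pseudo-shift", again with a plain dict standing in for an Option 1 row. The function name `remove_friend` is my own, and the code assumes the columns are numbered 1..k without gaps, as in the table above.

```python
def remove_friend(row: dict[int, str], friend: str) -> bool:
    """Delete a friend from an Option 1 row without leaving a gap in the numbering."""
    # Step 1: O(n) scan to find the column that holds the friend.
    target = next((col for col, value in row.items() if value == friend), None)
    if target is None:
        return False
    # Step 2: "pseudo-shift" in a fixed number of writes.
    last = len(row)          # columns are numbered 1..k, so the last one is k
    row[target] = row[last]  # copy the last column over the deleted slot
    del row[last]            # then drop the last column
    return True


vasya = {1: "Petya", 2: "Olya", 3: "Dasha"}
remove_friend(vasya, "Petya")
print(vasya)  # {1: 'Dasha', 2: 'Olya'}
```

The price of the trick is that the column order no longer reflects the order in which friends were added, which does not matter for the scenarios considered here.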

In total, in all scenarios we got a linear computational complexity of O(n).
You have probably already noticed that we almost always have to read the entire row from the database, and in two cases out of three only to go through all the columns and count the total number of friends. Therefore, as an attempt at optimization, we can add a "count" column that stores the total number of friends of each network user. In that case we do not have to read the whole row to count the friends; we read just the one "count" column. The main thing is not to forget to update "count" when manipulating the data. This gives us the improved Option 2 (count):

RowKey | Columns
Vasya  | 1: Petya, 2: Olya, 3: Dasha, count: 3
Petya  | 1: Masha, 2: Vasya, count: 2

Compared to the first option:

  • Reading data: for answering the question "Does Vasya follow Olya?" nothing has changed => O(n)
  • Changing data: adding a friend: inserting a new friend is now simpler, since we no longer need to read the whole row and iterate over its columns; we only fetch the value of the "count" column and thereby immediately determine the number of the column for the new friend. This reduces the computational complexity to O(1)
  • Changing data: deleting a friend: when deleting a friend, we can also use this column to reduce the number of I/O operations when "shifting" the data one cell to the left. But we still need to iterate over the columns to find the one to delete, so => O(n)
  • On the other hand, we now have to update the "count" column on every data update, but this takes constant time, which can be neglected in O-notation.
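The effect of the "count" column can be sketched as follows. As before, this is a hypothetical illustration on a plain dict that stands in for one HBase row; the helper names are mine.

```python
def add_friend(row: dict, friend: str) -> None:
    """Option 2: read only the "count" column, then write two cells."""
    count = row.get("count", 0)  # one small read instead of the whole row
    row[count + 1] = friend      # write the friend into column count + 1
    row["count"] = count + 1     # keep the counter in sync


def remove_friend(row: dict, friend: str) -> bool:
    """The lookup is still an O(n) scan; "count" only locates the last column."""
    target = next(
        (col for col, value in row.items() if col != "count" and value == friend),
        None,
    )
    if target is None:
        return False
    last = row["count"]
    row[target] = row[last]  # pseudo-shift: move the last friend into the gap
    del row[last]
    row["count"] = last - 1
    return True


vasya = {1: "Petya", 2: "Olya", "count": 2}
add_friend(vasya, "Dasha")
remove_friend(vasya, "Petya")
print(vasya)  # {1: 'Dasha', 2: 'Olya', 'count': 2}
```

Note that adding a friend now touches two cells (the new column and "count") instead of one, so the constant factor goes up even though the growth with n disappears.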

Overall, option 2 looks a bit more optimal, but it is more "evolution than revolution". To make a "revolution" we need Option 3 (col).
Let's turn everything upside down: we make the friend's user ID the column name! What is written in the column itself no longer matters to us; let it be the number 1 (in general, something useful can be stored there, for example the group "family/friends/etc."). This approach may surprise an unprepared "layman" with no previous experience of NoSQL databases, but it is exactly this approach that lets us use the potential of HBase most effectively in this task:

RowKey | Columns
Vasya  | Petya: 1, Olya: 1, Dasha: 1
Petya  | Masha: 1, Vasya: 1

Here we get several advantages at once. To understand them, let's analyze the new structure and estimate the computational complexity:

  • Reading data: to answer the question of whether Vasya follows Olya, it is enough to read the single column "Olya": if it exists, the answer is True, if not, False => O(1)
  • Changing data: adding a friend: just add a new column "Friend ID" => O(1)
  • Changing data: deleting a friend: just delete the column "Friend ID" => O(1)
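The three Option 3 operations might look roughly like this. The sketch makes several assumptions of mine: `FakeTable` is an in-memory stand-in whose `row`, `put` and `delete` methods imitate a small subset of a happybase-style table so that the code runs without an HBase server; the column family name `f` and all function names are hypothetical.

```python
class FakeTable:
    """In-memory stand-in for a happybase-style table (row / put / delete only)."""

    def __init__(self):
        self._rows = {}

    def row(self, row_key, columns=None):
        cells = self._rows.get(row_key, {})
        if columns is None:
            return dict(cells)
        return {c: v for c, v in cells.items() if c in columns}

    def put(self, row_key, data):
        self._rows.setdefault(row_key, {}).update(data)

    def delete(self, row_key, columns=None):
        if columns is None:
            self._rows.pop(row_key, None)
            return
        cells = self._rows.get(row_key, {})
        for c in columns:
            cells.pop(c, None)


CF = b"f"  # hypothetical column family name


def column(friend: str) -> bytes:
    """The friend's ID becomes the column name; the cell value is irrelevant."""
    return CF + b":" + friend.encode()


def add_friend(table, user: str, friend: str) -> None:
    table.put(user.encode(), {column(friend): b"1"})


def is_friend(table, user: str, friend: str) -> bool:
    # Ask for exactly one column; an empty result means "not a friend".
    return bool(table.row(user.encode(), columns=[column(friend)]))


def remove_friend(table, user: str, friend: str) -> None:
    table.delete(user.encode(), columns=[column(friend)])


table = FakeTable()
add_friend(table, "Vasya", "Olya")
print(is_friend(table, "Vasya", "Olya"), is_friend(table, "Vasya", "Petya"))
remove_friend(table, "Vasya", "Olya")
print(is_friend(table, "Vasya", "Olya"))
```

None of the three functions needs to know how many friends the user has, so each one touches a single cell.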

As you can see, a significant advantage of this storage model is that in all the scenarios we need, we work with only one column, avoiding reading the entire row from the database and, moreover, enumerating all the columns of that row. We could stop here, but...

We can go a little further along the path of optimizing performance and reducing I/O operations when accessing the database. What if we store the complete relationship information directly in the row key itself? That is, make the key composite, like userID.friendID? In that case we do not even have to read the columns of the row at all (Option 4 (row)):

RowKey      | Columns
Vasya.Petya | Petya: 1
Vasya.Olya  | Olya: 1
Vasya.Dasha | Dasha: 1
Petya.Masha | Masha: 1
Petya.Vasya | Vasya: 1

Obviously, the estimate for all data manipulation scenarios in this structure, as in the previous option, is O(1). The difference from option 3 lies solely in the efficiency of I/O operations in the database.
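A minimal sketch of Option 4, assuming a dict keyed by the row key as a stand-in for the HBase table. The `SEPARATOR` constant, the validation rule and the function names are my own illustration, not part of the original experiment.

```python
SEPARATOR = "."

# Hypothetical stand-in for the table: row key -> cells of that row.
table: dict[bytes, dict[bytes, bytes]] = {}


def row_key(user: str, friend: str) -> bytes:
    """Build the composite key userID.friendID."""
    if SEPARATOR in user or SEPARATOR in friend:
        # Otherwise "a.b" + "c" and "a" + "b.c" would produce the same key.
        raise ValueError("IDs must not contain the separator")
    return f"{user}{SEPARATOR}{friend}".encode()


def add_friend(user: str, friend: str) -> None:
    table[row_key(user, friend)] = {friend.encode(): b"1"}


def is_friend(user: str, friend: str) -> bool:
    # The existence of the row is the answer; no column needs to be inspected.
    return row_key(user, friend) in table


def remove_friend(user: str, friend: str) -> None:
    table.pop(row_key(user, friend), None)


add_friend("Vasya", "Petya")
print(is_friend("Vasya", "Petya"), is_friend("Petya", "Vasya"))
```

Every relationship is now its own row, so one user's friends are spread over many rows that merely share a key prefix.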

And now the finishing touch. It is easy to see that in option 4 the row key has a variable length, which may affect performance (recall that HBase stores data as a set of bytes and that rows in tables are sorted by key). On top of that, we have a separator that may need to be handled in some scenarios. To eliminate this effect, we can use hashes of userID and friendID, and since both hashes have a constant length, we can simply concatenate them without a separator. The data in the table will then look like this (Option 5 (hash)):

RowKey                                                           | Columns
dc084ef00e94aef49be885f9b01f51c01918fa783851db0dc1f72f83d33a5994 | Petya: 1
dc084ef00e94aef49be885f9b01f51c0f06b7714b5ba522c3cf51328b66fe28a | Olya: 1
dc084ef00e94aef49be885f9b01f51c00d2c2e5d69df6b238754f650d56c896a | Dasha: 1
1918fa783851db0dc1f72f83d33a59949ee3309645bd2c0775899fca14f311e1 | Masha: 1
1918fa783851db0dc1f72f83d33a5994dc084ef00e94aef49be885f9b01f51c0 | Vasya: 1

Obviously, the algorithmic complexity of working with this structure in the scenarios we are considering is the same as for option 4, that is, O(1).
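Building an Option 5 key could be sketched like this with hashlib and MD5. The UTF-8 encoding of the IDs and the function names are assumptions on my part.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Fixed-length (32 hex characters) digest of an ID; UTF-8 is assumed."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def row_key(user_id: str, friend_id: str) -> str:
    """Concatenate the two digests; no separator is needed at a fixed length."""
    return md5_hex(user_id) + md5_hex(friend_id)


key = row_key("Vasya", "Petya")
print(len(key))           # 64
print(key[:32], key[32:])  # the user part and the friend part
```

Because each half is always 32 characters long, the key can be split back into its two parts by position alone.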
Let's summarize all our estimates of computational complexity in one table:

Option             | Adding a friend | Checking a friend | Removing a friend
Option 1 (default) | O(n)            | O(n)              | O(n)
Option 2 (count)   | O(1)            | O(n)              | O(n)
Option 3 (col)     | O(1)            | O(1)              | O(1)
Option 4 (row)     | O(1)            | O(1)              | O(1)
Option 5 (hash)    | O(1)            | O(1)              | O(1)

As you can see, options 3-5 look the most preferable and theoretically guarantee that all the required data manipulation scenarios run in constant time. The conditions of our task contain no explicit requirement to get a list of all of a user's friends, but in real project work it would be good for us, as good analysts, to "anticipate" that such a task may arise and to "lay down some straw". Therefore my sympathies are with option 3. But it is quite likely that in a real project this request would already have been solved by other means, so without a view of the whole problem it is better not to draw final conclusions.

Preparing the experiment

I wanted to test the theoretical arguments above in practice; that was the goal of the idea that came up over the long weekend. To do this, we need to measure how fast our "conditional application" runs in all the described database usage scenarios, and how that time grows as the size of the social network (n) increases. The target parameter that interests us, and that we will measure during the experiment, is the time the "conditional application" spends on one "business operation". By "business operation" we mean one of the following:

  • Adding one new friend
  • Checking whether user A is a friend of user B
  • Removing one friend

Thus, taking into account the requirements from the initial problem statement, the verification scenario looks like this:

  • Writing data. Randomly generate an initial network of size n. To get closer to the "real world", the number of friends each user has is also a random variable. Measure the time it takes our "conditional application" to write all the generated data to HBase. Then divide that time by the total number of friends added; this gives the average time for one "business operation".
  • Reading data. For each user, create a list of "persons" for whom we need an answer as to whether the user follows them or not. The length of the list is approximately the number of the user's friends; for half of the checked persons the answer should be "Yes", and for the other half "No". The checks are performed in such an order that "Yes" and "No" answers alternate (that is, in every second case we have to go through all the columns of the row for options 1 and 2). The total check time is then divided by the number of persons checked to get the average time per check.
  • Deleting data. Remove all of a user's friends. The deletion order is random (that is, we "shuffle" the original list that was used for writing the data). The total deletion time is then divided by the number of friends removed to get the average time per deletion.
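The averaging used in all three scenarios could be sketched as below. The helper `average_time` is hypothetical, and a plain dict of sets is used in place of HBase just to make the example runnable.

```python
import time


def average_time(operation, items) -> float:
    """Run operation(item) for every item and return the mean seconds per call."""
    items = list(items)
    if not items:
        return 0.0
    started = time.perf_counter()
    for item in items:
        operation(item)
    elapsed = time.perf_counter() - started
    return elapsed / len(items)


# Usage: time the "add a friend" operation against an in-memory stand-in.
friends: dict[str, set[str]] = {}


def add(pair):
    user, friend = pair
    friends.setdefault(user, set()).add(friend)


avg = average_time(add, [("Vasya", "Petya"), ("Vasya", "Olya")])
print(f"average time per operation: {avg:.9f} s")
```

Timing the whole batch and dividing afterwards keeps the cost of the timer calls themselves out of each individual measurement.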

The scenarios need to be run for each of the 5 data model options and for different sizes of the social network, to see how the time changes as it grows. Within one n, the connections in the network and the list of users to check must, of course, be the same for all 5 options.
For a better understanding, below is an example of the generated data for n = 5. The "generator" I wrote produces three ID dictionaries as output:

  • the first is for insertion
  • the second is for checking
  • the third is for deletion

{0: [1], 1: [4, 5, 3, 2, 1], 2: [1, 2], 3: [2, 4, 1, 5, 3], 4: [2, 1]} # 15 friends in total

{0: [1, 10800], 1: [5, 10800, 2, 10801, 4, 10802], 2: [1, 10800], 3: [3, 10800, 1, 10801, 5, 10802], 4: [2, 10800]} # 18 persons to check in total

{0: [1], 1: [1, 3, 2, 5, 4], 2: [1, 2], 3: [4, 1, 2, 3, 5], 4: [1, 2]} # 15 friends in total

As you can see, the large IDs in the dictionary for checking (10800 and above in this example) are exactly the ones that are guaranteed to give the answer False. Inserting, checking and deleting "friends" are performed in exactly the order given in the dictionaries.
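A generator that produces dictionaries of this shape might look roughly as follows. This is my own reconstruction from the sample output, not the author's code: the function name, the seeding, the way friends are drawn and the `FAKE_ID_BASE` constant (borrowed from the 10800 seen in the example) are all assumptions.

```python
import random

FAKE_ID_BASE = 10800  # IDs from here up are guaranteed non-friends (assumed offset)


def generate(n: int, seed: int = 0):
    """Return three dicts: friends to insert, persons to check, friends to delete."""
    rng = random.Random(seed)
    to_insert, to_check, to_delete = {}, {}, {}
    for user in range(n):
        k = rng.randint(1, n)                     # random number of friends
        friends = rng.sample(range(1, n + 1), k)  # k distinct friend IDs
        to_insert[user] = friends
        real = rng.sample(friends, (k + 1) // 2)  # about half of the friends
        fake = range(FAKE_ID_BASE, FAKE_ID_BASE + len(real))
        # Interleave so that "Yes" and "No" answers alternate.
        to_check[user] = [x for pair in zip(real, fake) for x in pair]
        to_delete[user] = rng.sample(friends, k)  # same friends, shuffled order
    return to_insert, to_check, to_delete


to_insert, to_check, to_delete = generate(5)
print(to_insert)
print(to_check)
print(to_delete)
```

Fixing the seed is what allows the same network and the same check lists to be replayed against all five options for a given n.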

The experiment was run on a laptop with Windows 10, where HBase ran in one Docker container and Python with Jupyter Notebook ran in another. Docker was allocated 2 CPU cores and 2 GB of RAM. All the logic, including the emulation of the "conditional application" and the "plumbing" for generating test data and measuring time, was written in Python. The happybase library was used to work with HBase, and hashlib to compute the hashes (MD5) for option 5.

Given the computing power of this particular laptop, runs for n = 10, 30, … 170 were chosen experimentally, so that the total running time of the full test cycle (all scenarios for all options for all n) stayed more or less reasonable and fit into one tea break (15 minutes on average).

It must be noted here that in this experiment we are not primarily evaluating absolute performance figures. Even a relative comparison of two different options may not be entirely correct. What interests us now is how the time changes as a function of n, since with the "test bench" configuration described above it is very hard to obtain time estimates "cleaned" of random and other factors (and that was not the goal).

Results of the experiment

The first test is how the time spent filling the friend list changes. The result is in the graph below.
[Graph: average time to add one friend versus network size n, for options 1-5]
Options 3-5, as expected, show an almost constant "business operation" time that does not depend on the growth of the network, with no distinguishable difference in performance between them.
Option 2 also shows constant, but somewhat worse, performance: almost exactly 2 times slower than options 3-5. This is good news, since it agrees with the theory: in this option the number of I/O operations to/from HBase is exactly 2 times greater. It can serve as indirect evidence that our test bench gives reasonably good accuracy.
Option 1, also as expected, turns out to be the slowest and shows linear growth of the time spent on an addition with the size of the network.
Now let's look at the results of the second test.
[Graph: average time to check one friend versus network size n, for options 1-5]
Options 3-5 again behave as expected: constant time, independent of the network size. Options 1 and 2 show linear growth in time as the network grows, and similar performance. Option 2 turns out to be a little slower, apparently because of the need to read and process the extra "count" column, which becomes more noticeable as n grows. But I will still refrain from drawing any conclusions, since the accuracy of this comparison is relatively low. In addition, this ratio (which option, 1 or 2, is faster) changed from run to run (while the nature of the dependence stayed the same and the two went "neck and neck").

And the last graph is the result of the removal test.

[Graph: average time to remove one friend versus network size n, for options 1-5]

Again, no surprises here. Options 3-5 perform removal in constant time.
Interestingly, options 4 and 5, unlike in the previous scenarios, show slightly worse performance than option 3. Apparently a row deletion is more expensive than a column deletion, which is generally logical.

Options 1 and 2, as expected, show linear growth in time. Option 2 is consistently slower than option 1 because of the additional I/O operation needed to "maintain" the count column.

General conclusions from the experiment:

  • Options 3-5 are more efficient because they take advantage of HBase's features; their performance differs from one another only by a constant and does not depend on the size of the network.
  • No difference between options 4 and 5 was recorded. This does not mean that option 5 should not be used. It is likely that the experimental scenario, given the performance characteristics of the test bench, simply did not allow the difference to be detected.
  • The way the time needed to perform "business operations" grows generally confirmed the earlier theoretical estimates for all options.

Epilogue

The rough experiments carried out here should not be taken as absolute truth. Many factors were not taken into account and distorted the results (the fluctuations are especially visible in the graphs at small network sizes). For example, the speed of Thrift, which happybase uses; the amount and implementation of the logic I wrote in Python (I cannot claim that the code was written optimally and made effective use of all the components); possibly the caching behavior of HBase; the background activity of Windows 10 on my laptop; and so on. Overall, we can consider that all the theoretical calculations have been confirmed experimentally. Or at least it was not possible to refute them with such a "head-on attack".

In conclusion, some recommendations for everyone who is just starting to design data models in HBase: set aside your previous experience with relational databases and remember the "commandments":

  • When designing, start from the task and the data manipulation patterns, not from the domain model
  • Efficient access (without a full table scan) is by key only
  • Denormalization
  • Different rows may contain different columns
  • The set of columns is dynamic

source: www.habr.com
