Theory and practice of using HBase

Hello! My name is Danil Lipovoy. Our team at Sbertech began using HBase as storage for operational data. In the course of studying it we accumulated experience that I wanted to systematize and describe (we hope it will be useful to many). All the experiments below were run with HBase versions 1.2.0-cdh5.14.2 and 2.0.0-cdh6.0.0-beta1.

  1. General architecture
  2. Writing data to HBase
  3. Reading data from HBase
  4. Data caching
  5. Batch data processing with MultiGet/MultiPut
  6. Strategy for splitting tables into regions (splitting)
  7. Fault tolerance, compaction, and data locality
  8. Settings and performance
  9. Load testing
  10. Conclusions

1. General architecture

The backup Master listens for the active Master's heartbeat on the ZooKeeper node and, if the heartbeat disappears, takes over the Master's functions.

2. Writing data to HBase

First, let's look at the simplest case: writing a key-value object to a table using put(rowkey). The client must first find out where the Root Region Server (RRS), which hosts the hbase:meta table, is located. It gets this information from ZooKeeper. It then contacts the RRS and reads the hbase:meta table, from which it learns which RegionServer (RS) is responsible for storing data for the given rowkey in the table of interest. The client caches the meta table for future use, so subsequent calls are faster and go directly to the RS.

Next, the RS, having received the request, first writes it to the WriteAheadLog (WAL), which is needed for recovery after a crash. It then stores the data in the MemStore. This is an in-memory buffer that holds a sorted set of keys for a given region. A table can be divided into regions (partitions), each of which contains a disjoint set of keys. This makes it possible to place regions on different servers to achieve higher performance. However, obvious as this statement is, we will see later that it does not hold in all cases.

After the entry is placed in the MemStore, the client receives a response saying that the entry was saved successfully. In reality it is stored only in the buffer and reaches disk only after a certain period of time has passed or when the buffer fills up with new data.
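The write path described above can be pictured with a toy model. This is only a sketch under stated assumptions, not HBase code: the class name ToyRegion, its fields, and the flush threshold are all invented for illustration, and only standard Java collections are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the write path: WAL first, then a sorted in-memory buffer,
// then a flush to an immutable "file" once the buffer is full.
// All names and the threshold are hypothetical; this is not HBase code.
class ToyRegion {
    final List<String> wal = new ArrayList<>();                      // append-only log
    TreeMap<String, String> memStore = new TreeMap<>();              // keys kept sorted
    final List<TreeMap<String, String>> hFiles = new ArrayList<>();  // flushed buffers
    private final int flushThreshold;

    ToyRegion(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Returns true once the entry is in the WAL and the MemStore,
    // which is the moment the client would get its acknowledgement.
    boolean put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);   // 1. log for crash recovery
        memStore.put(rowKey, value);     // 2. buffer in memory
        if (memStore.size() >= flushThreshold) {
            flush();                     // 3. buffer is full: write it out
        }
        return true;
    }

    // Moves the current buffer to the list of files and starts a new one.
    void flush() {
        if (!memStore.isEmpty()) {
            hFiles.add(memStore);
            memStore = new TreeMap<>();
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion(3);
        region.put("key2", "b");
        region.put("key1", "a");
        System.out.println(region.memStore.keySet()); // keys are held in sorted order
        region.put("key3", "c");                       // reaches the threshold
        System.out.println(region.hFiles.size() + " file(s), "
                + region.memStore.size() + " buffered");
    }
}
```

The point of the sketch is the ordering: the acknowledgement is returned after steps 1 and 2, while step 3 happens later and independently of the client.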

When a Delete operation is performed, the data is not physically deleted. It is simply marked as deleted, and the actual removal happens when major compaction runs, which is described in more detail in section 7.

Files in HFile format accumulate in HDFS, and from time to time a so-called minor compaction is launched, which simply merges small files into larger ones without deleting anything. Over time this turns into a problem that shows up only when reading data (we will come back to this a little later).
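The difference between a delete marker, a minor compaction, and a major compaction can be sketched with sorted maps. This is a toy illustration, not HBase code: the class ToyCompaction and the TOMBSTONE marker value are assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Toy model of delete markers and compaction; not HBase code.
// A "file" is a sorted map; TOMBSTONE is a hypothetical marker value.
class ToyCompaction {
    static final String TOMBSTONE = "<deleted>";

    // Merges files given oldest first, so newer values overwrite older ones.
    // Delete markers are kept, as in a minor compaction.
    static TreeMap<String, String> minorCompact(List<TreeMap<String, String>> files) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            merged.putAll(file);
        }
        return merged;
    }

    // Merges everything and then physically drops the deleted keys,
    // as in a major compaction.
    static TreeMap<String, String> majorCompact(List<TreeMap<String, String>> files) {
        TreeMap<String, String> merged = minorCompact(files);
        merged.values().removeIf(TOMBSTONE::equals);
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>();
        older.put("key1", "v1");
        older.put("key2", "v2");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("key1", TOMBSTONE); // a Delete only writes a marker

        List<TreeMap<String, String>> files = Arrays.asList(older, newer);
        System.out.println(minorCompact(files)); // the marker for key1 is still stored
        System.out.println(majorCompact(files)); // key1 is physically gone
    }
}
```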

In addition to the load process described above, there is a far more efficient procedure, which is perhaps the strongest side of this database: BulkLoad. The idea is that we build the HFiles ourselves and place them on disk, which lets us scale perfectly and reach very respectable speeds. In fact, the limit here is not HBase but the capabilities of the hardware. Below are the load results on a cluster of 16 RegionServers and 16 YARN NodeManagers (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads), HBase version 1.2.0-cdh5.14.2.

Here you can see that increasing the number of partitions (regions) in the table, as well as the number of Spark executors, increases the load speed. The speed also depends on the record size. All other things being equal, large blocks give a gain in MB/sec, and small blocks give a gain in the number of records inserted per unit of time.

You can also load into two tables at the same time and get double the speed. Below you can see that writing 10 KB blocks into two tables at once proceeds at about 600 MB/sec into each (1275 MB/sec in total), which matches the write speed into a single table, 623 MB/sec (see No. 11 above).

But the second run, with 50 KB records, shows that the load speed grows only slightly, which indicates that it is approaching its limit. Keep in mind that practically no load is placed on HBase itself here: all that is required of it is to serve data from hbase:meta first and, after the HFiles are put in place, to reset the BlockCache data and flush the MemStore buffer to disk if it is not empty.

3. Reading data from HBase

If we assume that the client already has all the information from hbase:meta (see section 2), the request goes directly to the RS where the required key is stored. First, the search is performed in the MemStore. Regardless of whether the data is found there, the search is also performed in the BlockCache buffer and, if necessary, in the HFiles. If the data is found in a file, it is placed in the BlockCache and will be returned faster on the next request. Searching an HFile is relatively fast thanks to the Bloom filter: after reading a small amount of data, it immediately determines whether the file contains the required key and, if not, moves on to the next file.

Having received data from these three sources, the RS builds the response. In particular, it can return several found versions of an object at once if the client requested versioning.
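To make the Bloom filter step more concrete, here is a minimal sketch of the idea. It is not the filter HBase uses: the class ToyBloomFilter, the bit-array size, and the two hash functions are assumptions chosen only to keep the example short.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch; the size and hash functions are illustrative
// assumptions, not the ones HBase uses.
class ToyBloomFilter {
    private final BitSet bits;
    private final int size;

    ToyBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two cheap hash functions derived from String.hashCode().
    private int h1(String key) {
        return Math.floorMod(key.hashCode(), size);
    }

    private int h2(String key) {
        return Math.floorMod(key.hashCode() * 31 + 17, size);
    }

    // Called for every key written into the file.
    void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // false: the key is definitely absent, so the file can be skipped.
    // true: the key may be present, so the file has to be read.
    boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1024);
        filter.add("key1");
        System.out.println("key1 may be in the file: " + filter.mightContain("key1"));
        System.out.println("key2 may be in the file: " + filter.mightContain("key2"));
    }
}
```

A filter like this never gives a false "absent" answer for a key that was added, which is why a reader can safely skip a file on a negative result; a false "maybe present" only costs an unnecessary file read.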

4. Data caching

The MemStore and BlockCache buffers occupy up to 80% of the on-heap memory allocated to the RS (the rest is reserved for RS service tasks). If the typical usage pattern is that processes write data and immediately read the same data back, it makes sense to shrink the BlockCache and enlarge the MemStore, because written data does not enter the read cache, so the BlockCache will be used less often. The BlockCache buffer consists of two parts: LruBlockCache (always on-heap) and BucketCache (usually off-heap or on SSD). BucketCache should be used when there are many read requests and they do not fit into LruBlockCache, which leads to heavy Garbage Collector activity. At the same time, you should not expect a radical performance gain from the read cache, but we will come back to this in section 8.

There is one BlockCache for the entire RS, and each table has its own MemStore (one per Column Family).

As the theory describes, data does not enter the cache when it is written, and indeed the CACHE_DATA_ON_WRITE parameter for the table and "Cache DATA on Write" for the RS are set to false. In practice, however, if we write data to the MemStore, flush it to disk (thereby clearing the MemStore), and then delete the resulting file, a get request still returns the data successfully. Moreover, even if you disable the BlockCache completely, fill the table with new data, flush the MemStore to disk, delete the files, and request the data from another session, it is still retrieved from somewhere. So HBase stores not only data but also mysterious riddles.

hbase(main):001:0> create 'ns:magic', 'cf'
Created table ns:magic
Took 1.1533 seconds
hbase(main):002:0> put 'ns:magic', 'key1', 'cf:c', 'try_to_delete_me'
Took 0.2610 seconds
hbase(main):003:0> flush 'ns:magic'
Took 0.6161 seconds
hdfs dfs -mv /data/hbase/data/ns/magic/* /tmp/trash
hbase(main):002:0> get 'ns:magic', 'key1'
 cf:c      timestamp=1534440690218, value=try_to_delete_me

The "Cache DATA on Read" parameter is also set to false. If you have any ideas, you are welcome to discuss them in the comments.

5. Batch data processing with MultiGet/MultiPut

Processing single requests (Get/Put/Delete) is a rather expensive operation, so whenever possible you should combine them into a List<Get> or List<Put>, which gives a significant performance gain. This is especially true for writes, but for reads there is the following pitfall. The graph below shows the time to read 50 records from the MemStore. Reading was done in a single thread, and the horizontal axis shows the number of keys per request. Here you can see that as the number grows to about a thousand keys per request, execution time falls, i.e. speed increases. However, with MSLAB mode enabled (the default), performance drops radically beyond this threshold, and the larger the record, the longer the operation takes.

The tests were run on a virtual machine with 8 cores, HBase version 2.0.0-cdh6.0.0-beta1.

MSLAB mode is designed to reduce heap fragmentation, which arises from mixing new-generation and old-generation data. As a workaround, when MSLAB is enabled, data is placed into relatively small cells (chunks) and processed chunk by chunk. As a result, when the volume of data in the requested packet exceeds the allocated size, performance drops sharply. On the other hand, turning this mode off is not advisable either, because it leads to GC pauses during intensive data processing. A good solution is to increase the chunk size when you actively write via put and read at the same time. It is worth noting that the problem does not occur if, after writing, you run the flush command, which flushes the MemStore to disk, or if you load the data using BulkLoad. The table below shows that queries from the MemStore for larger data (with the same number of records) cause slowdowns. However, by increasing the chunksize we bring the processing time back to normal.

Besides increasing the chunksize, splitting the data by region helps, i.e. splitting the table. Fewer requests then arrive at each region, and if they fit into a cell, the response time stays good.
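One practical consequence of this section is that keys are worth sending in batches, but with a cap on the batch size. The helper below is a sketch of that idea in plain Java; the class KeyBatcher and the method toBatches are hypothetical names, and the batch size to use is something to measure on your own cluster. In an HBase client, each batch would then be converted into a List<Get> or List<Put> and passed to Table.get(List<Get>) or Table.put(List<Put>).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that caps the number of keys sent in one request.
class KeyBatcher {
    // Splits keys into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> toBatches(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add(i);
        }
        // 1000 is only an example cap, taken from the threshold seen in the graph.
        for (List<Integer> batch : toBatches(keys, 1000)) {
            System.out.println("batch of " + batch.size() + " keys");
        }
    }
}
```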

6. Strategy for splitting tables into regions (splitting)

Since HBase is a key-value store and partitioning is done by key, it is extremely important to distribute the data evenly across all regions. For example, splitting such a table into three parts divides the data into three regions:

This can lead to a sharp slowdown if the data loaded later looks like, say, long numbers, most of which start with the same digit, for example:

1000001
1000002
...
1100003

Since keys are stored as byte arrays, they all start the same way and fall into the same region #1, which holds this range of keys. There are several splitting strategies:

HexStringSplit: treats the key as a hexadecimal-encoded string in the range "00000000" => "FFFFFFFF", padded on the left with zeros.

UniformSplit: treats the key as a byte array with hexadecimal encoding in the range "00" => "FF", padded on the right with zeros.

In addition, you can specify any range or set of keys for splitting and configure auto-splitting. However, one of the simplest and most effective approaches is UniformSplit combined with a hash prefix, for example the two high-order bytes of CRC32(rowkey) followed by the rowkey itself:

hash + rowkey

All the data will then be distributed evenly across the regions. On reading, the first two bytes are simply discarded and the original key remains. The RS also monitors the amount of data and the number of keys in a region and, if the limits are exceeded, automatically splits it into parts.
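The hash + rowkey scheme can be sketched with the standard java.util.zip.CRC32 class. The class name SaltedKey and the methods salt and unsalt are hypothetical; the only part taken from the text is the layout: two high-order bytes of CRC32(rowkey), then the rowkey itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Sketch of the "hash + rowkey" layout; names are hypothetical.
class SaltedKey {
    // Prefixes the row key with the two high-order bytes of CRC32(rowkey).
    static byte[] salt(byte[] rowKey) {
        CRC32 crc = new CRC32();
        crc.update(rowKey);
        long hash = crc.getValue();           // unsigned 32-bit value held in a long
        byte[] salted = new byte[rowKey.length + 2];
        salted[0] = (byte) (hash >>> 24);     // most significant byte
        salted[1] = (byte) (hash >>> 16);     // second most significant byte
        System.arraycopy(rowKey, 0, salted, 2, rowKey.length);
        return salted;
    }

    // Drops the two-byte prefix to recover the original key on read.
    static byte[] unsalt(byte[] salted) {
        return Arrays.copyOfRange(salted, 2, salted.length);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"1000001", "1000002", "1100003"}) {
            byte[] salted = salt(key.getBytes(StandardCharsets.UTF_8));
            String original = new String(unsalt(salted), StandardCharsets.UTF_8);
            System.out.printf("%s -> prefix %02X%02X, restored %s%n",
                    key, salted[0] & 0xFF, salted[1] & 0xFF, original);
        }
    }
}
```

Because the prefix is derived from the key itself, a reader can always recompute it from the rowkey, so point lookups still work. The trade-off is that range scans over the original key order are no longer contiguous.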

7. Fault tolerance and data locality

Since only one region is responsible for each set of keys, the answer to problems caused by an RS crashing or being decommissioned is to keep all the necessary data in HDFS. When an RS goes down, the master detects this through the absence of a heartbeat on the ZooKeeper node. It then assigns the served region to another RS, and since the HFiles are stored in a distributed file system, the new owner reads them and continues to serve the data. However, since some of the data may have been in the MemStore and did not make it into HFiles in time, the WAL, which is also stored in HDFS, is used to replay the history of operations. Once the changes have been applied, the RS is able to respond to requests, but the move means that some of the data and the processes serving it end up on different nodes, i.e. locality decreases.
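The recovery step, replaying the WAL to rebuild what was in memory, can be pictured as follows. This is a deliberately simplified sketch: the class ToyWalReplay and the representation of a log entry as a {rowKey, value} pair are assumptions for illustration, not the actual WAL format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Toy illustration of WAL replay; the entry format is a made-up {rowKey, value} pair.
class ToyWalReplay {
    // Rebuilds the in-memory state by applying the logged writes in order,
    // so a later write to the same key wins.
    static TreeMap<String, String> replay(List<String[]> walEntries) {
        TreeMap<String, String> memStore = new TreeMap<>();
        for (String[] entry : walEntries) {
            memStore.put(entry[0], entry[1]);
        }
        return memStore;
    }

    public static void main(String[] args) {
        List<String[]> wal = Arrays.asList(
                new String[] {"key1", "v1"},
                new String[] {"key2", "v2"},
                new String[] {"key1", "v3"}); // overwrites the first entry
        System.out.println(replay(wal));
    }
}
```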

The solution to this problem is major compaction. This procedure moves files to the nodes responsible for them (where their regions are located), so the load on the network and disks rises sharply while it runs. Afterwards, however, access to the data is noticeably faster. In addition, major_compaction merges all HFiles within a region into a single file and also cleans up data according to the table settings. For example, you can specify the number of versions of an object that must be kept, or the lifetime after which an object is physically deleted.
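The cleanup rules mentioned above (a maximum number of versions and a lifetime) can be sketched like this. The class ToyRetention, the method retained, and the way the two limits are combined are assumptions for illustration; the sketch shows only the idea of keeping the newest versions that are still within their lifetime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch of version and lifetime limits applied during cleanup; names are hypothetical.
class ToyRetention {
    // Returns the timestamps that survive: newest first, at most maxVersions,
    // and none older than ttlMillis relative to now.
    static List<Long> retained(List<Long> timestamps, int maxVersions, long ttlMillis, long now) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort((a, b) -> Long.compare(b, a)); // newest first
        List<Long> kept = new ArrayList<>();
        for (long ts : sorted) {
            if (kept.size() == maxVersions) {
                break;                             // version limit reached
            }
            if (now - ts <= ttlMillis) {
                kept.add(ts);                      // still within its lifetime
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(100L, 200L, 300L, 400L);
        System.out.println(retained(versions, 2, 250L, 500L));
        System.out.println(retained(versions, 3, 1000L, 500L));
    }
}
```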

This procedure can have a very positive effect on HBase performance. The picture below shows how performance degraded as a result of active data writing. Here 40 threads wrote to one table while 40 threads simultaneously read data. The writing threads generate more and more HFiles, which are read by the other threads. As a result, more and more data has to be evicted from memory, and eventually the GC kicks in, which practically paralyzes all work. Launching a major compaction cleared out the resulting debris and restored performance.

The test was run on 3 DataNodes and 4 RS (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads). HBase version 1.2.0-cdh5.14.2.

It is worth noting that the major compaction was launched on a "live" table, into which data was being actively written and from which it was being read. There was a claim online that this could lead to incorrect responses when reading data. To check, a process was launched that generated new data and wrote it to the table. It then immediately read the data back and checked whether the returned value matched what had been written. While this process was running, major compaction was run about 200 times and not a single failure was recorded. Perhaps the problem appears rarely and only under high load, so it is safer to stop the write and read processes in a planned manner and run the cleanup, to prevent such GC-related drops in performance.

Also, major compaction does not affect the state of the MemStore; to flush it to disk and have it compacted, you need to use flush (connection.getAdmin().flush(TableName.valueOf(tblName))).

8. Settings and performance

As already mentioned, HBase shows its greatest success where it does not need to do anything, that is, when performing BulkLoad. Admittedly, this applies to most systems and people. Still, that tool is better suited to bulk-storing data in large blocks, whereas if the process requires many concurrent read and write requests, the Get and Put commands described above are used. To determine the optimal parameters, runs were performed with various combinations of table parameters and settings:

  • 10 threads were launched simultaneously 3 times in a row (let's call this a block of threads).
  • The running time of all threads in a block was averaged, and this was the final result for the block.
  • All threads worked with the same table.
  • Before each start of a block of threads, a major compaction was performed.
  • Each block performed only one of the following operations:

- Put
- Get
- Get+Put

  • Each block performed 50 iterations of its operation.
  • The record size is 100 bytes, 1000 bytes, or 10000 bytes (random content).
  • Blocks were launched with different numbers of requested keys (either one key or 10).
  • Blocks were run under different table settings. The parameters varied:

- BlockCache = on or off
- BlockSize = 65 KB or 16 KB
- Partitions = 1, 5, or 30
- MSLAB = enabled or disabled

So a block looks like this:

a. MSLAB mode was turned on/off.
b. A table was created with the following parameters set: BlockCache = true/none, BlockSize = 65/16 KB, Partitions = 1/5/30.
c. Compression was set to GZ.
d. 10 threads were launched simultaneously, each performing 1/10 put/get/get+put operations on this table with records of 100/1000/10000 bytes, executing 50 requests in a row (random keys).
e. Step d was repeated three times.
f. The running time of all threads was averaged.
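The size of the test grid for a single operation type can be checked by enumerating the parameter values listed above. The sketch below does only that; the class ParameterGrid and the label format are hypothetical, and no HBase calls are made.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the settings grid for one operation type (for example Put).
// The class name and label format are assumptions for illustration.
class ParameterGrid {
    static List<String> combinations() {
        boolean[] mslab = {true, false};
        boolean[] blockCache = {true, false};
        int[] blockSizeKb = {65, 16};
        int[] partitions = {1, 5, 30};
        int[] recordBytes = {100, 1000, 10000};
        int[] keysPerRequest = {1, 10};

        List<String> runs = new ArrayList<>();
        for (boolean m : mslab)
            for (boolean c : blockCache)
                for (int b : blockSizeKb)
                    for (int p : partitions)
                        for (int r : recordBytes)
                            for (int k : keysPerRequest)
                                runs.add(String.format(
                                        "MSLAB=%b BlockCache=%b BlockSize=%dKB Partitions=%d record=%dB keys=%d",
                                        m, c, b, p, r, k));
        return runs;
    }

    public static void main(String[] args) {
        List<String> runs = combinations();
        System.out.println(runs.size() + " combinations, first: " + runs.get(0));
    }
}
```

Six factors with 2, 2, 2, 3, 3 and 2 values give 144 distinct combinations per operation type.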

All possible combinations were tested. It is predictable that speed will drop as the record size grows, or that disabling caching will cause a slowdown. The goal, however, was to understand the degree and significance of each parameter's influence, so the collected data was fed into a linear regression function, which makes it possible to assess significance using t-statistics. Below are the results of the blocks performing Put operations. The full set of combinations is 2*2*2*3*3*2 = 144 variants (MSLAB, BlockCache, BlockSize, partitions, record size, keys per request), plus 72 because some were run twice. In total, therefore, there are 216 runs:

The tests were run on a mini-cluster of 3 DataNodes and 4 RS (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads). HBase version 1.2.0-cdh5.14.2.

The fastest insertion, 3.7 seconds, was obtained with MSLAB mode off, on a table with one partition, with BlockCache enabled, BlockSize = 16 KB, records of 100 bytes, 10 per packet.
The slowest insertion, 82.8 seconds, was obtained with MSLAB mode on, on a table with one partition, with BlockCache enabled, BlockSize = 16 KB, records of 10000 bytes, 1 per packet.

Now let's look at the model. Judging by R2, the quality of the model is good, but it is quite clear that extrapolation is contraindicated here. The actual behavior of the system as the parameters change will not be linear; this model is needed not for prediction but for understanding what happened within the given parameter ranges. For example, Student's t-test shows that the BlockSize and BlockCache parameters do not matter for the Put operation (which is generally quite predictable):

But the fact that increasing the number of partitions reduces performance is somewhat unexpected (we have already seen the positive effect of more partitions with BulkLoad), although it is understandable. First, for processing you have to generate requests to 30 regions instead of one, and the data volume is not large enough for this to pay off. Second, the total running time is determined by the slowest RS, and since the number of DataNodes is smaller than the number of RSs, some regions have zero locality. Well, let's look at the top five:

Now let's evaluate the results of the blocks performing Get:

The number of partitions has lost its significance, which is probably explained by the fact that the data is cached well and the read cache is the most significant parameter (statistically). Naturally, increasing the number of keys per request is also very beneficial for performance. Top results:

And finally, let's look at the model for the block that first performed get and then put:

All parameters are significant here. And the leaders' results:

9. Load testing

Well, finally we will launch a more or less decent load, but it is always more interesting when you have something to compare with. On the website of DataStax, the main developer of Cassandra, there are load test results for a number of NoSQL stores, including HBase version 0.98.6-1. The load was generated by 40 threads, with a data size of 100 bytes, on SSD disks. Testing Read-Modify-Write operations gave the following results.

As far as I understand, reading was done in blocks of 100 records, and for 16 HBase nodes the DataStax test showed a performance of 10 thousand operations per second.

It is fortunate that our cluster also has 16 nodes, but less "fortunate" that each of them has 64 cores (threads), whereas the DataStax test had only 4. On the other hand, they have SSD drives while we have HDDs. We also have a newer version of HBase, and CPU utilization during the load did not rise appreciably (visually by 5-10 percent). Nevertheless, let's try running on this configuration. Reads are performed over the key range from 0 to 50 million at random (i.e. essentially a new key every time). The table contains 50 million records, split into 64 partitions. Keys are hashed using crc32. The table settings are the defaults, and MSLAB is enabled. We launch 40 threads; each thread reads a set of 100 random keys and immediately writes 100 generated bytes back to those keys.

Test bench: 16 DataNodes and 16 RS (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads). HBase version 1.2.0-cdh5.14.2.

The average result is close to 40 thousand operations per second, which is significantly better than in the DataStax test. However, for the sake of experiment, the conditions can be changed a bit. It is quite unlikely that all work will be carried out exclusively with one table, and only with unique keys. Let's assume there is a certain "hot" set of keys that generates the main load. So let's try to create a load with larger records (10 KB), also in batches of 100, in 4 different tables, narrowing the range of requested keys to 50 thousand. As before, each thread reads a set of keys and immediately writes random data back to those keys.

Test bench: 16 DataNodes and 16 RS (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads). HBase version 1.2.0-cdh5.14.2.

During the load, major compaction was launched several times. As shown above, without this procedure performance would gradually degrade, although additional load also arises while it runs. The drops in throughput have various causes. Sometimes the threads finished working and there was a pause while they were restarted; sometimes third-party applications created load on the cluster.

Reading and immediately writing is one of the most difficult scenarios for HBase. If you make only small put requests, for example 100 bytes each, and combine them into batches of 10-50 thousand, you can get hundreds of thousands of operations per second, and the situation is similar for read-only requests. It is worth noting that the results are radically better than those obtained by DataStax, most of all because of requests in blocks of 50 thousand.

Test bench: 16 DataNodes and 16 RS (CPU Xeon E5-2680 v4 @ 2.40GHz * 64 threads). HBase version 1.2.0-cdh5.14.2.

10. Conclusions

This system is quite flexibly configurable, but the influence of a large number of parameters still remains unknown. Some of them were tested but were not included in the resulting test set. For example, preliminary experiments showed that a parameter such as DATA_BLOCK_ENCODING, which encodes information using values from neighboring cells, has little effect, which is understandable for randomly generated data. If you store a large number of repeating objects, the gain can be significant. In general, HBase gives the impression of a fairly serious and well-thought-out database, which can be quite productive when operating on large blocks of data, especially if the read and write processes can be separated in time.

If something, in your opinion, is not covered in enough detail, I am ready to explain it further. You are welcome to share your own experience or to argue if you disagree with something.

Source: www.habr.com
