How to increase read speed from HBase up to 3 times and from HDFS up to 5 times

High performance is one of the key requirements when working with big data. In the data loading department at Sberbank, we pump almost all transactions into our Hadoop-based Data Cloud, so we deal with truly large flows of information. Naturally, we are always looking for ways to improve performance. In this post we want to tell you how we patched the HBase RegionServer and the HDFS client, which let us significantly speed up read operations.

Before moving on to the essence of the improvements, it is worth talking about limitations that cannot be circumvented in principle if you are running on HDD.

Why HDD and fast random-access reads are incompatible
As you know, HBase, like many other databases, stores data in blocks several tens of kilobytes in size. By default a block is about 64 KB. Now imagine that we need to get only 100 bytes, and we ask HBase to give us this data by some key. Since the block size in HFiles is 64 KB, we will read 640 times more than necessary (just think about it!).

Next, the request goes through HDFS and its metadata caching mechanism, ShortCircuitCache (which allows direct access to files). This leads to reading as much as 1 MB from disk. This can be adjusted with the dfs.client.read.shortcircuit.buffer.size parameter, and in many cases it makes sense to reduce this value, for example to 126 KB.

Suppose we do that. There is a further problem when we read data through the Java API, with functions such as FileChannel.read. When we ask the operating system to read the specified amount of data, it reads 2 times more "just in case", i.e. about 256 KB in our case. This is because Java has no easy way to set the FADV_RANDOM flag, which would prevent this behavior.

As a result, to get our 100 bytes, about 2600 times more data is read under the hood. The solution seems obvious: reduce the block size to a kilobyte, set the flag mentioned above and enjoy a huge speedup. The trouble is that halving the block size also halves the number of bytes read per unit of time.

Setting the FADV_RANDOM flag does give some gain, but only with heavy multithreading and a block size of 128 KB. Even then it amounts to a couple of dozen percent at most:


The tests were run on 100 files, each 1 GB in size, spread across 10 HDDs.

Let's calculate what we can, in principle, count on at this speed:
Suppose we read from 10 disks at a combined 280 MB/sec, i.e. about 3 million reads of 100 bytes each. But, as we remember, the data we need is 2600 times smaller than what is actually read. So we divide 3 million by 2600 and get about 1100 records per second.
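The estimate above is easy to reproduce in code. This is a minimal sketch that uses the figures quoted in the text: 256 KB actually read per 100-byte request, and 280 MB/s aggregate from the disks. It assumes decimal megabytes for the throughput. The class and method names are purely illustrative.

```java
/** Back-of-envelope estimate of random-read throughput on HDD (figures from the text). */
final class RandomReadCeiling {
    private RandomReadCeiling() {}

    /** How many bytes are physically read for every useful byte (integer division). */
    static long amplification(long bytesReadPerRequest, long usefulBytes) {
        return bytesReadPerRequest / usefulBytes;
    }

    /** Useful records per second when each record costs `amplification` record-sized reads. */
    static long recordsPerSecond(long diskBytesPerSecond, long recordBytes, long amplification) {
        long recordSizedReadsPerSecond = diskBytesPerSecond / recordBytes;
        return recordSizedReadsPerSecond / amplification;
    }

    public static void main(String[] args) {
        long amp = amplification(256L * 1024, 100);            // 262144 / 100 = 2621
        long rps = recordsPerSecond(280_000_000L, 100, 2600);  // 2,800,000 / 2600 = 1076
        System.out.println("amplification ~ " + amp + ", ceiling ~ " + rps + " records/s");
    }
}
```

With the unrounded 2.8 million reads per second, the ceiling comes out at about 1,080 records per second. That is consistent with the rough 1,100 above.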

Depressing, isn't it? That is the nature of random access to data on HDD, regardless of block size. It is the physical limit of random access, and no database can squeeze more out of it under these conditions.

So how do databases reach much higher speeds? To answer that, let's look at what happens in the following picture:


Here we see that during the first few minutes the speed really is about a thousand records per second. Later, however, much more is read than was requested, so the data ends up in the buff/cache of the operating system (Linux). The speed then rises to a respectable 60 thousand per second.

So from here on we will deal only with speeding up access to data that sits in the OS cache, or on SSD/NVMe storage with comparable access speed.

In our case, we will run the tests on a bench of 4 servers, each configured as follows:

CPU: Xeon E5-2680 v4 @ 2.40GHz, 64 threads.
Memory: 730 GB.
Java version: 1.8.0_111

The key point here is the volume of data in the tables being read. If you read from a table that fits entirely into the HBase cache, reads will not even reach the OS buff/cache. That is because by default HBase allocates 40% of its heap memory to a structure called BlockCache. Essentially this is a ConcurrentHashMap where the key is the file name plus the block offset, and the value is the actual data at that offset.
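To make the key layout concrete, here is a toy model of such a cache. ToyBlockCache and its Key class are hypothetical illustrations, not HBase's real BlockCacheKey or LruBlockCache. The real classes also handle block types, size accounting and eviction.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of a block cache keyed by (file name, block offset). Not HBase's real classes. */
final class ToyBlockCache {
    /** Hypothetical key: file name plus offset of the block inside that file. */
    static final class Key {
        final String fileName;
        final long offset;

        Key(String fileName, long offset) {
            this.fileName = fileName;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return offset == k.offset && fileName.equals(k.fileName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileName, offset);
        }
    }

    private final ConcurrentHashMap<Key, byte[]> blocks = new ConcurrentHashMap<>();

    /** Stores the bytes of one block under (file, offset). */
    void put(String fileName, long offset, byte[] block) {
        blocks.put(new Key(fileName, offset), block);
    }

    /** Returns the cached block, or null on a miss (a real reader would then go to HDFS). */
    byte[] get(String fileName, long offset) {
        return blocks.get(new Key(fileName, offset));
    }

    int size() {
        return blocks.size();
    }
}
```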

So when reading only from this structure, we see excellent speed, around a million requests per second. But suppose we cannot give hundreds of gigabytes of memory to the database alone, because many other useful things run on these servers too.

For example, in our case the BlockCache on one RS is about 12 GB. We run two RS per node, i.e. 96 GB is allocated to BlockCache across all nodes. And there is many times more data. Say there are 4 tables with 130 regions each, where the files are 800 MB in size and compressed with FAST_DIFF. That is 410 GB in total (this is net data, not counting the replication factor).
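The sizing arithmetic can be checked as follows. This sketch assumes one 800 MB file per region, the 4-server bench described above and decimal units (1 GB = 1000 MB).

```java
/** Share of the data set that fits in BlockCache, using the figures quoted in the text. */
final class CacheShare {
    private CacheShare() {}

    static double percent(double cacheGb, double dataGb) {
        return cacheGb * 100.0 / dataGb;
    }

    public static void main(String[] args) {
        double cacheGb = 12 * 2 * 4;             // 12 GB per RS, 2 RS per node, 4 nodes = 96 GB
        double dataGb = 4 * 130 * 800 / 1000.0;  // 4 tables x 130 regions x 800 MB = 416 GB
        System.out.printf("cache %.0f GB, data %.0f GB, share %.1f%%%n",
                cacheGb, dataGb, percent(cacheGb, dataGb));
    }
}
```

This gives about 416 GB of data and a cache share of about 23%. The 410 GB in the text is the same figure rounded down.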

So BlockCache holds only about 23% of the total data volume, which is much closer to the real conditions of what is called BigData. And this is where the fun begins. Obviously, the fewer cache hits, the worse the performance: on every miss you have to do a lot of work, going all the way down to system calls. This cannot be avoided, however, so let's look at a completely different aspect: what happens to the data inside the cache?

Let's simplify the situation and assume we have a cache that holds only 1 object. Here is what happens when we try to work with a data volume 3 times larger than the cache. We will have to:

1. Put block 1 in the cache
2. Evict block 1 from the cache
3. Put block 2 in the cache
4. Evict block 2 from the cache
5. Put block 3 in the cache

5 operations done! However, this situation can hardly be called normal: in effect we are forcing HBase to do a pile of completely useless work. It keeps reading data from the OS cache and putting it into BlockCache, only to throw it out almost immediately because a new portion of data has arrived. The animation at the beginning of the post shows the essence of the problem: the Garbage Collector goes off the scale, the atmosphere heats up, and little Greta in distant, hot Sweden gets upset. We IT people really don't like it when children are sad, so we started thinking about what we could do about it.
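The thrashing described above can be reproduced with a capacity-1 LRU cache built on java.util.LinkedHashMap. This is a hypothetical illustration, not HBase's LruBlockCache. LinkedHashMap evicts right after an insert rather than before it, so the order differs slightly from the list, but the number of operations is the same.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache that counts puts and evictions, to show thrashing when data exceeds capacity. */
final class ThrashingLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    int puts;
    int evictions;

    ThrashingLru(int capacity) {
        super(16, 0.75f, true); // access order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    public V put(K key, V value) {
        puts++;
        return super.put(key, value);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insert; returning true drops the eldest entry
        boolean evict = size() > capacity;
        if (evict) {
            evictions++;
        }
        return evict;
    }

    public static void main(String[] args) {
        ThrashingLru<Integer, String> cache = new ThrashingLru<>(1);
        for (int block = 1; block <= 3; block++) {
            cache.put(block, "block " + block);
        }
        // 3 puts + 2 evictions = 5 operations; only block 3 survives
        System.out.println(cache.puts + " puts, " + cache.evictions + " evictions, cached: " + cache.keySet());
    }
}
```

Running main prints `3 puts, 2 evictions, cached: [3]`. Every block is paid for with a put, and all but the last with an eviction as well, yet none of them is ever read back from the cache.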

What if we put only a certain percentage of blocks into the cache instead of all of them, so that the cache doesn't overflow? Let's start by simply adding a few lines of code at the beginning of the function that puts data into BlockCache:

  public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
    if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
      if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
        return;
      }
    }
...

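The idea is this: the offset is the position of the block in the file. Its last two digits are effectively random and evenly distributed from 00 to 99. So the check skips every data block whose offset % 100 is at or above cacheDataBlockPercent, and caches roughly cacheDataBlockPercent percent of them.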
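A quick simulation can check whether offset % 100 really behaves like a uniform sample. This sketch assumes consecutive blocks with sizes varying between 60,000 and 68,000 bytes, as a stand-in for compressed 64 KB blocks. The helper names are hypothetical.

```java
import java.util.Random;

/** Checks that "offset % 100 < percent" keeps roughly `percent` % of blocks. */
final class OffsetSampling {
    private OffsetSampling() {}

    /** Same test as the patched cacheBlock(): the block is cached if this returns true. */
    static boolean shouldCache(long offset, int cacheDataBlockPercent) {
        return offset % 100 < cacheDataBlockPercent;
    }

    /**
     * Lays out `blocks` consecutive blocks of varying size (60-68 KB is an assumption)
     * and returns the share of them that would be cached.
     */
    static double cachedShare(int blocks, int cacheDataBlockPercent, long seed) {
        Random random = new Random(seed);
        long offset = 0;
        int cached = 0;
        for (int i = 0; i < blocks; i++) {
            if (shouldCache(offset, cacheDataBlockPercent)) {
                cached++;
            }
            offset += 60_000 + random.nextInt(8_000); // the next block starts right after this one
        }
        return (double) cached / blocks;
    }

    public static void main(String[] args) {
        System.out.printf("cached share at 20%%: %.3f%n", cachedShare(1_000_000, 20, 42));
    }
}
```

Because block sizes vary, the last two digits of consecutive offsets drift evenly over 00-99. The share of cached blocks therefore lands very close to the configured 20%.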

For example, set cacheDataBlockPercent = 20 and see what happens:


The result is obvious. The graphs below make it clear why this speedup occurs. We save a lot of GC resources by no longer doing the Sisyphean work of putting data into the cache only to throw it straight down the drain:


At the same time, CPU utilization grows, but much less than throughput does:


It is also worth noting that the blocks stored in BlockCache are not all alike. Most of them, about 95%, are the data itself. The rest is metadata, such as Bloom filters or LEAF_INDEX blocks. This metadata is small but very useful. Before accessing the data itself, HBase consults the meta to find out whether it needs to search this file at all and, if so, where exactly the block of interest is.

That is why the code checks buf.getBlockType().isData(): thanks to this condition, metadata is always left in the cache.
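In isolation, the admission rule with the metadata exemption might look like the sketch below. The Kind enum is a hypothetical, simplified stand-in for HBase's BlockType, which has more values. Only the shape of the rule should be taken from it.

```java
/** Admission rule: metadata is always cached, data blocks are sampled by offset. Toy model. */
final class AdmissionRule {
    /** Hypothetical simplified block kinds; HBase's real BlockType enum has more values. */
    enum Kind { DATA, BLOOM, LEAF_INDEX }

    private AdmissionRule() {}

    static boolean admit(Kind kind, long offset, int cacheDataBlockPercent) {
        if (cacheDataBlockPercent != 100 && kind == Kind.DATA) {
            // Data blocks: keep only those whose offset falls into the sampled range
            return offset % 100 < cacheDataBlockPercent;
        }
        // Metadata (Bloom filters, index blocks) is small and consulted first: always cache it
        return true;
    }
}
```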

Now let's increase the load and tune the cutoff a little at the same time. In the first test we set the cutoff percentage to 20, and BlockCache was somewhat underutilized. Now let's set it to 23% and add 100 threads every 5 minutes to see where saturation sets in:


Here we see that the original version hits the ceiling almost immediately, at about 100 thousand requests per second. The patch, by contrast, gives a speedup of up to 300 thousand. At the same time, further acceleration is clearly no longer "free": CPU utilization grows as well.

However, this is not a very elegant solution, since we don't know in advance what percentage of blocks should be cached; it depends on the load profile. So we implemented a mechanism that adjusts this parameter automatically, depending on read activity.

Three parameters were added to control it:

hbase.lru.cache.heavy.eviction.count.limit - sets how many times the process of evicting data from the cache must run before we start using the optimization (i.e. skipping blocks). By default it equals MAX_INT = 2147483647, which effectively means the feature never kicks in with this value. The eviction process runs every 5-10 seconds (depending on load), and 2147483647 * 10 / 60 / 60 / 24 / 365 = 680 years. However, we can set this parameter to 0 to make the feature work right after startup.
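The 680-year figure follows directly from the default. Here is a small check, assuming one eviction run every 10 seconds (the slow end of the 5-10 s range). The class name is illustrative.

```java
/** How long it would take to reach the default heavy-eviction count limit. */
final class CountLimitHorizon {
    private CountLimitHorizon() {}

    static double years(long evictionRuns, double secondsPerRun) {
        return evictionRuns * secondsPerRun / 60 / 60 / 24 / 365;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE runs, one run every ~10 s
        System.out.printf("%.0f years%n", years(Integer.MAX_VALUE, 10));
    }
}
```

It prints `681 years` (about 680.96), i.e. the 680 quoted above rounded down.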

This parameter also has a useful application. Suppose our load alternates between short-term reads (say, during the day) and long-term reads (at night). In that case we can make the feature switch on only while long-running reads are in progress.

For example, suppose we know that short-term reads usually last about a minute. There is no need to start throwing blocks away during them, since the cache will not have time to go stale. So we can set this parameter to, say, 10. As a result, the optimization kicks in only once long-running active reading is under way, i.e. after about 100 seconds. Short reads then put all blocks into the cache, where they remain available (apart from those evicted by the standard algorithm). Long reads switch the feature on, and we get much higher performance.
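The gating effect of the count limit can be modelled in a few lines. This is a simplified sketch with several assumptions. It treats every period of a burst as a heavy one, and ignores the counter reset that happens when reading stops. It also assumes a 10-second eviction period. The class name is hypothetical.

```java
/** When does block skipping start, given hbase.lru.cache.heavy.eviction.count.limit? */
final class CountLimitGate {
    private CountLimitGate() {}

    /**
     * Returns the 1-based eviction period at which skipping starts during a burst of
     * `burstPeriods` consecutive heavy periods, or -1 if the burst ends first.
     * Mirrors the check "heavyEvictionCount > heavyEvictionCountLimit".
     */
    static int firstSkippingPeriod(int burstPeriods, int countLimit) {
        int heavyEvictionCount = 0;
        for (int period = 1; period <= burstPeriods; period++) {
            heavyEvictionCount++;
            if (heavyEvictionCount > countLimit) {
                return period;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int secondsPerPeriod = 10;                // assumed eviction period
        int shortRead = 60 / secondsPerPeriod;    // ~1 minute of heavy reads = 6 periods
        int longRead = 3600 / secondsPerPeriod;   // 1 hour = 360 periods
        System.out.println("short read: " + firstSkippingPeriod(shortRead, 10));
        System.out.println("long read starts skipping at ~"
                + firstSkippingPeriod(longRead, 10) * secondsPerPeriod + " s");
    }
}
```

It prints `short read: -1` and `long read starts skipping at ~110 s`. With a limit of 10, the one-minute burst never triggers skipping. Because the check is a strict greater-than, a long read starts skipping on its 11th heavy period.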

hbase.lru.cache.heavy.eviction.mb.size.limit - sets how many megabytes we would like to put into the cache (and therefore evict) every 10 seconds. The feature will try to reach this value and hold it. The point is that if we shove gigabytes into the cache, we will have to evict gigabytes, and that, as we saw above, is very expensive. However, you should not set it too small either, since that makes the block-skipping mode exit prematurely. Suggested values:
- Powerful servers (about 20-40 physical cores): about 300-400 MB is optimal.
- Mid-range servers (~10 cores): 200-300 MB.
- Weak systems (2-5 cores): 50-100 MB may be fine (not tested on such systems).

Let's see how this works. Suppose we set hbase.lru.cache.heavy.eviction.mb.size.limit = 500 and there is some read load. Every ~10 seconds we calculate how many bytes were evicted, using the formula:

Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100;

If 2000 MB were actually evicted, Overhead equals:

2000 * 100 / 500 - 100 = 300%

The algorithm tries to keep the overhead to no more than a few tens of percent. So here the feature will reduce the percentage of cached blocks, which is how the auto-tuning works.

However, if the load drops, say to only 200 MB evicted, Overhead becomes negative (so-called overshooting):

200 * 100 / 500 - 100 = -60%
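The formula translates directly into code. The sketch below uses the same integer arithmetic as the implementation excerpt further down: divide first, then subtract 100. The class name is illustrative.

```java
/** Overhead formula from the text: freed MB * 100 / limit MB - 100 (integer arithmetic). */
final class EvictionOverhead {
    private EvictionOverhead() {}

    static int overheadPercent(long freedSumMb, long limitMb) {
        return (int) (freedSumMb * 100 / limitMb) - 100;
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(2000, 500)); // 300: evicting far more than the limit
        System.out.println(overheadPercent(200, 500));  // -60: overshooting
    }
}
```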

In this case the feature does the opposite: it increases the percentage of cached blocks until Overhead becomes positive again.

Below is an example of how this looks on real data. There is no need to try to reach 0%; it is impossible. Around 30-100% is quite good, and this helps avoid a premature exit from the optimization mode during short-term spikes.

hbase.lru.cache.heavy.eviction.overhead.coefficient - sets how quickly we would like to get the result. If we know for sure that our reads are mostly long and we don't want to wait, we can increase this coefficient and get high performance sooner.

For example, suppose we set this coefficient to 0.01. The Overhead (see above) is multiplied by this number, and the percentage of cached blocks is reduced by the result. So with Overhead = 300% and coefficient = 0.01, the percentage of cached blocks is reduced by 3%.

A similar "Backpressure" logic is implemented for negative Overhead values (overshooting). Short-term fluctuations in read and eviction volumes are always possible, and this mechanism prevents them from causing a premature exit from the optimization mode. Backpressure works the other way round: the stronger the overshooting, the more blocks are cached.


Implementation code

        LruBlockCache cache = this.cache.get();
        if (cache == null) {
          break;
        }
        freedSumMb += cache.evict()/1024/1024;
        /*
        * Sometimes we read more data than can fit into BlockCache,
        * and this causes a high rate of evictions.
        * This in turn leads to heavy Garbage Collector work:
        * a lot of blocks are put into BlockCache but never read,
        * while spending a lot of CPU resources.
        * Here we analyze how many bytes were freed and decide
        * whether the time has come to reduce the amount of cached blocks.
        * This helps avoid putting too many blocks into BlockCache
        * when evict() is very active, and saves CPU for other jobs.
        * More details: https://issues.apache.org/jira/browse/HBASE-23887
        */

        // First of all we have to control how much time
        // has passed since the previous evict() was launched.
        // This should be almost the same time (+/- 10s)
        // so that we get comparable volumes of freed bytes each time.
        // 10s because this is the default period to run evict() (see above this.wait)
        long stopTime = System.currentTimeMillis();
        if ((stopTime - startTime) > 1000 * 10 - 1) {
          // Here we have to calculate what situation we are in.
          // We have the limit "hbase.lru.cache.heavy.eviction.mb.size.limit"
          // and can calculate the overhead relative to it.
          // We will use this information to decide
          // how to change the percent of cached blocks.
          freedDataOverheadPercent =
            (int) (freedSumMb * 100 / cache.heavyEvictionMbSizeLimit) - 100;
          if (freedSumMb > cache.heavyEvictionMbSizeLimit) {
            // Now we are above the limit,
            // but maybe we will ignore it because it will end quite soon
            heavyEvictionCount++;
            if (heavyEvictionCount > cache.heavyEvictionCountLimit) {
              // It has been going on for a long time and we have to reduce caching
              // blocks now. So here we calculate how many blocks we want to skip.
              // It depends on:
              // 1. Overhead - if the overhead is big, we can reduce the amount
              // of cached blocks more aggressively.
              // 2. How fast we want to get the result. If we know that our
              // heavy reading lasts a long time, we don't want to wait and can
              // increase the coefficient to get good performance quite soon.
              // If we are not sure, we can do it slowly, which can prevent
              // a premature exit from this mode. So a higher coefficient
              // gives better performance when heavy reading is stable,
              // while a lower coefficient adapts better
              // when the reading pattern keeps changing.
              int change =
                (int) (freedDataOverheadPercent * cache.heavyEvictionOverheadCoefficient);
              // Practice shows that reducing by at most 15% per step is enough.
              // We are not greedy (that could lead to a premature exit).
              change = Math.min(15, change);
              change = Math.max(0, change); // should never be negative here, but check anyway
              // This is the key point: here we reduce the % of cached blocks
              cache.cacheDataBlockPercent -= change;
              // If we go down too far, we have to stop: keep at least 1%.
              cache.cacheDataBlockPercent = Math.max(1, cache.cacheDataBlockPercent);
            }
          } else {
            // Well, we have got overshooting.
            // Maybe it is just a short-term fluctuation and we can stay in this mode.
            // This helps avoid a premature exit during short-term fluctuations.
            // If the overshooting is less than 90%, we try to increase the percent of
            // cached blocks and hope it is enough.
            if (freedSumMb >= cache.heavyEvictionMbSizeLimit * 0.1) {
              // Simple logic: more overshooting - more cached blocks (backpressure)
              int change = (int) (-freedDataOverheadPercent * 0.1 + 1);
              cache.cacheDataBlockPercent += change;
              // But it can't be more than 100%, so check it.
              cache.cacheDataBlockPercent = Math.min(100, cache.cacheDataBlockPercent);
            } else {
              // Looks like heavy reading is over.
              // Just exit from this mode.
              heavyEvictionCount = 0;
              cache.cacheDataBlockPercent = 100;
            }
          }
          LOG.info("BlockCache evicted (MB): {}, overhead (%): {}, " +
            "heavy eviction counter: {}, " +
            "current caching DataBlock (%): {}",
            freedSumMb, freedDataOverheadPercent,
            heavyEvictionCount, cache.cacheDataBlockPercent);

          freedSumMb = 0;
          startTime = stopTime;
       }

Now let's look at all this with a real example. We have the following test scenario:

  1. Start running Scan (25 threads, batch = 100)
  2. After 5 minutes, add multi-gets (25 threads, batch = 100)
  3. After 5 minutes, switch off multi-gets (only the scan remains again)

We do two runs. The first uses hbase.lru.cache.heavy.eviction.count.limit = 10000, which effectively disables the feature. The second uses limit = 0, which enables it.

In the logs below we can see the feature switching on and bringing the overhead down to 14-71%. From time to time the load drops, which switches on Backpressure, and HBase caches more blocks again.

RegionServer log
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
evicted (MB): 2170, ratio 1.09, overhead (%): 985, heavy eviction counter: 1, current caching DataBlock (%): 91 < start
evicted (MB): 3763, ratio 1.08, overhead (%): 1781, heavy eviction counter: 2, current caching DataBlock (%): 76
evicted (MB): 3306, ratio 1.07, overhead (%): 1553, heavy eviction counter: 3, current caching DataBlock (%): 61
evicted (MB): 2508, ratio 1.06, overhead (%): 1154, heavy eviction counter: 4, current caching DataBlock (%): 50
evicted (MB): 1824, ratio 1.04, overhead (%): 812, heavy eviction counter: 5, current caching DataBlock (%): 42
evicted (MB): 1482, ratio 1.03, overhead (%): 641, heavy eviction counter: 6, current caching DataBlock (%): 36
evicted (MB): 1140, ratio 1.01, overhead (%): 470, heavy eviction counter: 7, current caching DataBlock (%): 32
evicted (MB): 913, ratio 1.0, overhead (%): 356, heavy eviction counter: 8, current caching DataBlock (%): 29
evicted (MB): 912, ratio 0.89, overhead (%): 356, heavy eviction counter: 9, current caching DataBlock (%): 26
evicted (MB): 684, ratio 0.76, overhead (%): 242, heavy eviction counter: 10, current caching DataBlock (%): 24
evicted (MB): 684, ratio 0.61, overhead (%): 242, heavy eviction counter: 11, current caching DataBlock (%): 22
evicted (MB): 456, ratio 0.51, overhead (%): 128, heavy eviction counter: 12, current caching DataBlock (%): 21
evicted (MB): 456, ratio 0.42, overhead (%): 128, heavy eviction counter: 13, current caching DataBlock (%): 20
evicted (MB): 456, ratio 0.33, overhead (%): 128, heavy eviction counter: 14, current caching DataBlock (%): 19
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 15, current caching DataBlock (%): 19
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 16, current caching DataBlock (%): 19
evicted (MB): 342, ratio 0.31, overhead (%): 71, heavy eviction counter: 17, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.3, overhead (%): 14, heavy eviction counter: 18, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.29, overhead (%): 14, heavy eviction counter: 19, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.27, overhead (%): 14, heavy eviction counter: 20, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.25, overhead (%): 14, heavy eviction counter: 21, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.24, overhead (%): 14, heavy eviction counter: 22, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.22, overhead (%): 14, heavy eviction counter: 23, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.21, overhead (%): 14, heavy eviction counter: 24, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.2, overhead (%): 14, heavy eviction counter: 25, current caching DataBlock (%): 19
evicted (MB): 228, ratio 0.17, overhead (%): 14, heavy eviction counter: 26, current caching DataBlock (%): 19
evicted (MB): 456, ratio 0.17, overhead (%): 128, heavy eviction counter: 27, current caching DataBlock (%): 18 < added gets (but the same table)
evicted (MB): 456, ratio 0.15, overhead (%): 128, heavy eviction counter: 28, current caching DataBlock (%): 17
evicted (MB): 342, ratio 0.13, overhead (%): 71, heavy eviction counter: 29, current caching DataBlock (%): 17
evicted (MB): 342, ratio 0.11, overhead (%): 71, heavy eviction counter: 30, current caching DataBlock (%): 17
evicted (MB): 342, ratio 0.09, overhead (%): 71, heavy eviction counter: 31, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.08, overhead (%): 14, heavy eviction counter: 32, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.07, overhead (%): 14, heavy eviction counter: 33, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.06, overhead (%): 14, heavy eviction counter: 34, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.05, overhead (%): 14, heavy eviction counter: 35, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.05, overhead (%): 14, heavy eviction counter: 36, current caching DataBlock (%): 17
evicted (MB): 228, ratio 0.04, overhead (%): 14, heavy eviction counter: 37, current caching DataBlock (%): 17
evicted (MB): 109, ratio 0.04, overhead (%): -46, heavy eviction counter: 37, current caching DataBlock (%): 22 < backpressure
evicted (MB): 798, ratio 0.24, overhead (%): 299, heavy eviction counter: 38, current caching DataBlock (%): 20
evicted (MB): 798, ratio 0.29, overhead (%): 299, heavy eviction counter: 39, current caching DataBlock (%): 18
evicted (MB): 570, ratio 0.27, overhead (%): 185, heavy eviction counter: 40, current caching DataBlock (%): 17
evicted (MB): 456, ratio 0.22, overhead (%): 128, heavy eviction counter: 41, current caching DataBlock (%): 16
evicted (MB): 342, ratio 0.16, overhead (%): 71, heavy eviction counter: 42, current caching DataBlock (%): 16
evicted (MB): 342, ratio 0.11, overhead (%): 71, heavy eviction counter: 43, current caching DataBlock (%): 16
evicted (MB): 228, ratio 0.09, overhead (%): 14, heavy eviction counter: 44, current caching DataBlock (%): 16
evicted (MB): 228, ratio 0.07, overhead (%): 14, heavy eviction counter: 45, current caching DataBlock (%): 16
evicted (MB): 228, ratio 0.05, overhead (%): 14, heavy eviction counter: 46, current caching DataBlock (%): 16
evicted (MB): 222, ratio 0.04, overhead (%): 11, heavy eviction counter: 47, current caching DataBlock (%): 16
evicted (MB): 104, ratio 0.03, overhead (%): -48, heavy eviction counter: 47, current caching DataBlock (%): 21 < gets switched off
evicted (MB): 684, ratio 0.2, overhead (%): 242, heavy eviction counter: 48, current caching DataBlock (%): 19
evicted (MB): 570, ratio 0.23, overhead (%): 185, heavy eviction counter: 49, current caching DataBlock (%): 18
evicted (MB): 342, ratio 0.22, overhead (%): 71, heavy eviction counter: 50, current caching DataBlock (%): 18
evicted (MB): 228, ratio 0.21, overhead (%): 14, heavy eviction counter: 51, current caching DataBlock (%): 18
evicted (MB): 228, ratio 0.2, overhead (%): 14, heavy eviction counter: 52, current caching DataBlock (%): 18
evicted (MB): 228, ratio 0.18, overhead (%): 14, heavy eviction counter: 53, current caching DataBlock (%): 18
evicted (MB): 228, ratio 0.16, overhead (%): 14, heavy eviction counter: 54, current caching DataBlock (%): 18
evicted (MB): 228, ratio 0.14, overhead (%): 14, heavy eviction counter: 55, current caching DataBlock (%): 18
evicted (MB): 112, ratio 0.14, overhead (%): -44, heavy eviction counter: 55, current caching DataBlock (%): 23 < backpressure
evicted (MB): 456, ratio 0.26, overhead (%): 128, heavy eviction counter: 56, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.31, overhead (%): 71, heavy eviction counter: 57, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 58, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 59, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 60, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 61, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 62, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 63, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 64, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 65, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 66, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 67, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 68, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 69, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 70, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 71, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 72, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 73, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 74, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 75, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 76, current caching DataBlock (%): 22
evicted (MB): 21, ratio 0.33, overhead (%): -90, heavy eviction counter: 76, current caching DataBlock (%): 32
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
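The first rows of this log can be replayed offline with a stand-alone copy of the tuning step from the implementation code. This is a sketch, and the class is hypothetical. The settings it uses (limit 200 MB, count limit 0, coefficient 0.01) are not stated in the text. They are inferred from the log itself: for example, 2170 MB evicted gives an overhead of 985% only with a 200 MB limit.

```java
/** Stand-alone replay of the auto-tuning step from the eviction-thread excerpt (sketch). */
final class CachingPercentTuner {
    private final long limitMb;       // hbase.lru.cache.heavy.eviction.mb.size.limit
    private final int countLimit;     // hbase.lru.cache.heavy.eviction.count.limit
    private final double coefficient; // hbase.lru.cache.heavy.eviction.overhead.coefficient
    int heavyEvictionCount = 0;
    int cacheDataBlockPercent = 100;

    CachingPercentTuner(long limitMb, int countLimit, double coefficient) {
        this.limitMb = limitMb;
        this.countLimit = countLimit;
        this.coefficient = coefficient;
    }

    /** Feeds the MB freed during one ~10 s period; returns the new caching percentage. */
    int onPeriod(long freedSumMb) {
        int overhead = (int) (freedSumMb * 100 / limitMb) - 100;
        if (freedSumMb > limitMb) {
            heavyEvictionCount++;
            if (heavyEvictionCount > countLimit) {
                // Cut proportionally to the overhead: at most 15 points per step, never below 1%
                int change = (int) (overhead * coefficient);
                change = Math.max(0, Math.min(15, change));
                cacheDataBlockPercent = Math.max(1, cacheDataBlockPercent - change);
            }
        } else if (freedSumMb >= limitMb * 0.1) {
            // Overshooting but still busy: backpressure, more overshooting means more caching
            int change = (int) (-overhead * 0.1 + 1);
            cacheDataBlockPercent = Math.min(100, cacheDataBlockPercent + change);
        } else {
            // Heavy reading is over: leave the mode
            heavyEvictionCount = 0;
            cacheDataBlockPercent = 100;
        }
        return cacheDataBlockPercent;
    }

    public static void main(String[] args) {
        // Settings inferred from the log: limit 200 MB, count limit 0, coefficient 0.01
        CachingPercentTuner tuner = new CachingPercentTuner(200, 0, 0.01);
        long[] freedMb = {2170, 3763, 3306, 2508, 1824, 1482, 1140, 913};
        for (long mb : freedMb) {
            System.out.println("evicted (MB): " + mb
                    + ", current caching DataBlock (%): " + tuner.onPeriod(mb));
        }
    }
}
```

With these settings, the sketch reproduces the DataBlock percentages of the first eight heavy periods: 91, 76, 61, 50, 42, 36, 32 and 29. It also reproduces the backpressure jump from 17% to 22% when only 109 MB were evicted.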

Scans were needed to show the same process as a graph of the ratio between the two cache sections. One is single, where blocks that have never been requested before go. The other is multi, where data "requested" at least once is stored:


And finally, here is how the parameters behave, shown as a graph. For comparison, the cache was switched off completely at the beginning. Then HBase was launched with caching enabled and with the start of the optimization delayed by 5 minutes (30 eviction cycles).

The full code is available in the HBASE-23887 pull request on GitHub.

However, 300 thousand reads per second is not all that can be squeezed out of this hardware under these conditions. When data is accessed through HDFS, the ShortCircuitCache mechanism (hereafter SSC) is used, which lets the client read the data directly and avoid network round trips.

Profiling showed that although this mechanism gives a large gain, at some point it becomes a bottleneck itself, because almost all of the heavy operations happen inside a lock, so threads spend most of their time blocked.

Once we understood this, we realized the problem could be avoided by creating an array of independent SSCs:

private final ShortCircuitCache[] shortCircuitCache;
...
// one independent cache, and therefore one independent lock, per slot
shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
for (int i = 0; i < this.clientShortCircuitNum; i++) {
  this.shortCircuitCache[i] = new ShortCircuitCache(…);
}

And then work with them, avoiding collisions by using the last digit of the offset:

public ShortCircuitCache getShortCircuitCache(long idx) {
    return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
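
To illustrate the idea outside HDFS, here is a minimal, self-contained sketch of the same lock-striping pattern. `StripedCache` is a hypothetical class, not part of Hadoop: each stripe is a plain `HashMap` guarded by its own monitor, so threads contend only when their keys fall into the same stripe. One deliberate difference from the snippet above: it uses `Math.floorMod`, because in Java `(int) (idx % n)` is negative whenever `idx` is negative.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: N independent maps, each with its own lock. */
final class StripedCache<V> {
    // One independent map per stripe; each map is also used as that stripe's lock.
    private final Map<Long, V>[] stripes;

    @SuppressWarnings("unchecked")
    StripedCache(int stripeCount) {
        if (stripeCount < 1) {
            throw new IllegalArgumentException("stripeCount must be >= 1");
        }
        stripes = (Map<Long, V>[]) new Map[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new HashMap<>();
        }
    }

    // floorMod keeps the index in [0, stripeCount) even for negative ids.
    int stripeFor(long id) {
        return (int) Math.floorMod(id, (long) stripes.length);
    }

    V get(long id) {
        Map<Long, V> stripe = stripes[stripeFor(id)];
        synchronized (stripe) { // only threads hitting the same stripe wait here
            return stripe.get(id);
        }
    }

    void put(long id, V value) {
        Map<Long, V> stripe = stripes[stripeFor(id)];
        synchronized (stripe) {
            stripe.put(id, value);
        }
    }
}
```

With 5 stripes, ids 12 and -3 both land in stripe 2; a plain `%` would turn -3 into index -3 and throw `ArrayIndexOutOfBoundsException`.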

Now the testing can begin. To do this, we read files from HDFS with a simple multi-threaded application. Set the parameters:

conf.set("dfs.client.read.shortcircuit", "true");
conf.set("dfs.client.read.shortcircuit.buffer.size", "65536"); // default = 1 MB, which slows reads down a lot, so it is better to match it to real needs
conf.set("dfs.client.short.circuit.num", num); // from 1 to 10

And simply read the files:

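byte[] byteBuffer = new byte[65536];        // 64 KB destination buffer
long position = 0L;
try (FSDataInputStream in = fileSystem.open(path)) {
    for (int i = 0; i < count; i++) {
        position += 65536;                  // step forward one 64 KB block
        if (position > 900000000)           // wrap around after ~900 MB
            position = 0L;
        // positional read: does not move the stream's own offset
        int res = in.read(position, byteBuffer, 0, 65536);
    }
}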

This code runs in separate threads, and we increase the number of files read simultaneously (from 10 to 200, horizontal axis) and the number of caches (from 1 to 10, one plot per value). The vertical axis shows the speedup gained by increasing the number of SSCs relative to the case with a single cache.
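
The shape of this measurement can be approximated without a Hadoop cluster. The sketch below is a local analogue under stated assumptions, not the authors' test application: it uses `FileChannel.read(ByteBuffer, long)` on a local file instead of `FSDataInputStream`, so it exercises neither short-circuit reads nor the SSC. It only reproduces the structure of the test: N threads, each making 64 KB positional reads and wrapping around at the end of the file. `ParallelReadBench` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelReadBench {
    static final int BLOCK = 65536; // 64 KB per read, as in the article

    // Walks the file in 64 KB steps with positional reads, wrapping to the start
    // at the end of the file; returns the total number of bytes read.
    static long readLoop(Path file, int count) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(BLOCK);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long position = 0L;
            for (int i = 0; i < count; i++) {
                buf.clear();
                int n = ch.read(buf, position); // does not move the channel's own position
                if (n > 0) total += n;
                position += BLOCK;
                if (position >= size) position = 0L;
            }
        }
        return total;
    }

    // Runs `threads` copies of readLoop concurrently; returns elapsed milliseconds.
    static long runParallel(Path file, int threads, int count) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> readLoop(file, count);
                results.add(pool.submit(task));
            }
            for (Future<Long> f : results) f.get(); // propagate any I/O failure
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("bench", ".bin");
        Files.write(file, new byte[16 * BLOCK]); // 1 MB test file
        for (int threads : new int[] {1, 10, 50}) {
            System.out.println(threads + " threads: " + runParallel(file, threads, 1000) + " ms");
        }
        Files.delete(file);
    }
}
```

Timings from this sketch depend entirely on the local disk and page cache, so they are not comparable with the HDFS numbers in the article.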

How to read the graph: 100 thousand reads in 64 KB blocks take 78 seconds with one cache, whereas with 5 caches they take 16 seconds, i.e. a speedup of about 5x. As the graph shows, the effect is barely noticeable with a small number of parallel reads; it starts to matter once there are more than 50 threads. It is also clear that raising the number of SSCs to 6 and beyond gives a much smaller performance gain.
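
The ~5x figure follows directly from the two timings quoted above:

```latex
\text{speedup} = \frac{t_{\text{1 cache}}}{t_{\text{5 caches}}} = \frac{78\ \text{s}}{16\ \text{s}} \approx 4.9
```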

Note 1: since the test results are quite volatile (see below), 3 runs were made and the resulting values were averaged.

Note 2: with random access configured, the performance gain is similar, although the access itself is slightly slower.

However, it should be stressed that, unlike the HBase case, this speedup is not always free. Here we "unlock" the CPU's ability to do more work instead of hanging on locks.

Here you can see that, in general, increasing the number of caches raises CPU utilization roughly in proportion. However, some combinations are slightly more favorable.

For example, let's take a closer look at the setting SSC = 3. Across the range, performance increases by about 3.3 times. Below are the results of all three separate runs.

Meanwhile, CPU consumption grows by about 2.8 times. The difference is not very large, but little Greta is already pleased and may even find time to go to school and attend classes.
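
Putting the two ratios for SSC = 3 side by side gives a rough measure of how much more work each unit of CPU now does (a back-of-the-envelope estimate that treats both figures as simple ratios):

```latex
\frac{\text{throughput gain}}{\text{CPU gain}} = \frac{3.3}{2.8} \approx 1.18
```

That is, at this setting each unit of CPU delivers roughly 18% more reads.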

Thus, this will benefit any tool that makes heavy use of HDFS (for example Spark), provided the application code is lightweight (i.e. the bottleneck really is on the HDFS client side) and there is spare CPU capacity. To check, let's test what effect the combination of the BlockCache optimization and SSC tuning has on reads from HBase.

Under these conditions the effect is smaller than in the isolated tests (reading without any processing), but squeezing out an extra 80K here is quite realistic. Together, the two optimizations give up to a 4x speedup.

A PR was also submitted for this optimization [HDFS-15202]; it has been merged, and the functionality will be available in upcoming releases.

Finally, it was interesting to compare read performance with a similar wide-column database: Cassandra versus HBase.

To do this, we ran instances of the standard YCSB load-testing utility from two hosts (800 threads in total). On the server side: 4 instances each of RegionServer and Cassandra on 4 hosts (not the ones running the clients, to avoid their influence). Reads came from tables of the following sizes:

HBase: 300 GB on HDFS (100 GB of pure data)

Cassandra: 250 GB (replication factor = 3)

That is, the volumes were roughly equal (slightly larger in HBase).
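
A quick check of the "slightly larger in HBase" remark, assuming the 250 GB Cassandra figure already includes all three replicas:

```latex
\text{HBase: } \frac{300\ \text{GB}}{100\ \text{GB}} = 3\ \text{copies}, \qquad
\text{Cassandra: } \frac{250\ \text{GB}}{3} \approx 83\ \text{GB of unique data}
```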

HBase parameters:

dfs.client.short.circuit.num = 5 (HDFS client optimization)

hbase.lru.cache.heavy.eviction.count.limit = 30: the patch starts working after 30 evictions (~5 minutes)

hbase.lru.cache.heavy.eviction.mb.size.limit = 300: the target volume of caching and eviction
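
The two limits can be read as a small feedback rule. The sketch below is a hypothetical, simplified model of that bookkeeping, not the code of the HBASE-23887 patch. The overhead formula, the "heavy" test and the reset threshold (10% of the limit) are assumptions, chosen because with an assumed 200 MB limit they reproduce the overhead and counter values in the eviction log above (342 MB gives 71%, 21 MB gives -90% and leaves the counter unchanged, 0 MB resets it). It does not model how the DataBlock caching percentage itself is adjusted. Note also that 30 evictions in ~5 minutes implies roughly one eviction cycle every 10 seconds.

```java
/**
 * Hypothetical model of heavy-eviction bookkeeping (not the HBase patch).
 * Assumptions:
 *  - overhead (%) = evictedMb * 100 / mbSizeLimit - 100, integer arithmetic;
 *  - a cycle is "heavy" when evictedMb exceeds mbSizeLimit;
 *  - the counter resets once evictions drop below 10% of the limit.
 */
final class HeavyEvictionTracker {
    private final long mbSizeLimit; // analogue of hbase.lru.cache.heavy.eviction.mb.size.limit
    private final int countLimit;   // analogue of hbase.lru.cache.heavy.eviction.count.limit
    private int heavyEvictionCount = 0;

    HeavyEvictionTracker(long mbSizeLimit, int countLimit) {
        this.mbSizeLimit = mbSizeLimit;
        this.countLimit = countLimit;
    }

    // How far one cycle's eviction volume over- or undershoots the limit, in percent.
    long overheadPercent(long evictedMb) {
        return evictedMb * 100 / mbSizeLimit - 100;
    }

    // Feed the MB evicted in one cycle; returns true while the optimization
    // would be active (assumed: once the counter has passed the count limit).
    boolean onEvictionCycle(long evictedMb) {
        if (evictedMb > mbSizeLimit) {
            heavyEvictionCount++;
        } else if (evictedMb < mbSizeLimit / 10) {
            heavyEvictionCount = 0; // load has gone away: start over
        } // otherwise the counter is left as it is
        return heavyEvictionCount > countLimit;
    }

    int heavyEvictionCount() {
        return heavyEvictionCount;
    }
}
```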

The YCSB logs were parsed and compiled into Excel graphs.

As you can see, these optimizations make it possible to bring the performance of the two databases level under these conditions and reach 450 thousand reads per second.

We hope this information proves useful to someone in the exciting fight for performance.

Source: www.habr.com
