How to increase read speed from HBase up to 3 times and from HDFS up to 5 times

High performance is one of the key requirements when working with big data. In the data loading department at Sberbank, we pump almost all transactions into our Hadoop-based Data Cloud and therefore deal with very large flows of information. Naturally, we are always looking for ways to improve performance, and now we want to tell you how we managed to patch the HBase RegionServer and the HDFS client, thanks to which we were able to significantly increase the speed of read operations.

However, before moving on to the essence of the improvements, it is worth talking about limitations that, in principle, cannot be circumvented if you sit on HDDs.

Why HDD and fast random access reads are incompatible
As you know, HBase, like many other databases, stores data in blocks several tens of kilobytes in size. By default it is about 64 KB. Now let's imagine that we need only 100 bytes and we ask HBase to give us this data by a certain key. Since the block size in HFiles is 64 KB, the request will read 640 times more (mind you!) than necessary.

Next, since the request goes through HDFS and its metadata caching mechanism, ShortCircuitCache (which allows direct access to files), this leads to reading as much as 1 MB from disk. However, this can be adjusted with the dfs.client.read.shortcircuit.buffer.size parameter, and in many cases it makes sense to reduce this value, for example to 126 KB.

Let's say we do this, but in addition, when we start reading data through the Java API, with functions such as FileChannel.read, and ask the operating system to read the specified amount of data, it reads "just in case" 2 times more, i.e. about 256 KB in our case. This is because Java has no easy way to set the FADV_RANDOM flag to prevent this behavior.
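To make the application side of this concrete, here is a minimal sketch of a positional read of 100 bytes with FileChannel.read. The class name, the scratch file and the offset are illustrative assumptions, not HBase or HDFS code. The program only sees the 100 bytes it asked for; any extra readahead happens in the kernel and is invisible at this level, and the FileChannel API shown here offers no parameter to request random-access behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalRead {
    /** Reads up to length bytes starting at offset; returns how many bytes were read. */
    static int readAt(Path file, long offset, int length) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = ch.read(buf, offset + buf.position());
                if (n < 0) {
                    break; // end of file reached
                }
            }
            return buf.position();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical demo: a 256 KB scratch file, 100 bytes requested from the middle. */
    static int demo() {
        try {
            Path tmp = Files.createTempFile("hfile-like", ".bin");
            try {
                Files.write(tmp, new byte[256 * 1024]);
                return readAt(tmp, 100_000L, 100);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes returned to the application: " + demo());
    }
}
```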

As a result, to get our 100 bytes, 2600 times more is read under the hood. The solution may seem obvious: reduce the block size to a kilobyte, set the mentioned flag and gain a great speed-up. But the trouble is that by reducing the block size by 2 times, we also reduce the number of bytes read per unit of time by 2 times.

Some gain from setting the FADV_RANDOM flag can be obtained, but only with heavy multithreading and a block size of 128 KB, and even then it is at most a few tens of percent:


The tests were carried out on 100 files, each 1 GB in size and located on 10 HDDs.

Let's calculate what we can, in principle, count on at this speed:
Let's say we read from 10 disks at a speed of 280 MB/sec, i.e. about 3 million times 100 bytes. But as we remember, the data we need is 2600 times smaller than what is read. So we divide 3 million by 2600 and get about 1100 records per second.
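The back-of-the-envelope estimate above can be reproduced with a few lines of integer arithmetic. This is only a sketch: the class and method names are made up for illustration, and it uses 280 MB/sec directly instead of rounding up to 3 million reads, so it lands slightly below the article's rounded figure while staying in the same ballpark of roughly 1100 records per second.

```java
class ReadAmplification {
    /** How many times more bytes are read from disk than the caller needs. */
    static long amplification(long bytesRead, long bytesNeeded) {
        return bytesRead / bytesNeeded;
    }

    /** Useful records per second, given raw disk throughput and read amplification. */
    static long recordsPerSecond(long diskBytesPerSecond, long recordBytes, long amplification) {
        return diskBytesPerSecond / recordBytes / amplification;
    }

    public static void main(String[] args) {
        // 64 KB HFile block versus a 100-byte record (decimal kilobytes, as in the text).
        System.out.println("block amplification: " + amplification(64_000, 100));
        // About 256 KB actually read per request, the article rounds this to 2600x.
        System.out.println("total amplification: " + amplification(256 * 1024, 100));
        // 10 disks delivering 280 MB/sec in total, 100-byte records, 2600x amplification.
        System.out.println("records per second: " + recordsPerSecond(280_000_000L, 100, 2600));
    }
}
```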

Depressing, isn't it? Such is the nature of random access to data on HDD, regardless of block size. This is the physical limit of random access, and no database can squeeze out more under such conditions.

How then do databases achieve much higher speeds? To answer this question, let's look at what happens in the following picture:


Here we see that for the first few minutes the speed really is about a thousand records per second. However, later, because much more is read than was requested, the data ends up in the buff/cache of the operating system (Linux) and the speed rises to a more decent 60 thousand per second.

So from here on we will deal with accelerating access only to data that is in the OS cache or located on SSD/NVMe storage devices of comparable access speed.

In our case, we will run the tests on a bench of 4 servers, each configured as follows:

CPU: Xeon E5-2680 v4 @ 2.40GHz, 64 threads.
Memory: 730 GB.
java version: 1.8.0_111

And here the key point is the amount of data in the tables that needs to be read. The fact is that if you read data from a table that fits entirely in the HBase cache, it will not even come to reading from the operating system's buff/cache, because HBase by default allocates 40% of memory to a structure called BlockCache. Essentially this is a ConcurrentHashMap, where the key is the file name + the offset of the block, and the value is the actual data at that offset.

So, when reading only from this structure, we see excellent speed, on the order of a million requests per second. But let's imagine that we cannot give away hundreds of gigabytes of memory just for database needs, because a lot of other useful things are running on these servers.

For example, in our case, the BlockCache volume on one RS is about 12 GB. We placed two RS on one node, i.e. 96 GB is allocated for BlockCache across all nodes. And there is many times more data: for example, let there be 4 tables of 130 regions each, in which the files are 800 MB in size, compressed by FAST_DIFF, i.e. 410 GB in total (this is pure data, i.e. without taking the replication factor into account).

Thus, BlockCache is only about 23% of the total data volume, and this is much closer to the real conditions of what is called BigData. And this is where the fun begins, because obviously, the fewer cache hits, the worse the performance. After all, on a miss you have to do a lot of work, i.e. go down to calling system functions. However, this cannot be avoided, so let's look at a completely different aspect: what happens to the data inside the cache?

Let's simplify the situation and assume that we have a cache that fits only 1 object. Here is an example of what happens when we try to work with a data volume 3 times larger than the cache. We will have to:

1. Put block 1 in the cache
2. Remove block 1 from the cache
3. Put block 2 in the cache
4. Remove block 2 from the cache
5. Put block 3 in the cache

5 actions completed! However, this situation cannot be called normal; in effect, we are forcing HBase to do a heap of completely useless work. It constantly reads data from the OS cache, places it in BlockCache, only to throw it out almost immediately because a new portion of data has arrived. The animation at the beginning of the post shows the essence of the problem: the Garbage Collector is going off the scale, the atmosphere is heating up, and little Greta in distant and hot Sweden is getting upset. And we IT people really don't like it when children are sad, so we start thinking about what we can do about it.
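The one-slot scenario can be imitated with a tiny LRU map that counts insertions and evictions. This is a toy model built on java.util.LinkedHashMap, not the real LruBlockCache, and all names in it are illustrative. Note that LinkedHashMap evicts the old entry right after the new one is inserted, so the order of individual steps differs slightly from the list above, but the totals are the same: 3 puts and 2 evictions with not a single cache hit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OneSlotCache {
    /** Scans blocks 1..blocks once; returns {puts, evictions}. */
    static int[] scan(int capacity, int blocks) {
        int[] ops = new int[2]; // ops[0] = puts, ops[1] = evictions
        Map<Integer, byte[]> cache = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                boolean evict = size() > capacity;
                if (evict) {
                    ops[1]++;
                }
                return evict;
            }
        };
        for (int block = 1; block <= blocks; block++) {
            if (cache.get(block) == null) { // cache miss
                cache.put(block, new byte[0]);
                ops[0]++;
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        int[] ops = scan(1, 3);
        System.out.println("puts=" + ops[0] + ", evictions=" + ops[1]
            + ", total actions=" + (ops[0] + ops[1]));
    }
}
```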

What if we put not all blocks in the cache, but only a certain percentage of them, so that the cache does not overflow? Let's start by simply adding a few lines of code to the beginning of the function that puts data into BlockCache:

  public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory) {
    if (cacheDataBlockPercent != 100 && buf.getBlockType().isData()) {
      if (cacheKey.getOffset() % 100 >= cacheDataBlockPercent) {
        return;
      }
    }
...

The point here is the following: the offset is the position of the block in the file, and its last digits are randomly and evenly distributed from 00 to 99. Therefore, we will let through only those that fall into the range we need.
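A quick way to convince yourself that the last two digits of the offset behave like a percentage dial is to simulate it. In the sketch below the block sizes (60,000 to 69,999 bytes), the seed and all names are assumptions chosen for illustration; only the `offset % 100` comparison mirrors the patch above.

```java
import java.util.Random;

class OffsetSampling {
    /** Same decision as in the patch: cache a data block only if offset % 100 is below the percent. */
    static boolean shouldCache(long offset, int cacheDataBlockPercent) {
        return cacheDataBlockPercent == 100 || offset % 100 < cacheDataBlockPercent;
    }

    /** Fraction of simulated blocks that would be cached for the given percent. */
    static double cachedFraction(int percent, int samples, long seed) {
        Random rnd = new Random(seed);
        long offset = 0;
        int cached = 0;
        for (int i = 0; i < samples; i++) {
            // Hypothetical block sizes scattered around 64 KB.
            offset += 60_000 + rnd.nextInt(10_000);
            if (shouldCache(offset, percent)) {
                cached++;
            }
        }
        return (double) cached / samples;
    }

    public static void main(String[] args) {
        System.out.println("cached fraction at 20%: " + cachedFraction(20, 100_000, 42L));
    }
}
```

With 100,000 simulated blocks the cached fraction should come out close to 0.20, which is the behavior the patch relies on.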

For example, set cacheDataBlockPercent = 20 and see what happens:


The result is obvious. In the graphs below, it becomes clear why such acceleration occurred: we save a lot of GC resources by not doing the Sisyphean work of placing data in the cache only to immediately throw it to the Martian dogs:


At the same time, CPU utilization grows, but much less than productivity:


It is also worth noting that the blocks stored in BlockCache are different. Most, about 95%, is the data itself. The rest is metadata, such as Bloom filters or LEAF_INDEX and so on. This data is small in volume but very useful, because before accessing the data directly, HBase turns to the meta to understand whether it needs to search here further and, if so, where exactly the block of interest is located.

Therefore, in the code we see the check buf.getBlockType().isData(), and thanks to it, we will leave the meta in the cache in any case.

Now let's increase the load and slightly tighten up the feature at the same time. In the first test we set the cutoff percentage to 20 and BlockCache was slightly underutilized. Now let's set it to 23% and add 100 threads every 5 minutes to see at what point saturation occurs:


Here we see that the original version almost immediately hits the ceiling at about 100 thousand requests per second, whereas the patch gives an acceleration of up to 300 thousand. At the same time, it is clear that further acceleration is no longer so "free"; CPU utilization is also growing.

However, this is not a very elegant solution, since we do not know in advance what percentage of blocks needs to be cached; it depends on the load profile. Therefore, a mechanism was implemented to automatically adjust this parameter depending on the activity of read operations.

Three options have been added to control this:

hbase.lru.cache.heavy.eviction.count.limit - sets how many times the process of evicting data from the cache should run before we start using the optimization (i.e. skipping blocks). By default it is equal to MAX_INT = 2147483647, which in fact means that the feature will never start working with this value, because the eviction process starts every 5 - 10 seconds (depending on the load) and 2147483647 * 10 / 60 / 60 / 24 / 365 = 680 years. However, we can set this parameter to 0 and make the feature work immediately after launch.

However, there is also a payload in this parameter. If our load is such that short-term reads (say during the day) and long-term reads (at night) are constantly interleaved, then we can make sure the feature is turned on only when long read operations are in progress.

For example, we know that short-term reads usually last about 1 minute. There is no need to start throwing out blocks, the cache will not have time to become stale, and so we can set this parameter to, for example, 10. This will lead to the optimization starting to work only when long-term active reading has begun, i.e. after 100 seconds. Thus, if we have a short-term read, all blocks will get into the cache and will be available (except those evicted by the standard algorithm). And when we do long-term reads, the feature is turned on and we get much higher performance.
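The relationship between the count limit and the activation delay is simple multiplication. The sketch below assumes a fixed 10-second eviction period (the upper end of the 5 - 10 second range mentioned above); the class and method names are illustrative only.

```java
class ActivationDelay {
    // Assumption: one eviction run every 10 seconds (the text gives a range of 5-10 s).
    static final long EVICTION_PERIOD_SECONDS = 10;

    /** Approximate seconds of sustained heavy eviction before block skipping starts. */
    static long secondsUntilActive(long countLimit) {
        return countLimit * EVICTION_PERIOD_SECONDS;
    }

    /** The same delay expressed in whole years, using the division chain from the text. */
    static long yearsUntilActive(long countLimit) {
        return secondsUntilActive(countLimit) / 60 / 60 / 24 / 365;
    }

    public static void main(String[] args) {
        System.out.println("limit = 10      -> " + secondsUntilActive(10) + " seconds");
        System.out.println("limit = MAX_INT -> " + yearsUntilActive(Integer.MAX_VALUE) + " years");
    }
}
```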

hbase.lru.cache.heavy.eviction.mb.size.limit - sets how many megabytes we would like to place in the cache (and, of course, evict) in 10 seconds. The feature will try to reach this value and maintain it. The point is this: if we shove gigabytes into the cache, then we will have to evict gigabytes, and this, as we saw above, is very expensive. However, you should not try to set it too small, as this will cause the block skipping mode to exit prematurely. For powerful servers (about 20-40 physical cores), it is optimal to set about 300-400 MB. For the middle class (~10 cores) 200-300 MB. For weak systems (2-5 cores) 50-100 MB may be normal (not tested on these).

Let's look at how this works: say we set hbase.lru.cache.heavy.eviction.mb.size.limit = 500, there is some kind of load (reading), and then every ~10 seconds we calculate how many bytes were evicted, using the formula:

Overhead = Freed Bytes Sum (MB) * 100 / Limit (MB) - 100;

If in fact 2000 MB were evicted, then Overhead is equal to:

2000 * 100 / 500 - 100 = 300%

The algorithm tries to keep the overhead at no more than a few tens of percent, so the feature will reduce the percentage of cached blocks, thereby implementing the auto-tuning mechanism.

However, if the load drops, let's say only 200 MB are evicted and Overhead becomes negative (so-called overshooting):

200 * 100 / 500 - 100 = -60%

In this case, on the contrary, the feature will increase the percentage of cached blocks until Overhead becomes positive.
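The Overhead formula translates directly into integer arithmetic. A minimal sketch, with an illustrative class name, that reproduces both worked examples (300% and -60%):

```java
class EvictionOverhead {
    /** Overhead (%) = freed MB * 100 / limit MB - 100, using integer division. */
    static int overheadPercent(long freedMb, long limitMb) {
        return (int) (freedMb * 100 / limitMb) - 100;
    }

    public static void main(String[] args) {
        System.out.println("2000 MB freed, limit 500 MB: " + overheadPercent(2000, 500) + "%");
        System.out.println(" 200 MB freed, limit 500 MB: " + overheadPercent(200, 500) + "%");
    }
}
```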

Below is an example of how this looks on real data. There is no need to try to reach 0%, it is impossible. It is very good when it is about 30 - 100%; this helps to avoid premature exit from the optimization mode during short-term surges.

hbase.lru.cache.heavy.eviction.overhead.coefficient - sets how quickly we would like to get the result. If we firmly know that our reads are mostly long and do not want to wait, we can increase this ratio and get high performance faster.

For example, we set this coefficient = 0.01. This means that Overhead (see above) will be multiplied by this number and the percentage of cached blocks will be reduced by the result. Let's assume that Overhead = 300% and coefficient = 0.01, then the percentage of cached blocks will be reduced by 3%.
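The reduction step can be sketched as a small pure function. The multiplication by the coefficient follows the description above; the cap of 15 points per step and the floor of 1% are taken from the implementation code shown later in this post. Class and method names are illustrative.

```java
class ReductionStep {
    /** Points to subtract from the caching percentage for a given overhead. */
    static int reduction(int overheadPercent, double coefficient) {
        int change = (int) (overheadPercent * coefficient);
        // At most 15 points per step, and never a negative reduction.
        return Math.max(0, Math.min(15, change));
    }

    /** New caching percentage, never below 1%. */
    static int nextPercent(int currentPercent, int overheadPercent, double coefficient) {
        return Math.max(1, currentPercent - reduction(overheadPercent, coefficient));
    }

    public static void main(String[] args) {
        System.out.println("overhead 300%,  coefficient 0.01: -" + reduction(300, 0.01));
        System.out.println("overhead 1781%, coefficient 0.01: -" + reduction(1781, 0.01) + " (capped)");
        System.out.println("100% -> " + nextPercent(100, 300, 0.01) + "%");
    }
}
```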

Similar "backpressure" logic is also implemented for negative Overhead values (overshooting). Since short-term fluctuations in the volume of reads and evictions are always possible, this mechanism allows you to avoid premature exit from the optimization mode. Backpressure has inverted logic: the stronger the overshooting, the more blocks are cached.
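The backpressure step can be sketched in the same style. The expression `-overhead * 0.1 + 1` and the 100% ceiling are taken from the implementation code shown later in this post; the class and method names are illustrative.

```java
class Backpressure {
    /** Points to add to the caching percentage when overhead is negative (overshooting). */
    static int increase(int overheadPercent) {
        return (int) (-overheadPercent * 0.1 + 1);
    }

    /** New caching percentage, never above 100%. */
    static int nextPercent(int currentPercent, int overheadPercent) {
        return Math.min(100, currentPercent + increase(overheadPercent));
    }

    public static void main(String[] args) {
        System.out.println("overhead -60%: +" + increase(-60));
        System.out.println("17% with overhead -46% -> " + nextPercent(17, -46) + "%");
        System.out.println("98% with overhead -60% -> " + nextPercent(98, -60) + "%");
    }
}
```

The stronger the overshooting, the larger the step back towards caching everything.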


Implementation code

        LruBlockCache cache = this.cache.get();
        if (cache == null) {
          break;
        }
        freedSumMb += cache.evict()/1024/1024;
        /*
        * Sometimes we are reading more data than can fit into BlockCache
        * and it is the cause of a high rate of evictions.
        * This in turn leads to heavy Garbage Collector work.
        * So a lot of blocks are put into BlockCache but never read,
        * while spending a lot of CPU resources.
        * Here we will analyze how many bytes were freed and
        * decide whether the time has come to reduce the amount of cached blocks.
        * It helps to avoid putting too many blocks into BlockCache
        * when evict() is very active, and saves CPU for other jobs.
        * More details: https://issues.apache.org/jira/browse/HBASE-23887
        */

        // First of all we have to control how much time
        // has passed since the previous evict() was launched.
        // This should be almost the same time (+/- 10s)
        // because we get comparable volumes of freed bytes each time.
        // 10s because this is the default period to run evict() (see above this.wait)
        long stopTime = System.currentTimeMillis();
        if ((stopTime - startTime) > 1000 * 10 - 1) {
          // Here we have to calculate what situation we have got.
          // We have the limit "hbase.lru.cache.heavy.eviction.mb.size.limit"
          // and can calculate the overhead on it.
          // We will use this information to decide
          // how to change the percent of cached blocks.
          freedDataOverheadPercent =
            (int) (freedSumMb * 100 / cache.heavyEvictionMbSizeLimit) - 100;
          if (freedSumMb > cache.heavyEvictionMbSizeLimit) {
            // Now we are in the situation when we are above the limit
            // But maybe we are going to ignore it because it will end quite soon
            heavyEvictionCount++;
            if (heavyEvictionCount > cache.heavyEvictionCountLimit) {
              // It has been going on for a long time and we have to reduce the caching of
              // blocks now. So we calculate here how many blocks we want to skip.
              // It depends on:
              // 1. Overhead - if the overhead is big we can be more aggressive
              // in reducing the amount of cached blocks.
              // 2. How fast we want to get the result. If we know that our
              // heavy reading lasts for a long time, we don't want to wait and can
              // increase the coefficient and get good performance quite soon.
              // But if we are not sure, we can do it slowly and it could prevent
              // premature exit from this mode. So, when the coefficient is
              // higher we can get better performance when heavy reading is stable.
              // But when reading is changing we can adjust to it and set
              // the coefficient to a lower value.
              int change =
                (int) (freedDataOverheadPercent * cache.heavyEvictionOverheadCoefficient);
              // But practice shows that 15% of reducing is quite enough.
              // We are not greedy (it could lead to premature exit).
              change = Math.min(15, change);
              change = Math.max(0, change); // I think it will never happen but check for sure
              // So this is the key point, here we are reducing % of caching blocks
              cache.cacheDataBlockPercent -= change;
              // If we go down too deep we have to stop here, 1% any way should be.
              cache.cacheDataBlockPercent = Math.max(1, cache.cacheDataBlockPercent);
            }
          } else {
            // Well, we have got overshooting.
            // Maybe it is just a short-term fluctuation and we can stay in this mode.
            // It helps to avoid premature exit during short-term fluctuations.
            // If overshooting is less than 90%, we will try to increase the percent of
            // cached blocks and hope it is enough.
            if (freedSumMb >= cache.heavyEvictionMbSizeLimit * 0.1) {
              // Simple logic: more overshooting - more caching blocks (backpressure)
              int change = (int) (-freedDataOverheadPercent * 0.1 + 1);
              cache.cacheDataBlockPercent += change;
              // But it can't be more than 100%, so check it.
              cache.cacheDataBlockPercent = Math.min(100, cache.cacheDataBlockPercent);
            } else {
              // Looks like heavy reading is over.
              // Just exit from this mode.
              heavyEvictionCount = 0;
              cache.cacheDataBlockPercent = 100;
            }
          }
          LOG.info("BlockCache evicted (MB): {}, overhead (%): {}, " +
            "heavy eviction counter: {}, " +
            "current caching DataBlock (%): {}",
            freedSumMb, freedDataOverheadPercent,
            heavyEvictionCount, cache.cacheDataBlockPercent);

          freedSumMb = 0;
          startTime = stopTime;
        }

Let's now look at all this using a real example. We have the following test script:

  1. Start doing Scan (25 threads, batch = 100)
  2. After 5 minutes, add multi-gets (25 threads, batch = 100)
  3. After 5 minutes, turn off the multi-gets (only the scan remains again)

We do two runs, first with hbase.lru.cache.heavy.eviction.count.limit = 10000 (which actually disables the feature), and then with limit = 0 (which enables it).

In the logs below we see how the feature turns on and brings the overhead down to 14-71%. From time to time the load decreases, which turns on backpressure, and HBase again caches more blocks.

Ngena kwi-RegionServer
okhishiwe (MB): 0, isilinganiso 0.0, ngaphezulu (%): -100, isibali sokukhipha esindayo: 0, i-Caching yamanje ye-DataBlock (%): 100
okhishiwe (MB): 0, isilinganiso 0.0, ngaphezulu (%): -100, isibali sokukhipha esindayo: 0, i-Caching yamanje ye-DataBlock (%): 100
okhishiwe (MB): 2170, isilinganiso 1.09, ngaphezulu (%): 985, indawo esindayo yokukhipha abantu: 1, i-Caching yamanje ye-DataBlock (%): 91 < start
okhishiwe (MB): 3763, isilinganiso 1.08, ngaphezulu (%): 1781, indawo esindayo yokukhipha abantu: 2, i-Caching yamanje ye-DataBlock (%): 76
okhishiwe (MB): 3306, isilinganiso 1.07, ngaphezulu (%): 1553, indawo esindayo yokukhipha abantu: 3, i-Caching yamanje ye-DataBlock (%): 61
okhishiwe (MB): 2508, isilinganiso 1.06, ngaphezulu (%): 1154, indawo esindayo yokukhipha abantu: 4, i-Caching yamanje ye-DataBlock (%): 50
okhishiwe (MB): 1824, isilinganiso 1.04, ngaphezulu (%): 812, indawo esindayo yokukhipha abantu: 5, i-Caching yamanje ye-DataBlock (%): 42
okhishiwe (MB): 1482, isilinganiso 1.03, ngaphezulu (%): 641, indawo esindayo yokukhipha abantu: 6, i-Caching yamanje ye-DataBlock (%): 36
okhishiwe (MB): 1140, isilinganiso 1.01, ngaphezulu (%): 470, indawo esindayo yokukhipha abantu: 7, i-Caching yamanje ye-DataBlock (%): 32
okhishiwe (MB): 913, isilinganiso 1.0, ngaphezulu (%): 356, indawo esindayo yokukhipha abantu: 8, i-Caching yamanje ye-DataBlock (%): 29
okhishiwe (MB): 912, isilinganiso 0.89, ngaphezulu (%): 356, indawo esindayo yokukhipha abantu: 9, i-Caching yamanje ye-DataBlock (%): 26
okhishiwe (MB): 684, isilinganiso 0.76, ngaphezulu (%): 242, indawo esindayo yokukhipha abantu: 10, i-Caching yamanje ye-DataBlock (%): 24
okhishiwe (MB): 684, isilinganiso 0.61, ngaphezulu (%): 242, indawo esindayo yokukhipha abantu: 11, i-Caching yamanje ye-DataBlock (%): 22
okhishiwe (MB): 456, isilinganiso 0.51, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 12, i-Caching yamanje ye-DataBlock (%): 21
okhishiwe (MB): 456, isilinganiso 0.42, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 13, i-Caching yamanje ye-DataBlock (%): 20
okhishiwe (MB): 456, isilinganiso 0.33, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 14, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 342, isilinganiso 0.33, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 15, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 342, isilinganiso 0.32, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 16, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 342, isilinganiso 0.31, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 17, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.3, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 18, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.29, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 19, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.27, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 20, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.25, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 21, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.24, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 22, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.22, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 23, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.21, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 24, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.2, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 25, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 228, isilinganiso 0.17, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 26, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 456, isilinganiso 0.17, ngaphezulu (%): 128, ikhawunta yokukhishwa esindayo: 27, inqolobane yamanje I-DataBlock (%): 18 < ingeziwe ithola (kodwa ithebula elifanayo)
okhishiwe (MB): 456, isilinganiso 0.15, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 28, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 342, isilinganiso 0.13, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 29, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 342, isilinganiso 0.11, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 30, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 342, isilinganiso 0.09, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 31, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.08, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 32, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.07, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 33, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.06, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 34, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.05, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 35, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.05, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 36, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 228, isilinganiso 0.04, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 37, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 109, isilinganiso 0.04, ngaphezulu (%): -46, ikhawunta yokukhishwa esindayo: 37, inqolobane yamanje I-DataBlock (%): 22 < ingcindezi yangemuva
okhishiwe (MB): 798, isilinganiso 0.24, ngaphezulu (%): 299, indawo esindayo yokukhipha abantu: 38, i-Caching yamanje ye-DataBlock (%): 20
okhishiwe (MB): 798, isilinganiso 0.29, ngaphezulu (%): 299, indawo esindayo yokukhipha abantu: 39, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 570, isilinganiso 0.27, ngaphezulu (%): 185, indawo esindayo yokukhipha abantu: 40, i-Caching yamanje ye-DataBlock (%): 17
okhishiwe (MB): 456, isilinganiso 0.22, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 41, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 342, isilinganiso 0.16, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 42, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 342, isilinganiso 0.11, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 43, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 228, isilinganiso 0.09, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 44, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 228, isilinganiso 0.07, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 45, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 228, isilinganiso 0.05, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 46, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 222, isilinganiso 0.04, ngaphezulu (%): 11, indawo esindayo yokukhipha abantu: 47, i-Caching yamanje ye-DataBlock (%): 16
okhishiwe (MB): 104, isilinganiso 0.03, ngaphezulu (%): -48, indawo esindayo yokukhipha: 47, inqolobane yamanje I-DataBlock (%): 21 < ukuphazamisa uthola
okhishiwe (MB): 684, isilinganiso 0.2, ngaphezulu (%): 242, indawo esindayo yokukhipha abantu: 48, i-Caching yamanje ye-DataBlock (%): 19
okhishiwe (MB): 570, isilinganiso 0.23, ngaphezulu (%): 185, indawo esindayo yokukhipha abantu: 49, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 342, isilinganiso 0.22, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 50, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 228, isilinganiso 0.21, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 51, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 228, isilinganiso 0.2, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 52, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 228, isilinganiso 0.18, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 53, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 228, isilinganiso 0.16, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 54, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 228, isilinganiso 0.14, ngaphezulu (%): 14, indawo esindayo yokukhipha abantu: 55, i-Caching yamanje ye-DataBlock (%): 18
okhishiwe (MB): 112, isilinganiso 0.14, ngaphezulu (%): -44, ikhawunta yokukhishwa esindayo: 55, inqolobane yamanje I-DataBlock (%): 23 < ingcindezi yangemuva
okhishiwe (MB): 456, isilinganiso 0.26, ngaphezulu (%): 128, indawo esindayo yokukhipha abantu: 56, i-Caching yamanje ye-DataBlock (%): 22
okhishiwe (MB): 342, isilinganiso 0.31, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 57, i-Caching yamanje ye-DataBlock (%): 22
okhishiwe (MB): 342, isilinganiso 0.33, ngaphezulu (%): 71, indawo esindayo yokukhipha abantu: 58, i-Caching yamanje ye-DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 59, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 60, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 61, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 62, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 63, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 64, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 65, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 66, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 67, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 68, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 69, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.32, overhead (%): 71, heavy eviction counter: 70, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 71, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 72, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 73, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 74, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 75, current caching DataBlock (%): 22
evicted (MB): 342, ratio 0.33, overhead (%): 71, heavy eviction counter: 76, current caching DataBlock (%): 22
evicted (MB): 21, ratio 0.33, overhead (%): -90, heavy eviction counter: 76, current caching DataBlock (%): 32
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
evicted (MB): 0, ratio 0.0, overhead (%): -100, heavy eviction counter: 0, current caching DataBlock (%): 100
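
The overhead (%) values in the log lines above can be reproduced with simple integer arithmetic. The sketch below is only an illustration, not the code from the patch: the class and method names are hypothetical, and the 200 MB limit is an assumption, chosen because it is consistent with the logged values (342 MB evicted gives 71 %, 21 MB gives -90 %, 0 MB gives -100 %).

```java
// Illustrative sketch only; the names and the 200 MB limit are assumptions.
final class EvictionOverheadSketch {

    // Assumed value of hbase.lru.cache.heavy.eviction.mb.size.limit for this run.
    static final long ASSUMED_LIMIT_MB = 200;

    /** Percent by which the evicted volume exceeds (or falls short of) the limit. */
    static int overheadPercent(long evictedMb, long limitMb) {
        // Integer division first, then shift so that "exactly at the limit" is 0 %.
        return (int) (evictedMb * 100 / limitMb) - 100;
    }

    public static void main(String[] args) {
        for (long evictedMb : new long[] {342, 21, 0}) {
            System.out.println("evicted (MB): " + evictedMb
                + ", overhead (%): " + overheadPercent(evictedMb, ASSUMED_LIMIT_MB));
        }
    }
}
```

A positive overhead means more data was evicted in the period than the limit allows, which is the condition under which the heavy eviction counter in the log keeps growing.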

The scans were needed to show the same process as a graph of the ratio between the two cache sections: single (blocks that have never been requested before) and multi (data that has been requested at least once is stored here).

And finally, here is how the parameters behave, shown as a graph. For comparison, the cache was first turned off completely, then HBase was started with caching enabled and with the start of the optimization delayed by 5 minutes (30 eviction cycles).

The full code can be found in Pull Request HBASE-23887 on GitHub.

However, 300 thousand reads per second is not all that can be achieved on this hardware under these conditions. When data has to be accessed through HDFS, the ShortCircuitCache mechanism (hereafter SSC) is used, which lets the client read the data directly and avoid network interaction.

Profiling showed that although this mechanism gives a large gain, at some point it also becomes a bottleneck, because almost all heavy operations happen inside a lock, so threads spend most of their time blocked.

Having seen this, we realized that the problem can be avoided by creating an array of independent SSCs:

private final ShortCircuitCache[] shortCircuitCache;
...
shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
for (int i = 0; i < this.clientShortCircuitNum; i++)
  this.shortCircuitCache[i] = new ShortCircuitCache(…);

Then work with them, choosing a cache by the last digit of the offset so that requests do not collide:

public ShortCircuitCache getShortCircuitCache(long idx) {
    return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
}
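
The idea above is a form of lock striping: each cache has its own lock, so threads that land on different caches do not block each other. The following is a minimal sketch under stated assumptions. `TinyCache` is a hypothetical stand-in for ShortCircuitCache, not the HDFS class, and the selection rule assumes a non-negative index, as in the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: TinyCache is a hypothetical stand-in, not the HDFS ShortCircuitCache.
final class StripedCacheSketch {

    /** Hypothetical cache whose operations all run under its own lock. */
    static final class TinyCache {
        private final Map<Long, String> entries = new HashMap<>();

        synchronized String fetchOrLoad(long blockId) {
            // Runs under this instance's monitor only, so threads that work
            // with other stripes are not blocked by this call.
            return entries.computeIfAbsent(blockId, id -> "replica-" + id);
        }

        synchronized int size() {
            return entries.size();
        }
    }

    private final TinyCache[] caches;

    StripedCacheSketch(int stripes) {
        caches = new TinyCache[stripes];
        for (int i = 0; i < stripes; i++) {
            caches[i] = new TinyCache();
        }
    }

    /** Same selection rule as in the article: idx modulo the number of caches. */
    int stripeIndex(long idx) {
        return (int) (idx % caches.length); // assumes idx >= 0
    }

    TinyCache cacheFor(long idx) {
        return caches[stripeIndex(idx)];
    }

    int sizeOfStripe(int i) {
        return caches[i].size();
    }
}
```

With 10 stripes the index is literally the last decimal digit of `idx`; with fewer stripes several digits share one cache.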

Now we can start testing. To do this, we will read files from HDFS with a simple multi-threaded application. Set the parameters:

conf.set("dfs.client.read.shortcircuit", "true");
conf.set("dfs.client.read.shortcircuit.buffer.size", "65536"); // the default is 1 MB, which slows reading down considerably, so it is better to match it to real needs
conf.set("dfs.client.short.circuit.num", num); // from 1 to 10

And simply read the files:

FSDataInputStream in = fileSystem.open(path);
for (int i = 0; i < count; i++) {
    position += 65536;
    if (position > 900000000)
        position = 0L;
    int res = in.read(position, byteBuffer, 0, 65536);
}

This code runs in separate threads, and we will increase the number of files read simultaneously (from 10 to 200, the horizontal axis) and the number of caches (from 1 to 10, the separate curves). The vertical axis shows the speedup gained from increasing the number of SSCs relative to the case with only one cache.
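
The threaded read loop described above could be driven by a harness along the following lines. This is a sketch under stated assumptions, not the test application used in the article: it swaps HDFS's FSDataInputStream for a java.nio FileChannel on a local file so that it runs without a cluster, all names are hypothetical, and every thread reads the same file, whereas the article's test reads many files at once.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: a local-file stand-in for the HDFS read test.
final class ParallelReadSketch {

    static final int BLOCK = 65536;

    /** One reader: `count` positional reads of BLOCK bytes, wrapping at `wrapAt`. */
    static long readLoop(Path path, int count, long wrapAt) throws IOException {
        long total = 0;
        long position = 0;
        ByteBuffer buffer = ByteBuffer.allocate(BLOCK);
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            for (int i = 0; i < count; i++) {
                position += BLOCK;
                if (position > wrapAt) {
                    position = 0L;
                }
                buffer.clear();
                int res = in.read(buffer, position); // positional read, no shared cursor
                if (res > 0) {
                    total += res;
                }
            }
        }
        return total;
    }

    /** Runs `threads` readers in parallel and returns the elapsed milliseconds. */
    static long run(Path path, int threads, int count, long wrapAt) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> readLoop(path, count, wrapAt);
                results.add(pool.submit(task));
            }
            for (Future<Long> f : results) {
                f.get(); // wait for completion and surface any read error
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing the same workload with different thread counts and cache settings is what produces the kind of speedup ratios discussed in the article.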

How to read the graph: 100 thousand reads in 64 KB blocks take 78 seconds with one cache, whereas with 5 caches they take 16 seconds, i.e. a speedup of about 5 times. As the graph shows, the effect is not very noticeable with a small number of parallel reads; it starts to matter when more than 50 threads are reading. It is also noticeable that increasing the number of SSCs to 6 and above gives a much smaller performance gain.

Note 1: since the test results are quite volatile (see below), 3 runs were made and the resulting values were averaged.

Note 2: The performance gain from the optimization is the same for random access, although the access itself is slightly slower.

However, it should be made clear that, unlike the HBase case, this speedup is not always free. Here we "unlock" the CPU's ability to do more work instead of hanging on locks.

Here you can see that, in general, increasing the number of caches gives a roughly proportional increase in CPU utilization. However, some combinations are more advantageous than others.

For example, let's take a closer look at the setting SSC = 3. The performance gain over the range is about 3.3 times. Below are the results of all three separate runs.

Meanwhile, CPU consumption increases by about 2.8 times. The difference is not very large, but little Greta is already happy and may have time to go to school and attend her lessons.
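
The trade-off can be made explicit with a little arithmetic on the figures quoted in the text. The sketch below uses only those numbers (78 s versus 16 s, and 3.3 times the throughput for 2.8 times the CPU); the class and method names are hypothetical.

```java
// Illustrative arithmetic on the figures quoted in the article.
final class SpeedupSketch {

    /** Plain ratio, e.g. time before / time after, or throughput gain / CPU growth. */
    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // 100 thousand reads: 78 s with one cache, 16 s with five caches.
        System.out.printf("speedup with 5 caches: %.2f%n", ratio(78, 16));
        // SSC = 3: throughput grows about 3.3 times, CPU use about 2.8 times.
        System.out.printf("throughput gain per unit of extra CPU: %.2f%n", ratio(3.3, 2.8));
    }
}
```

A ratio above 1 in the second line means the throughput grows faster than the CPU cost at this setting.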

So this will have a positive effect on any tool that uses bulk access to HDFS (for example Spark, etc.), provided that the application code is lightweight (i.e. the bottleneck is on the HDFS client side) and there is free CPU capacity. To check, let's test what effect the combined use of the BlockCache optimization and SSC tuning has on reading from HBase.

It can be seen that under these conditions the effect is not as large as in the refined tests (reading without any processing), but it is still quite possible to squeeze out an additional 80K here. Together, the two optimizations give a speedup of up to 4x.

A PR was also made for this optimization [HDFS-15202]; it has been merged, and the functionality will be available in upcoming releases.

And finally, it was interesting to compare the read performance of HBase with that of a similar wide-column database, Cassandra.

To do this, we launched instances of the standard YCSB load testing utility from two hosts (800 threads in total). On the server side there were four instances each of RegionServer and Cassandra on 4 hosts (not the ones where the clients were running, to avoid their influence). The reads came from tables of the following sizes:

HBase - 300 GB on HDFS (100 GB of raw data)

Cassandra - 250 GB (replication factor = 3)

I.e. the volume was approximately the same (slightly more in HBase).

HBase parameters:

dfs.client.short.circuit.num = 5 (HDFS client optimization)

hbase.lru.cache.heavy.eviction.count.limit = 30 - this means that the patch starts working after 30 evictions (~5 minutes)

hbase.lru.cache.heavy.eviction.mb.size.limit = 300 - the target volume of caching and eviction

The YCSB logs were parsed and compiled into Excel graphs.

As you can see, these optimizations make it possible to bring the performance of the two databases level under these conditions and to reach 450 thousand reads per second.

We hope this information will be useful to someone in the exciting struggle for performance.

Source: www.habr.com
