Battle of two yokozuna, or Cassandra vs HBase: the Sberbank team's experience

This is not even a joke: this particular picture accurately reflects the essence of these databases, and by the end it will be clear why:


According to the DB-Engines Ranking, Cassandra (hereafter CS) and HBase (HB) are two of the most popular NoSQL databases.


As fate would have it, our data management team at Sberbank has long been working with HB. In that time we have studied its strengths and weaknesses thoroughly and learned how to cook it. Still, the existence of an alternative in the form of CS always nagged at us with a little doubt: did we make the right choice? To make matters worse, a comparison published by DataStax claimed that CS easily beats HB by a crushing score. On the other hand, DataStax is an interested party, so you shouldn't just take their word for it. We were also confused by the rather scant information about the test conditions, so we decided to find out for ourselves who the king of BigData NoSQL is, and the results turned out to be very interesting.

Before moving on to the test results, however, it is necessary to describe the essential aspects of the environment configuration. The point is that CS can be used in a mode that allows data loss: the mode in which only one server (node) is responsible for the data of a given key, and if it fails for some reason, the value of that key is lost. For many applications this is not critical, but in the banking sector it is the exception rather than the rule. In our case, it is important to keep several copies of the data for reliable storage.

Therefore, only CS operation in triple-replication mode was considered, i.e. the keyspace was created with the following parameters:

CREATE KEYSPACE ks WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3};

Next, there are two ways to ensure the required level of consistency. The general rule:
NW + NR > RF

This means that the number of acknowledgements from nodes on write (NW) plus the number of acknowledgements from nodes on read (NR) must be greater than the replication factor. In our case RF = 3, so the following options fit:
2 + 2 > 3
3 + 1 > 3

Since it was fundamentally important for us to store the data as reliably as possible, the 3+1 scheme was chosen. Moreover, HB works on a similar principle, so such a comparison is fairer.
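As an illustration (a minimal sketch of my own, not part of the original benchmark; the class and method names are invented), the rule above can simply be enumerated for RF = 3:

```java
// Enumerates the (NW, NR) acknowledgement combinations that satisfy the
// consistency rule NW + NR > RF for RF = 3. Illustrative sketch only.
class QuorumCheck {
    static boolean overlaps(int nw, int nr, int rf) {
        // The write and read node sets are guaranteed to share at least
        // one replica exactly when NW + NR > RF.
        return nw + nr > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        for (int nw = 1; nw <= rf; nw++)
            for (int nr = 1; nr <= rf; nr++)
                if (overlaps(nw, nr, rf))
                    System.out.println("NW=" + nw + " NR=" + nr);
    }
}
```

Of the combinations printed, 2+2 and 3+1 are the minimal ones quoted above; everything stronger (3+2, 3+3, etc.) also qualifies but costs more acknowledgements.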

It should be noted that DataStax did the opposite in their study: they set RF = 1 for both CS and HB (for the latter, by changing the HDFS settings). This is a really important point, because the impact on CS performance in that case is enormous. For example, the picture below shows the growth in the time needed to load data into CS:


Here we see the following: the more competing threads write data, the longer it takes. That is natural, but it matters that the performance degradation for RF = 3 is much steeper. In other words, if we write into 4 tables with 5 threads each (20 in total), then RF = 3 loses by roughly a factor of 2 (150 seconds for RF = 3 versus 75 for RF = 1). But if we increase the load by writing into 8 tables with 5 threads each (40 in total), the loss of RF = 3 is already a factor of 2.7 (375 seconds versus 138).
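The slowdown factors quoted can be checked directly from the timings (a trivial sketch; the seconds are the measurements cited above):

```java
// Verifies the RF = 3 vs RF = 1 slowdown ratios from the measured
// load times quoted in the text.
class RfOverhead {
    static double slowdown(double rf3Seconds, double rf1Seconds) {
        return rf3Seconds / rf1Seconds;
    }

    public static void main(String[] args) {
        System.out.println(slowdown(150, 75));  // 20 threads total: 2.0
        System.out.println(slowdown(375, 138)); // 40 threads total: ~2.72
    }
}
```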

Perhaps this is part of the secret of DataStax's successful load test for CS, because for HB in our setup changing the replication factor from 2 to 3 had no effect at all. I.e. disks are not the bottleneck for HB in our configuration. There are, however, many other pitfalls here: note that our version of HB is slightly patched and tweaked, the environments are different, and so on. It is also worth noting that maybe I simply don't know how to prepare CS correctly and there are more effective ways to work with it, in which case I hope we will find out in the comments. But first things first.

All tests were performed on a hardware cluster of 4 servers, each with the following configuration:

CPU: Xeon E5-2680 v4 @ 2.40GHz, 64 threads
Disks: 12 SATA HDDs
Java version: 1.8.0_111

CS version: 3.11.5

cassandra.yml parameters:
num_tokens: 256
hinted_handoff_enabled: true
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /data10/cassandra/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /data1/cassandra/data # each dataN directory is a separate disk
- /data2/cassandra/data
- /data3/cassandra/data
- /data4/cassandra/data
- /data5/cassandra/data
- /data6/cassandra/data
- /data7/cassandra/data
- /data8/cassandra/data
commitlog_directory: /data9/cassandra/commitlog
cdc_enabled: false
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /data10/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "*,*"
concurrent_reads: 256 # tried 64 - no difference observed
concurrent_writes: 256 # tried 64 - no difference observed
concurrent_counter_writes: 256 # tried 64 - no difference observed
concurrent_materialized_view_writes: 32
memtable_heap_space_in_mb: 2048 # tried 16 GB - it was slower
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: *
broadcast_address: *
listen_on_broadcast_address: true
internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: *
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 1600
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 100000
range_request_timeout_in_ms: 200000
write_request_timeout_in_ms: 40000
counter_write_request_timeout_in_ms: 100000
cas_contention_timeout_in_ms: 20000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 200000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: GossipingPropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
internode_encryption: none
client_encryption_options:
enabled: false
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
enabled: false
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 200
batch_size_fail_threshold_in_kb: 250
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
enable_materialized_views: true
enable_sasi_indexes: true

GC settings:

### CMS Settings
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX:+CMSClassUnloadingEnabled

16 GB of memory was allocated in jvm.options (we also tried 32 GB; no difference was observed).

The tables were created with the command:

CREATE TABLE ks.t1 (id bigint PRIMARY KEY, title text) WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64};

HB version: 1.2.0-cdh5.14.2 (in the class org.apache.hadoop.hbase.regionserver.HRegion we cut out MetricsRegion, which caused GC pressure when the number of regions exceeded 1000 on a RegionServer)

Non-default HBase parameters:

zookeeper.session.timeout: 120000
hbase.rpc.timeout: 2 minute(s)
hbase.client.scanner.timeout.period: 2 minute(s)
hbase.master.handler.count: 10
hbase.regionserver.lease.period, hbase.client.scanner.timeout.period: 2 minute(s)
hbase.regionserver.handler.count: 160
hbase.regionserver.metahandler.count: 30
hbase.regionserver.logroll.period: 4 hour(s)
hbase.regionserver.maxlogs: 200
hbase.hregion.memstore.flush.size: 1 GiB
hbase.hregion.memstore.block.multiplier: 6
hbase.hstore.compactionThreshold: 5
hbase.hstore.blockingStoreFiles: 200
hbase.hregion.majorcompaction: 1 day(s)
HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml:
hbase.regionserver.wal.codec: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
hbase.master.namespace.init.timeout: 3600000
hbase.regionserver.optionalcacheflushinterval: 18000000
hbase.regionserver.thread.compaction.large: 12
hbase.regionserver.wal.enablecompression: true
hbase.hstore.compaction.max.size: 1073741824
hbase.server.compactchecker.interval.multiplier: 200
Java Configuration Options for HBase RegionServer:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:ReservedCodeCacheSize=256m
hbase.snapshot.master.timeoutMillis: 2 minute(s)
hbase.snapshot.region.timeout: 2 minute(s)
hbase.snapshot.master.timeout.millis: 2 minute(s)
HBase REST Server Max Log Size: 100 MiB
HBase REST Server Maximum Log File Backups: 5
HBase Thrift Server Max Log Size: 100 MiB
HBase Thrift Server Maximum Log File Backups: 5
Master Max Log Size: 100 MiB
Master Maximum Log File Backups: 5
RegionServer Max Log Size: 100 MiB
RegionServer Maximum Log File Backups: 5
HBase Active Master Detection Window: 4 minute(s)
dfs.client.hedged.read.threadpool.size: 40
dfs.client.hedged.read.threshold.millis: 10 millisecond(s)
hbase.rest.threads.min: 8
hbase.rest.threads.max: 150
Maximum Process File Descriptors: 180000
hbase.thrift.minWorkerThreads: 200
hbase.master.executor.openregion.threads: 30
hbase.master.executor.closeregion.threads: 30
hbase.master.executor.serverops.threads: 60
hbase.regionserver.thread.compaction.small: 6
hbase.ipc.server.read.threadpool.size: 20
Region Mover Threads: 6
Client Java Heap Size in Bytes: 1 GiB
Java Heap Size of HBase REST Server in Bytes: 3 GiB
Java Heap Size of HBase Thrift Server in Bytes: 3 GiB
Java Heap Size of HBase Master in Bytes: 16 GiB
Java Heap Size of HBase RegionServer in Bytes: 32 GiB

+ZooKeeper
maxClientCnxns: 601
maxSessionTimeout: 120000
Table creation:
hbase org.apache.hadoop.hbase.util.RegionSplitter ns:t1 UniformSplit -c 64 -f cf
alter 'ns:t1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'GZ'}

There is an important point here: the DataStax description does not say how many regions were used when creating the HB tables, even though this is critical for large volumes. Therefore, for our tests a count of 64 was chosen, which allows storing up to 640 GB, i.e. a medium-sized table.
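The 640 GB figure follows from simple arithmetic, assuming roughly 10 GB per region before a split becomes necessary (the per-region target is my assumption; the text states only the totals):

```java
// Region-count sizing sketch: with 64 pre-split regions and an assumed
// ~10 GB per region, a table holds about 640 GB before regions split.
class RegionCapacity {
    static long maxTableSizeGb(int regionCount, long gbPerRegion) {
        return (long) regionCount * gbPerRegion;
    }

    public static void main(String[] args) {
        System.out.println(maxTableSizeGb(64, 10)); // 640
    }
}
```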

At the time of the tests, HBase had 22 tables and 67 regions (which would have been fatal for version 1.2.0 if not for the patch mentioned above).

Now for the code. Since it was not clear which configurations would favor a particular database, the tests were run in various combinations: in some tests, 4 tables were loaded simultaneously (all 4 nodes were used for connections); in others, we worked with 8 different tables. In some cases the batch size was 100, in others 200 (the batch parameter - see the code below). The data size per value was 10 bytes or 100 bytes (dataSize). In total, 5 million records were written to and read from each table each time. At the same time, 5 threads wrote to / read from each table (thread number - thNum), each using its own range of keys (count = 1 million):

if (opType.equals("insert")) {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        StringBuilder sb = new StringBuilder("BEGIN BATCH ");
        for (int i = 0; i < batch; i++) {
            String value = RandomStringUtils.random(dataSize, true, true);
            sb.append("INSERT INTO ")
                    .append(tableName)
                    .append("(id, title) ")
                    .append("VALUES (")
                    .append(key)
                    .append(", '")
                    .append(value)
                    .append("');");
            key++;
        }
        sb.append("APPLY BATCH;");
        final String query = sb.toString();
        session.execute(query);
    }
} else {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        StringBuilder sb = new StringBuilder("SELECT * FROM ").append(tableName).append(" WHERE id IN (");
        for (int i = 0; i < batch; i++) {
            sb = sb.append(key);
            if (i+1 < batch)
                sb.append(",");
            key++;
        }
        sb = sb.append(");");
        final String query = sb.toString();
        ResultSet rs = session.execute(query);
    }
}

Accordingly, similar functionality was implemented for HB:

Configuration conf = getConf();
HTable table = new HTable(conf, keyspace + ":" + tableName);
table.setAutoFlush(false, false);
List<Get> lGet = new ArrayList<>();
List<Put> lPut = new ArrayList<>();
byte[] cf = Bytes.toBytes("cf");
byte[] qf = Bytes.toBytes("value");
if (opType.equals("insert")) {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        lPut.clear();
        for (int i = 0; i < batch; i++) {
            Put p = new Put(makeHbaseRowKey(key));
            String value = RandomStringUtils.random(dataSize, true, true);
            p.addColumn(cf, qf, value.getBytes());
            lPut.add(p);
            key++;
        }
        table.put(lPut);
        table.flushCommits();
    }
} else {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        lGet.clear();
        for (int i = 0; i < batch; i++) {
            Get g = new Get(makeHbaseRowKey(key));
            lGet.add(g);
            key++;
        }
        Result[] rs = table.get(lGet);
    }
}

Since in HB the client has to take care of uniform data distribution, the key salting function looked like this:

public static byte[] makeHbaseRowKey(long key) {
    byte[] nonSaltedRowKey = Bytes.toBytes(key);
    CRC32 crc32 = new CRC32();
    crc32.update(nonSaltedRowKey);
    long crc32Value = crc32.getValue();
    byte[] salt = Arrays.copyOfRange(Bytes.toBytes(crc32Value), 5, 7);
    return ArrayUtils.addAll(salt, nonSaltedRowKey);
}
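For readers without the HBase client on the classpath, here is a dependency-free sketch of the same salting scheme using only the JDK (ByteBuffer replaces HBase's Bytes helper; both are big-endian, so the bytes should match):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// JDK-only version of the salting function: a 2-byte salt derived from
// the CRC32 of the key is prepended, so sequential keys scatter across
// the pre-split regions while the original key remains in the suffix.
class SaltSketch {
    static byte[] makeHbaseRowKey(long key) {
        byte[] nonSalted = ByteBuffer.allocate(8).putLong(key).array();
        CRC32 crc32 = new CRC32();
        crc32.update(nonSalted);
        // big-endian bytes of the CRC value; indexes 5..6 give a 2-byte salt
        byte[] crcBytes = ByteBuffer.allocate(8).putLong(crc32.getValue()).array();
        byte[] salt = Arrays.copyOfRange(crcBytes, 5, 7);
        byte[] rowKey = Arrays.copyOf(salt, salt.length + nonSalted.length);
        System.arraycopy(nonSalted, 0, rowKey, salt.length, nonSalted.length);
        return rowKey;
    }

    public static void main(String[] args) {
        // sequential keys get differing salt prefixes, avoiding hot regions
        for (long k = 0; k < 4; k++)
            System.out.println(Arrays.toString(makeHbaseRowKey(k)));
    }
}
```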

Now for the most interesting part, the results:


The same thing in chart form:


The advantage of HB is so striking that one suspects some kind of bottleneck in the CS setup. However, googling and tuning the most obvious parameters (such as concurrent_writes or memtable_heap_space_in_mb) did not speed things up. Meanwhile, the logs are clean and do not complain about anything.

The data was distributed evenly across the nodes; the statistics from all nodes were roughly the same.

This is what the table statistics looked like from one of the nodes:

Keyspace: ks
Read Count: 9383707
Read Latency: 0.04287025042448576 ms
Write Count: 15462012
Write Latency: 0.1350068438699957 ms
Pending Flushes: 0
Table: t1
SSTable count: 16
Space used (live): 148.59 MiB
Space used (total): 148.59 MiB
Space used by snapshots (total): 0 bytes
Off heap memory used (total): 5.17 MiB
SSTable Compression Ratio: 0.5720989576459437
Number of partitions (estimate): 3970323
Memtable cell count: 0
Memtable data size: 0 bytes
Memtable off heap memory used: 0 bytes
Memtable switch count: 5
Local read count: 2346045
Local read latency: NaN ms
Local write count: 3865503
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 25
Bloom filter false ratio: 0.00000
Bloom filter space used: 4.57 MiB
Bloom filter off heap memory used: 4.57 MiB
Index summary off heap memory used: 590.02 KiB
Compression metadata off heap memory used: 19.45 KiB
Compacted partition minimum bytes: 36
Compacted partition maximum bytes: 42
Compacted partition mean bytes: 42
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0 bytes

Attempts to reduce the batch size (even down to sending records one at a time) had no effect; they only made things worse. It is quite possible that this really is the peak performance of CS, since the results we obtained for CS are similar to those DataStax got: on the order of hundreds of thousands of operations per second. Moreover, if we look at resource utilization, we see that CS uses far more CPU and disk:

The picture shows the utilization during the execution of all tests in a row for both databases.

Regarding HB's powerful read advantage: here you can see that for both databases disk utilization during reads is very low (the read tests are the final part of the test cycle for each database; for CS, for example, this is from 15:20 to 15:40). In the case of HB the reason is clear - most of the data sits in memory, in the memstore, and some is cached in the blockcache. As for CS, it is not entirely clear how it works, but disk activity is likewise barely visible. An attempt was made to enable the row cache with row_cache_size_in_mb = 2048 and caching = {'keys': 'ALL', 'rows_per_partition': '2000000'}, but that made things slightly worse.

It is also worth mentioning once more the important point about the number of regions in HB. In our case the value was set to 64. If you reduce it to, say, 4, reads slow down by a factor of 2. The reason is that the memstore fills up faster, files are flushed more often, and on read more files have to be processed, which is a rather complicated operation for HB. In real-world conditions this can be handled by thinking through a pre-splitting and compaction strategy; in particular, we use a self-written utility that collects garbage and compacts HFiles constantly in the background. It is quite possible that for the DataStax tests only 1 region was allocated per table (which is not correct), and this would somewhat clarify why HB lost in their read tests.

The following preliminary conclusions were drawn from this. Assuming no major mistakes were made during testing, Cassandra looks like a colossus with feet of clay. More precisely, while balancing on one leg, as in the picture at the beginning of the article, it shows reasonably good results, but in a fight under identical conditions it loses outright. At the same time, given the low CPU utilization on our hardware, we learned to plant two HB RegionServers per host and thereby doubled the performance. I.e. taking resource utilization into account, the situation for CS looks even more troubling.

Of course, these tests are synthetic and the amount of data used here is relatively modest. It is possible that if we moved to terabytes the situation would be different, but while for HB we can load terabytes, for CS this turned out to be problematic. It often threw OperationTimedOutException even at these volumes, although the response-wait parameters had already been increased several times over the defaults.

I hope that through joint efforts we will find the bottleneck in CS, and if we manage to speed it up, I will add information about the final results at the end of the post.

UPD: Thanks to the advice of colleagues, I managed to speed up reading. It was:
159,644 ops (4 tables, 5 threads, batch 5).
Added:
.withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
And I played around with the number of threads. The result is as follows:
4 tables, 100 threads, batch = 1 (record by record): 301 ops
4 tables, 100 threads, batch = 10: 447 ops
4 tables, 100 threads, batch = 100: 625 ops

Later I will apply other tuning tips, run a full test cycle, and add the results at the end of the post.

source: www.habr.com
