Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Qhov no tsis yog ib qho kev tso dag, nws zoo nkaus li tias daim duab tshwj xeeb no qhia meej txog cov ntsiab lus ntawm cov ntaub ntawv no, thiab thaum kawg nws yuav paub meej tias yog vim li cas:

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Raws li DB-Engines Ranking, ob qhov nrov tshaj plaws NoSQL columnar databases yog Cassandra (tom qab no CS) thiab HBase (HB).

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Los ntawm txoj hmoo ntawm lub siab nyiam, peb pab pawg tswj xyuas cov ntaub ntawv ntawm Sberbank muaj lawm ntev dhau los thiab ua haujlwm zoo nrog HB. Lub sijhawm no, peb tau kawm nws qhov kev ua tau zoo thiab qhov tsis muaj zog zoo heev thiab kawm paub yuav ua li cas ua noj. Txawm li cas los xij, qhov muaj lwm txoj hauv kev ntawm CS ib txwm yuam kom peb tsim txom peb tus kheej me ntsis nrog kev tsis ntseeg: peb puas tau xaiv txoj cai? Ntxiv mus, cov txiaj ntsig sib piv, ua los ntawm DataStax, lawv hais tias CS yooj yim ntaus HB nrog yuav luag cov qhab nia crushing. Ntawm qhov tod tes, DataStax yog ib tus neeg txaus siab, thiab koj yuav tsum tsis txhob coj lawv cov lus rau nws. Peb kuj tau tsis meej pem los ntawm cov ntaub ntawv me me ntawm cov xwm txheej sim, yog li peb tau txiav txim siab los tshawb xyuas peb tus kheej uas yog tus huab tais ntawm BigData NoSql, thiab cov txiaj ntsig tau los ua qhov nthuav dav heev.

Txawm li cas los xij, ua ntej hloov mus rau qhov kev ntsuam xyuas tau ua, nws yuav tsum tau piav qhia txog qhov tseem ceeb ntawm kev teeb tsa ib puag ncig. Qhov tseeb yog tias CS tuaj yeem siv rau hauv hom uas tso cai rau cov ntaub ntawv poob. Cov. qhov no yog thaum tsuas yog ib tus neeg rau zaub mov (node) yog lub luag haujlwm rau cov ntaub ntawv ntawm qee qhov tseem ceeb, thiab yog vim li cas nws tsis ua haujlwm, ces tus nqi ntawm tus yuam sij no yuav ploj mus. Rau ntau txoj haujlwm no tsis yog qhov tseem ceeb, tab sis rau kev lag luam nyiaj txiag qhov no yog qhov tshwj tsis yog txoj cai. Nyob rau hauv peb cov ntaub ntawv, nws yog ib qho tseem ceeb kom muaj ob peb daim ntawv luam ntawm cov ntaub ntawv rau kev cia siab.

Yog li ntawd, tsuas yog CS kev khiav hauj lwm hom nyob rau hauv triple replication hom tau txiav txim siab, i.e. Kev tsim ntawm casespace tau ua nrog cov hauv qab no tsis:

CREATE KEYSPACE ks WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3};

Tom ntej no, muaj ob txoj hauv kev los xyuas kom meej qhov yuav tsum tau muaj qib sib xws. Txoj cai dav dav:
NW + NR > RF

Qhov no txhais tau hais tias tus naj npawb ntawm cov ntawv lees paub los ntawm cov nodes thaum sau ntawv (NW) ntxiv rau cov lej ntawm cov ntawv lees paub los ntawm cov nodes thaum nyeem ntawv (NR) yuav tsum ntau dua li qhov rov ua dua. Hauv peb cov ntaub ntawv, RF = 3, uas txhais tau hais tias cov kev xaiv hauv qab no tsim nyog:
2 + 2 > 3
3 + 1 > 3

Txij li nws yog ib qho tseem ceeb rau peb khaws cov ntaub ntawv kom ntseeg tau raws li qhov ua tau, 3 + 1 lub tswv yim tau xaiv. Tsis tas li ntawd, HB ua haujlwm ntawm cov ntsiab lus zoo sib xws, i.e. xws li kev sib piv yuav ncaj ncees dua.

Nws yuav tsum raug sau tseg tias DataStax tau ua qhov sib txawv hauv lawv txoj kev kawm, lawv teeb tsa RF = 1 rau CS thiab HB (rau tom kawg los ntawm kev hloov pauv HDFS nqis). Qhov no yog qhov tseem ceeb heev vim tias qhov cuam tshuam ntawm CS kev ua tau zoo hauv qhov no yog qhov loj heev. Piv txwv li, daim duab hauv qab no qhia txog qhov nce ntawm lub sijhawm xav tau los thauj cov ntaub ntawv rau hauv CS:

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Ntawm no peb pom cov hauv qab no: qhov sib tw threads ntau sau cov ntaub ntawv, nws yuav siv sij hawm ntev. Qhov no yog ntuj, tab sis nws yog ib qho tseem ceeb uas qhov kev ua haujlwm degradation rau RF = 3 yog qhov siab dua. Hauv lwm lo lus, yog tias peb sau 4 xov rau hauv 5 lub rooj (20 tag nrho), ces RF = 3 poob los ntawm 2 zaug (150 vib nas this rau RF = 3 piv rau 75 rau RF = 1). Tab sis yog hais tias peb nce lub load los ntawm loading cov ntaub ntawv mus rau hauv 8 lub rooj nrog 5 threads txhua (40 nyob rau hauv tag nrho), ces qhov poob ntawm RF = 3 yog twb 2,7 zaug (375 vib nas this piv rau 138).

Tej zaum qhov no yog ib feem ntawm qhov tsis pub lwm tus paub ntawm kev ua tiav kev kuaj ua tiav los ntawm DataStax rau CS, vim hais tias rau HB ntawm peb qhov chaw hloov pauv qhov hloov pauv ntawm 2 mus rau 3 tsis muaj txiaj ntsig. Cov. disks tsis yog HB bottleneck rau peb configuration. Txawm li cas los xij, muaj ntau lwm qhov teeb meem ntawm no, vim nws yuav tsum tau muab sau tseg tias peb cov version ntawm HB tau me ntsis patched thiab tweaked, ib puag ncig txawv kiag li, thiab lwm yam. Nws tseem tsim nyog sau cia tias tej zaum kuv tsuas yog tsis paub yuav ua li cas npaj CS kom raug thiab muaj qee txoj hauv kev ua haujlwm nrog nws, thiab kuv vam tias peb yuav pom hauv cov lus. Tab sis thawj yam ua ntej.

Txhua qhov kev ntsuam xyuas tau ua tiav ntawm cov khoom siv kho vajtse uas muaj 4 servers, txhua tus nrog cov teeb tsa hauv qab no:

CPU: Xeon E5-2680 v4 @ 2.40GHz 64 xov.
Disks: 12 daim SATA HDD
java version: 1.8.0_111

CS Version: 3.11.5

cassandra.yml parametersnpe: 256
hinted_handoff_enabled: tseeb
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /data10/cassandra/hints
hint_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
Role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
- /data1/cassandra/data # txhua dataN directory yog cais disk
- /data2/cassandra/data
- /data3/cassandra/data
- /data4/cassandra/data
- /data5/cassandra/data
- /data6/cassandra/data
- /data7/cassandra/data
- /data8/cassandra/data
commitlog_directory: /data9/cassandra/commitlog
cdc_enabled: cuav
disk_failure_policy: nres
commit_failure_policy: nres
npaj_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /data10/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
noob_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
tsis:
- noob: "*,*"
concurrent_reads: 256 # sim 64 - tsis muaj qhov sib txawv pom
concurrent_writes: 256 # sim 64 - tsis muaj qhov sib txawv pom
concurrent_counter_writes: 256 # sim 64 - tsis muaj qhov sib txawv pom
concurrent_materialized_view_writes: 32
memtable_heap_space_in_mb: 2048 # sim 16 GB - nws qeeb dua
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: cuav
trickle_fsync_interval_in_kb: 10240
Storage_port: 7000
ssl_storage_port: 7001
listen_address: *
broadcast_address: *
listen_on_broadcast_address: tseeb
internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
start_native_transport: tseeb
ib: 9042
start_rpc: tseeb
rpc_ chaw nyob: *
ib: 9160
rpc_keepalive: tseeb
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: cuav
snapshot_before_compaction: cuav
auto_snapshot: tseeb
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 1600
sstable_preemptive_open_interval_in_mb: 50
nyeem_request_timeout_in_ms: 100000
range_request_timeout_in_ms: 200000
sau_request_timeout_in_ms: 40000
counter_write_request_timeout_in_ms: 100000
cas_contention_timeout_in_ms: 20000
truncate_request_timeout_in_ms: 60000
thov_timeout_in_ms: 200000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: cuav
endpoint_snitch: GossipingPropertyFileSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
internode_encryption: tsis muaj
client_encryption_options:
enabled: cuav
internode_compression: dc
inter_dc_tcp_nodelay: cuav
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: cuav
enable_scripted_user_defined_functions: cuav
windows_timer_interval: 1
pob tshab_data_encryption_options:
enabled: cuav
Tombstone_warn_threshold: 1000
Tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 200
batch_size_fail_threshold_in_kb: 250
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: cuav
enable_materialized_views: tseeb
enable_sasi_indexes: tseeb

GC Settings:

### CMS Chaw-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSIinitiatingOccupancyFraction=75
-XX: +SivCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
-XX: +CMSClassUnloadingEnabled

Lub jvm.options nco tau faib 16Gb (peb kuj sim 32 Gb, tsis pom qhov txawv).

Cov ntxhuav tau tsim nrog cov lus txib:

CREATE TABLE ks.t1 (id bigint PRIMARY KEY, title text) WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64};

HB version: 1.2.0-cdh5.14.2 (hauv chav kawm org.apache.hadoop.hbase.regionserver.HRegion peb tsis suav MetricsRegion uas coj mus rau GC thaum cov cheeb tsam ntau dua 1000 ntawm RegionServer)

Non-default HBase parameterszookeeper.session.timeout: 120000
hbase.rpc.timeout: 2 feeb
hbase.client.scanner.timeout.period: 2 feeb
hbase.master.handler.count: 10
hbase.regionserver.lease.period, hbase.client.scanner.timeout.period: 2 feeb
hbase.regionserver.handler.count: 160
hbase.regionserver.metahandler.count: 30
hbase.regionserver.logroll.period: 4 teev (s)
hbase.regionserver.maxlogs: 200
hbase.hregion.memstore.flush.size: 1 GiB
hbase.hregion.memstore.block.multiplier: 6
hbase.hstore.compactionThreshold: 5
hbase.hstore.blockingStoreFiles: 200
hbase.hregion.majorcompaction: 1 hnub
HBase Service Advanced Configuration Snippet (Safety Valve) rau hbase-site.xml:
hbase.regionserver.wal.codecorg.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
hbase.master.namespace.init.timeout3600000
hbase.regionserver.optionalcacheflushinterval18000000
hbase.regionserver.thread.compaction.large12
hbase.regionserver.wal.enablecompressiontrue
hbase.hstore.compaction.max.size1073741824
hbase.server.compactchecker.interval.multiplier200
Java Configuration Options rau HBase RegionServer:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSIinitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:ReservedCodeCacheSize=256m
hbase.snapshot.master.timeoutMillis: 2 feeb
hbase.snapshot.region.timeout: 2 feeb
hbase.snapshot.master.timeout.millis: 2 feeb
HBase REST Server Max Log Loj: 100 MiB
HBase REST neeg rau zaub mov Cov ntaub ntawv teev cia siab tshaj plaws: 5
HBase Thrift Server Max Log Loj: 100 MiB
HBase Thrift neeg rau zaub mov Cov ntaub ntawv teev cia siab tshaj plaws: 5
Master Max Log Loj: 100 MiB
Master Maximum Log File Backups: 5
RegionServer Max Log Loj: 100 MiB
RegionServer Cov ntaub ntawv teev cia siab tshaj plaws: 5
HBase Active Master Detection Qhov rai: 4 feeb
dfs.client.hedged.read.threadpool.size: 40
dfs.client.hedged.read.threshold.millis: 10 millisecond(s)
hbase.rest.threads.min: 8
hbase.rest.threads.max: 150
Cov txheej txheem ntau tshaj cov ntaub ntawv piav qhia: 180000
hbase.thrift.minWorkerThreads: 200
hbase.master.executor.openregion.threads: 30
hbase.master.executor.closeregion.threads: 30
hbase.master.executor.serverops.threads: 60
hbase.regionserver.thread.compaction.small: 6
hbase.ipc.server.read.threadpool.size: 20
Region Mover Xov: 6
Client Java Heap Loj hauv Bytes: 1 GiB
HBase REST Server Default Group: 3 GiB
HBase Thrift Server Default Group: 3 GiB
Java Heap Loj ntawm HBase Master hauv Bytes: 16 GiB
Java Heap Loj ntawm HBase RegionServer hauv Bytes: 32 GiB

+ ZooKeeper
maxClientCnxns: 601
maxSessionTimeout: 120000
Tsim cov rooj:
hbase org.apache.hadoop.hbase.util.RegionSplitter ns:t1 UniformSplit -c 64 -f cf
alter 'ns:t1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'GZ'}

Muaj ib qho tseem ceeb ntawm no - Cov lus piav qhia DataStax tsis tau hais tias pes tsawg thaj chaw tau siv los tsim HB cov lus, txawm hais tias qhov no yog qhov tseem ceeb rau cov ntim loj. Yog li ntawd, rau qhov kev ntsuam xyuas, qhov ntau = 64 tau xaiv, uas tso cai rau khaws cia txog 640 GB, i.e. nruab nrab lub rooj.

Thaum lub sijhawm xeem, HBase muaj 22 txhiab lub rooj thiab 67 txhiab cheeb tsam (qhov no yuav ua rau tuag taus rau version 1.2.0 yog tias tsis yog rau thaj chaw hais saum toj no).

Tam sim no rau code. Txij li nws tsis paub meej tias qhov kev teeb tsa twg muaj txiaj ntsig zoo dua rau cov ntaub ntawv tshwj xeeb, kev sim tau ua nyob rau hauv ntau qhov sib xyaw ua ke. Cov. Hauv qee qhov kev sim, 4 lub rooj tau thauj khoom ib txhij (tag nrho 4 lub pob tau siv rau kev sib txuas). Hauv lwm qhov kev sim peb tau ua haujlwm nrog 8 lub rooj sib txawv. Qee zaum, batch loj yog 100, hauv lwm tus 200 (batch parameter - saib cov cai hauv qab). Cov ntaub ntawv loj rau tus nqi yog 10 bytes lossis 100 bytes (dataSize). Nyob rau hauv tag nrho, 5 lab cov ntaub ntawv tau sau thiab nyeem rau hauv txhua lub rooj txhua zaus. Nyob rau tib lub sijhawm, 5 xov tau sau / nyeem rau txhua lub rooj (xov tooj - thNum), txhua tus siv nws tus kheej ntau yam ntawm cov yuam sij ( suav = 1 lab):

if (opType.equals("insert")) {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        StringBuilder sb = new StringBuilder("BEGIN BATCH ");
        for (int i = 0; i < batch; i++) {
            String value = RandomStringUtils.random(dataSize, true, true);
            sb.append("INSERT INTO ")
                    .append(tableName)
                    .append("(id, title) ")
                    .append("VALUES (")
                    .append(key)
                    .append(", '")
                    .append(value)
                    .append("');");
            key++;
        }
        sb.append("APPLY BATCH;");
        final String query = sb.toString();
        session.execute(query);
    }
} else {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        StringBuilder sb = new StringBuilder("SELECT * FROM ").append(tableName).append(" WHERE id IN (");
        for (int i = 0; i < batch; i++) {
            sb = sb.append(key);
            if (i+1 < batch)
                sb.append(",");
            key++;
        }
        sb = sb.append(");");
        final String query = sb.toString();
        ResultSet rs = session.execute(query);
    }
}

Raws li, kev ua haujlwm zoo sib xws tau muab rau HB:

Configuration conf = getConf();
HTable table = new HTable(conf, keyspace + ":" + tableName);
table.setAutoFlush(false, false);
List<Get> lGet = new ArrayList<>();
List<Put> lPut = new ArrayList<>();
byte[] cf = Bytes.toBytes("cf");
byte[] qf = Bytes.toBytes("value");
if (opType.equals("insert")) {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        lPut.clear();
        for (int i = 0; i < batch; i++) {
            Put p = new Put(makeHbaseRowKey(key));
            String value = RandomStringUtils.random(dataSize, true, true);
            p.addColumn(cf, qf, value.getBytes());
            lPut.add(p);
            key++;
        }
        table.put(lPut);
        table.flushCommits();
    }
} else {
    for (Long key = count * thNum; key < count * (thNum + 1); key += 0) {
        lGet.clear();
        for (int i = 0; i < batch; i++) {
            Get g = new Get(makeHbaseRowKey(key));
            lGet.add(g);
            key++;
        }
        Result[] rs = table.get(lGet);
    }
}

Txij li thaum nyob rau hauv HB tus neeg siv yuav tsum tau saib xyuas cov kev faib tawm ntawm cov ntaub ntawv, qhov tseem ceeb salting muaj nuj nqi zoo li no:

public static byte[] makeHbaseRowKey(long key) {
    byte[] nonSaltedRowKey = Bytes.toBytes(key);
    CRC32 crc32 = new CRC32();
    crc32.update(nonSaltedRowKey);
    long crc32Value = crc32.getValue();
    byte[] salt = Arrays.copyOfRange(Bytes.toBytes(crc32Value), 5, 7);
    return ArrayUtils.addAll(salt, nonSaltedRowKey);
}

Tam sim no qhov nthuav tshaj plaws - cov txiaj ntsig:

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Tib yam hauv daim duab:

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub

Qhov kom zoo dua ntawm HB yog qhov xav tsis thoob tias muaj qhov tsis txaus ntseeg tias muaj qee yam kev tsis sib haum xeeb hauv CS teeb. Txawm li cas los xij, Googling thiab tshawb nrhiav qhov pom tseeb tshaj plaws (xws li concurrent_writes lossis memtable_heap_space_in_mb) tsis ua kom nrawm. Nyob rau tib lub sijhawm, cov cav tov huv si thiab tsis cog lus rau dab tsi.

Cov ntaub ntawv tau muab faib sib npaug ntawm cov nodes, cov txheeb cais los ntawm tag nrho cov nodes yog kwv yees li qub.

Qhov no yog dab tsi cov lus txheeb cais zoo li los ntawm ib qho ntawm cov nodesKeyspace: ks
Qauv Zauv: 9383707
Nyeem Latency: 0.04287025042448576 ms
Hnub tim: 15462012
Sau Latency: 0.1350068438699957 ms
Pending Flushes: 0
Tab: t1
SSTable suav: 16
Qhov chaw siv (nyob): 148.59 MiB
Qhov chaw siv (tag nrho): 148.59 MiB
Qhov chaw siv los ntawm snapshots (tag nrho): 0 bytes
Off heap nco siv (tag nrho): 5.17 MiB
SSTable Compression piv: 0.5720989576459437
Cov naj npawb ntawm cov partitions (kwv yees): 3970323
Memtable cell suav: 0
Memtable cov ntaub ntawv loj: 0 bytes
Memtable off heap nco siv: 0 bytes
Memtable hloov suav: 5
Tag Nrho Npe: 2346045
Local nyeem latency: NaN ms
PIB: 3865503
Local sau latency: NaN ms
Pem hauv ntej flushes: 0
Kev kho feem pua: 0.0
Bloom lim qhov tsis zoo: 25
Bloom lim cuav piv: 0.00000
Bloom lim chaw siv: 4.57 MiB
Bloom lim tawm heap nco siv: 4.57 MiB
Index summary off heap nco siv: 590.02 KiB
Compression metadata tawm heap nco siv: 19.45 KiB
Compacted partition yam tsawg kawg nkaus bytes: 36
Compacted muab faib siab tshaj bytes: 42
Compacted muab faib txhais tau tias bytes: 42
Nruab nrab ntawm cov cell nyob ib daim (XNUMX feeb kawg): NaN
Qhov siab tshaj plaws nyob hauv ib daim (0 feeb kawg): XNUMX
Qhov nruab nrab tombstones ib daim (tsawg tsib feeb): NaN
Qhov siab tshaj plaws tombstones ib daim (0 feeb kawg): XNUMX
Kev hloov pauv poob: 0 bytes

Ib qho kev sim txo qhov loj ntawm cov batch (txawm tias xa nws tus kheej) tsis muaj txiaj ntsig, nws tsuas yog zuj zus. Nws yog qhov ua tau tias qhov no yog qhov ua tau zoo tshaj plaws rau CS, txij li cov txiaj ntsig tau txais rau CS zoo ib yam li cov tau txais rau DataStax - txog ntau pua txhiab tus haujlwm ib ob. Tsis tas li ntawd, yog tias peb saib ntawm kev siv cov peev txheej, peb yuav pom tias CS siv ntau ntau CPU thiab disks:

Sib ntaus sib tua ntawm ob yakozuna, los yog Cassandra vs HBase. Sberbank pab pawg kev paub
Daim duab qhia txog kev siv thaum lub sij hawm khiav ntawm tag nrho cov kev xeem nyob rau hauv ib kab rau ob lub databases.

Hais txog HB lub zog nyeem ntawv kom zoo dua. Ntawm no koj tuaj yeem pom tias rau ob qho tib si databases, kev siv disk thaum nyeem ntawv yog qhov tsawg heev (nyeem cov ntawv xeem yog qhov kawg ntawm lub voj voog ntawm kev sim rau txhua lub database, piv txwv li rau CS qhov no yog los ntawm 15:20 txog 15:40). Nyob rau hauv rooj plaub ntawm HB, qhov laj thawj yog qhov tseeb - feem ntau ntawm cov ntaub ntawv dai hauv nco, hauv memstore, thiab qee qhov yog cached hauv blockcache. Raws li rau CS, nws tsis paub meej tias nws ua haujlwm li cas, tab sis kev rov ua dua disk kuj tsis pom, tab sis tsuas yog nyob rau hauv rooj plaub, ib qho kev sim ua kom lub cache row_cache_size_in_mb = 2048 thiab teeb caching = {'keys': 'ALL', 'rows_per_partition': '2000000'}, tab sis qhov ntawd ua rau nws txawm me ntsis zuj zus.

Nws tseem tsim nyog hais dua ib qho tseem ceeb ntawm cov cheeb tsam hauv HB. Nyob rau hauv peb cov ntaub ntawv, tus nqi tau teev raws li 64. Yog tias koj txo nws thiab ua kom nws sib npaug, piv txwv li, 4, ces thaum nyeem ntawv, qhov ceev poob los ntawm 2 zaug. Qhov laj thawj yog tias memstore yuav sau sai dua thiab cov ntaub ntawv yuav raug yaug ntau zaus thiab thaum nyeem ntawv, ntau cov ntaub ntawv yuav tsum tau ua tiav, uas yog kev ua haujlwm nyuaj rau HB. Hauv cov xwm txheej tiag tiag, qhov no tuaj yeem kho tau los ntawm kev xav los ntawm presplitting thiab compactification lub tswv yim; tshwj xeeb tshaj yog, peb siv tus kheej sau cov nqi hluav taws xob uas khaws cov khib nyiab thiab compresses HFiles tas li hauv keeb kwm yav dhau. Nws yog qhov ua tau heev uas rau kev xeem DataStax lawv tau faib tsuas yog 1 cheeb tsam ib lub rooj (uas tsis yog) thiab qhov no yuav qhia me ntsis vim li cas HB thiaj li qis dua hauv lawv cov kev xeem nyeem.

Cov lus xaus ua ntej hauv qab no yog kos los ntawm qhov no. Piv txwv tias tsis muaj qhov yuam kev loj thaum kuaj, ces Cassandra zoo li lub colossus nrog taw ntawm av nplaum. Ntau qhov tseeb, thaum nws sib npaug ntawm ib ceg, zoo li hauv daim duab thaum pib ntawm tsab xov xwm, nws qhia tau zoo heev, tab sis nyob rau hauv kev sib ntaus sib tua nyob rau hauv tib txoj kev nws poob outright. Nyob rau tib lub sijhawm, suav nrog kev siv CPU tsawg ntawm peb cov khoom siv kho vajtse, peb tau kawm cog ob lub RegionServer HBs ib tus tswv tsev thiab yog li ua ob npaug ntawm qhov kev ua tau zoo. Cov. Nrog rau kev siv cov peev txheej, qhov xwm txheej rau CS yog qhov tsis txaus ntseeg.

Tau kawg, cov kev ntsuam xyuas no yog cov khoom hluavtaws heev thiab cov ntaub ntawv uas tau siv ntawm no yog qhov me me. Nws muaj peev xwm hais tias yog tias peb hloov mus rau terabytes, qhov xwm txheej yuav txawv, tab sis thaum rau HB peb tuaj yeem thauj khoom terabytes, rau CS qhov no ua rau muaj teeb meem. Nws feem ntau cuam tshuam rau OperationTimedOutException txawm nrog cov ntim no, txawm hais tias cov kev txwv rau kev tos cov lus teb twb tau nce ntau zaus piv rau cov khoom qub.

Kuv vam tias los ntawm kev sib koom tes peb yuav pom cov kev tsis sib haum xeeb ntawm CS thiab yog tias peb tuaj yeem ua kom nws nrawm dua, tom qab ntawv kawg kuv yuav twv yuav raug hu ntxiv cov ntaub ntawv hais txog cov txiaj ntsig kawg.

UPD: Ua tsaug rau cov lus qhia ntawm cov phooj ywg, kuv tau tswj xyuas kom ceev cov ntawv nyeem. Yog:
159 ops (644 rooj, 4 kwj, batch 5).
Ntxiv:
.withLoadBalancingPolicy(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
Thiab kuv ua si ib ncig ntawm cov xov tooj. Qhov tshwm sim yog cov hauv qab no:
4 lub rooj, 100 threads, batch = 1 (daim los ntawm daim): 301 ops
4 lub rooj, 100 threads, batch = 10: 447 ops
4 lub rooj, 100 threads, batch = 100: 625 ops

Tom qab ntawd kuv yuav siv lwm cov lus qhia tuning, khiav lub voj voog tag nrho thiab ntxiv cov txiaj ntsig ntawm qhov kawg ntawm kev tshaj tawm.

Tau qhov twg los: www.hab.com

Ntxiv ib saib