èšäºã®ç¿»èš³ã¯ã³ãŒã¹ã®åŠçåãã«ç¹å¥ã«çšæãããŸãã
XNUMX幎åã«ç§ã¯éãããŸãã
ClickHouse ã¯ããµãŒãããŒã㣠ã©ã€ãã©ãªãé€ã 170 äžè¡ã® C++ ã³ãŒãã§æ§æãããŠãããåæ£ããŒã¿ããŒã¹ ã³ãŒãããŒã¹ã®äžã§ãæå°ã® 235 ã€ã§ãã ããã«å¯ŸããSQLite ã¯ãã£ã¹ããªãã¥ãŒã·ã§ã³ããµããŒãããŠãããã207 äž XNUMX è¡ã® C ã³ãŒãã§æ§æãããŠããŸããããã®èšäºã®å·çæç¹ã§ XNUMX 人ã®ãšã³ãžãã¢ã ClickHouse ã«è²¢ç®ããŠãããæè¿ã³ãããã®å¯åºŠãé«ãŸã£ãŠããŸãã
ã¯ãªãã¯ããŠã¹ã¯2017幎XNUMXæããã
ãã®èšäºã§ã¯ã2 ã³ã¢ ããã»ããµãš NVMe ã¹ãã¬ãŒãžã䜿çšãã AWS EC36 äžã® ClickHouse ã¯ã©ã¹ã¿ãŒã®ããã©ãŒãã³ã¹ãèŠãŠãããŸãã
æŽæ°: ãã®æçš¿ãæåã«å ¬éããŠãã XNUMX é±éåŸãæ§æãæ¹åããŠãã¹ããåå®è¡ããããè¯ãçµæãåŸãããŸããã ãã®æçš¿ã¯ããããã®å€æŽãåæ ããŠæŽæ°ãããŸããã
AWS EC2 ã¯ã©ã¹ã¿ãŒã®èµ·å
ãã®æçš¿ã§ã¯ 5 ã€ã® c9d.2xlarge EC36 ã€ã³ã¹ã¿ã³ã¹ã䜿çšããŸãã ããããã« 72 åã®ä»®æ³ CPUã900 GB ã® RAMã10 GB ã® NVMe SSD ã¹ãã¬ãŒãžãå«ãŸããŠããã1,962 ã®ã¬ããã ãããã¯ãŒã¯ããµããŒãããŸãã ãªã³ããã³ãã§å®è¡ããå Žåãeu-west-1 ãªãŒãžã§ã³ã§ã¯ãããã 16.04 ãã«/æéã®æéãããããŸãã ãªãã¬ãŒãã£ã³ã° ã·ã¹ãã ãšã㊠Ubuntu Server XNUMX LTS ã䜿çšããŸãã
ãã¡ã€ã¢ãŠã©ãŒã«ã¯ãåãã·ã³ãå¶éãªãçžäºã«éä¿¡ã§ããããã«æ§æãããŠãããç§ã® IPv4 ã¢ãã¬ã¹ã®ã¿ãã¯ã©ã¹ã¿ãŒå ã® SSH ã«ãã£ãŠãã¯ã€ããªã¹ãã«ç»é²ãããŠããŸãã
NVMe ãã©ã€ãã¯åäœæºåå®äºç¶æ ã«ãããŸã
ClickHouse ãåäœããããã«ãåãµãŒããŒã® NVMe ãã©ã€ãã« EXT4 圢åŒã®ãã¡ã€ã« ã·ã¹ãã ãäœæããŸãã
$ sudo mkfs -t ext4 /dev/nvme1n1
$ sudo mkdir /ch
$ sudo mount /dev/nvme1n1 /ch
ãã¹ãŠã®æ§æãå®äºãããšãããŠã³ã ãã€ã³ããšåã·ã¹ãã ã§å©çšå¯èœãª 783 GB ã®ã¹ããŒã¹ã衚瀺ãããŸãã
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 87.9M 1 loop /snap/core/5742
loop1 7:1 0 16.5M 1 loop /snap/amazon-ssm-agent/784
nvme0n1 259:1 0 8G 0 disk
âânvme0n1p1 259:2 0 8G 0 part /
nvme1n1 259:0 0 838.2G 0 disk /ch
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 35G 0 35G 0% /dev
tmpfs 6.9G 8.8M 6.9G 1% /run
/dev/nvme0n1p1 7.7G 967M 6.8G 13% /
tmpfs 35G 0 35G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 35G 0 35G 0% /sys/fs/cgroup
/dev/loop0 88M 88M 0 100% /snap/core/5742
/dev/loop1 17M 17M 0 100% /snap/amazon-ssm-agent/784
tmpfs 6.9G 0 6.9G 0% /run/user/1000
/dev/nvme1n1 825G 73M 783G 1% /ch
ãã®ãã¹ãã§äœ¿çšããããŒã¿ã»ããã¯ã1.1 幎éã«ãã¥ãŒãšãŒã¯åžã§è¡ããã XNUMX ååã®ã¿ã¯ã·ãŒä¹è»ããçæããããŒã¿ ãã³ãã§ãã ããã°ã§
$ sudo apt update
$ sudo apt install awscli
$ aws configure
ãã¡ã€ã«ãããã©ã«ãèšå®ãããéãããŠã³ããŒããããããã«ãã¯ã©ã€ã¢ã³ãã®åæãªã¯ãšã¹ãå¶éã 100 ã«èšå®ããŸãã
$ aws configure set
default.s3.max_concurrent_requests
100
AWS S3 ããã¿ã¯ã·ãŒä¹è»ããŒã¿ã»ãããããŠã³ããŒãããæåã®ãµãŒããŒã® NVMe ãã©ã€ãã«ä¿åããŸãã ãã®ããŒã¿ã»ããã¯ãGZIP å§çž®ããã CSV 圢åŒã§çŽ 104 GB ã§ãã
$ sudo mkdir -p /ch/csv
$ sudo chown -R ubuntu /ch/csv
$ aws s3 sync s3://<bucket>/csv /ch/csv
ã¯ãªãã¯ããŠã¹ã®ã€ã³ã¹ããŒã«
Apache ZooKeeper ãå®è¡ããããã«å¿ èŠãªãããJava 8 çšã® OpenJDK ãã£ã¹ããªãã¥ãŒã·ã§ã³ãã€ã³ã¹ããŒã«ããŸããApache ZooKeeper ã¯ãXNUMX å°ã®ãã·ã³ãã¹ãŠã« ClickHouse ãåæ£ã€ã³ã¹ããŒã«ããããã«å¿ èŠã§ãã
$ sudo apt update
$ sudo apt install
openjdk-8-jre
openjdk-8-jdk-headless
ÐаÑеЌ Ñ ÑÑÑÐ°ÐœÐ°Ð²Ð»ÐžÐ²Ð°Ñ Ð¿ÐµÑеЌеММÑÑ ÑÑÐµÐŽÑ JAVA_HOME
.
$ sudo vi /etc/profile
export JAVA_HOME=/usr
$ source /etc/profile
次ã«ãUbuntu ã®ããã±ãŒãžç®¡çã·ã¹ãã ã䜿çšããŠãClickHouse 18.16.1ãglanceãããã³ ZooKeeper ã XNUMX å°ãã¹ãŠã®ãã·ã³ã«ã€ã³ã¹ããŒã«ããŸãã
$ sudo apt-key adv
--keyserver hkp://keyserver.ubuntu.com:80
--recv E0C56BD4
$ echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" |
sudo tee /etc/apt/sources.list.d/clickhouse.list
$ sudo apt-get update
$ sudo apt install
clickhouse-client
clickhouse-server
glances
zookeeperd
ClickHouse çšã®ãã£ã¬ã¯ããªãäœæããXNUMX ã€ã®ãµãŒããŒãã¹ãŠã§æ§æã®äžæžããè¡ããŸãã
$ sudo mkdir /ch/clickhouse
$ sudo chown -R clickhouse /ch/clickhouse
$ sudo mkdir -p /etc/clickhouse-server/conf.d
$ sudo vi /etc/clickhouse-server/conf.d/taxis.conf
ãããã¯ãç§ã䜿çšããæ§æã®ãªãŒããŒã©ã€ãã§ãã
<?xml version="1.0"?>
<yandex>
<listen_host>0.0.0.0</listen_host>
<path>/ch/clickhouse/</path>
<remote_servers>
<perftest_3shards>
<shard>
<replica>
<host>172.30.2.192</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>172.30.2.162</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>172.30.2.36</host>
<port>9000</port>
</replica>
</shard>
</perftest_3shards>
</remote_servers>
<zookeeper-servers>
<node>
<host>172.30.2.192</host>
<port>2181</port>
</node>
<node>
<host>172.30.2.162</host>
<port>2181</port>
</node>
<node>
<host>172.30.2.36</host>
<port>2181</port>
</node>
</zookeeper-servers>
<macros>
<shard>03</shard>
<replica>01</replica>
</macros>
</yandex>
次ã«ãXNUMX å°ãã¹ãŠã®ãã·ã³ã§ ZooKeeper ãš ClickHouse ãµãŒããŒãå®è¡ããŸãã
$ sudo /etc/init.d/zookeeper start
$ sudo service clickhouse-server start
ClickHouseãžã®ããŒã¿ã®ã¢ããããŒã
æåã®ãµãŒããŒã§ããªãã ããŒãã«ãäœæããŸã (trips
)ããã° ãšã³ãžã³ã䜿çšããŠã¿ã¯ã·ãŒæ
è¡ã®ããŒã¿ã»ãããä¿åããŸãã
$ clickhouse-client --host=0.0.0.0
CREATE TABLE trips (
trip_id UInt32,
vendor_id String,
pickup_datetime DateTime,
dropoff_datetime Nullable(DateTime),
store_and_fwd_flag Nullable(FixedString(1)),
rate_code_id Nullable(UInt8),
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count Nullable(UInt8),
trip_distance Nullable(Float64),
fare_amount Nullable(Float32),
extra Nullable(Float32),
mta_tax Nullable(Float32),
tip_amount Nullable(Float32),
tolls_amount Nullable(Float32),
ehail_fee Nullable(Float32),
improvement_surcharge Nullable(Float32),
total_amount Nullable(Float32),
payment_type Nullable(String),
trip_type Nullable(UInt8),
pickup Nullable(String),
dropoff Nullable(String),
cab_type Nullable(String),
precipitation Nullable(Int8),
snow_depth Nullable(Int8),
snowfall Nullable(Int8),
max_temperature Nullable(Int8),
min_temperature Nullable(Int8),
average_wind_speed Nullable(Int8),
pickup_nyct2010_gid Nullable(Int8),
pickup_ctlabel Nullable(String),
pickup_borocode Nullable(Int8),
pickup_boroname Nullable(String),
pickup_ct2010 Nullable(String),
pickup_boroct2010 Nullable(String),
pickup_cdeligibil Nullable(FixedString(1)),
pickup_ntacode Nullable(String),
pickup_ntaname Nullable(String),
pickup_puma Nullable(String),
dropoff_nyct2010_gid Nullable(UInt8),
dropoff_ctlabel Nullable(String),
dropoff_borocode Nullable(UInt8),
dropoff_boroname Nullable(String),
dropoff_ct2010 Nullable(String),
dropoff_boroct2010 Nullable(String),
dropoff_cdeligibil Nullable(String),
dropoff_ntacode Nullable(String),
dropoff_ntaname Nullable(String),
dropoff_puma Nullable(String)
) ENGINE = Log;
次ã«ãå CSV ãã¡ã€ã«ãæœåºããŠããªãã ããŒãã«ã«ããŒãããŸã (trips
ïŒã 以äžã¯ 55 å 10 ç§ã§å®äºããŸããã ãã®æäœåŸã®ããŒã¿ ãã£ã¬ã¯ããªã®ãµã€ãºã¯ 134 GB ã«ãªããŸããã
$ time (for FILENAME in /ch/csv/trips_x*.csv.gz; do
echo $FILENAME
gunzip -c $FILENAME |
clickhouse-client
--host=0.0.0.0
--query="INSERT INTO trips FORMAT CSV"
done)
ã€ã³ããŒãé床ã¯ãéå§çž® CSV ã³ã³ãã³ã㧠155 ç§ããã XNUMX MB ã§ããã ãã㯠GZIP 解åã®ããã«ããã¯ãåå ã ã£ããšæãããŸãã xargs ã䜿çšã㊠gzip å§çž®ããããã¹ãŠã®ãã¡ã€ã«ã䞊è¡ããŠè§£åãã解åãããããŒã¿ãããŒãããæ¹ãé«éã ã£ããããããŸããã 以äžã¯ãCSV ã€ã³ããŒã ããã»ã¹äžã«å ±åãããå 容ã®èª¬æã§ãã
$ sudo glances
ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws) Uptime: 0:11:42
CPU 8.2% nice: 0.0% LOAD 36-core MEM 9.8% active: 5.20G SWAP 0.0%
user: 6.0% irq: 0.0% 1 min: 2.24 total: 68.7G inactive: 61.0G total: 0
system: 0.9% iowait: 1.3% 5 min: 1.83 used: 6.71G buffers: 66.4M used: 0
idle: 91.8% steal: 0.0% 15 min: 1.01 free: 62.0G cached: 61.6G free: 0
NETWORK Rx/s Tx/s TASKS 370 (507 thr), 2 run, 368 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5 136b 2Kb
lo 343Mb 343Mb CPU% MEM% VIRT RES PID USER NI S TIME+ IOR/s IOW/s Command
100.4 1.5 1.65G 1.06G 9909 ubuntu 0 S 1:01.33 0 0 clickhouse-client --host=0.0.0.0 --query=INSERT INTO trips FORMAT CSV
DISK I/O R/s W/s 85.1 0.0 4.65M 708K 9908 ubuntu 0 R 0:50.60 32M 0 gzip -d -c /ch/csv/trips_xac.csv.gz
loop0 0 0 54.9 5.1 8.14G 3.49G 8091 clickhous 0 S 1:44.23 0 45M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
loop1 0 0 4.5 0.0 0 0 319 root 0 S 0:07.50 1K 0 kworker/u72:2
nvme0n1 0 3K 2.3 0.0 91.1M 28.9M 9912 root 0 R 0:01.56 0 0 /usr/bin/python3 /usr/bin/glances
nvme0n1p1 0 3K 0.3 0.0 0 0 960 root -20 S 0:00.10 0 0 kworker/28:1H
nvme1n1 32.1M 495M 0.3 0.0 0 0 1058 root -20 S 0:00.90 0 0 kworker/23:1H
ç¶è¡ããåã«ãå ã® CSV ãã¡ã€ã«ãåé€ã㊠NVMe ãã©ã€ãã®ã¹ããŒã¹ã解æŸããŸãã
$ sudo rm -fr /ch/csv
å圢åŒã«å€æ
Log ClickHouse ãšã³ãžã³ã¯ããŒã¿ãè¡æå圢åŒã§ä¿åããŸãã ããŒã¿ãããé«éã«ã¯ãšãªããããã«ãMergeTree ãšã³ãžã³ã䜿çšããŠããŒã¿ãåæå圢åŒã«å€æããŸãã
$ clickhouse-client --host=0.0.0.0
以äžã¯ 34 å 50 ç§ã§å®äºããŸããã ãã®æäœåŸã®ããŒã¿ ãã£ã¬ã¯ããªã®ãµã€ãºã¯ 237 GB ã«ãªããŸããã
CREATE TABLE trips_mergetree
ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
AS SELECT
trip_id,
CAST(vendor_id AS Enum8('1' = 1,
'2' = 2,
'CMT' = 3,
'VTS' = 4,
'DDS' = 5,
'B02512' = 10,
'B02598' = 11,
'B02617' = 12,
'B02682' = 13,
'B02764' = 14)) AS vendor_id,
toDate(pickup_datetime) AS pickup_date,
ifNull(pickup_datetime, toDateTime(0)) AS pickup_datetime,
toDate(dropoff_datetime) AS dropoff_date,
ifNull(dropoff_datetime, toDateTime(0)) AS dropoff_datetime,
assumeNotNull(store_and_fwd_flag) AS store_and_fwd_flag,
assumeNotNull(rate_code_id) AS rate_code_id,
assumeNotNull(pickup_longitude) AS pickup_longitude,
assumeNotNull(pickup_latitude) AS pickup_latitude,
assumeNotNull(dropoff_longitude) AS dropoff_longitude,
assumeNotNull(dropoff_latitude) AS dropoff_latitude,
assumeNotNull(passenger_count) AS passenger_count,
assumeNotNull(trip_distance) AS trip_distance,
assumeNotNull(fare_amount) AS fare_amount,
assumeNotNull(extra) AS extra,
assumeNotNull(mta_tax) AS mta_tax,
assumeNotNull(tip_amount) AS tip_amount,
assumeNotNull(tolls_amount) AS tolls_amount,
assumeNotNull(ehail_fee) AS ehail_fee,
assumeNotNull(improvement_surcharge) AS improvement_surcharge,
assumeNotNull(total_amount) AS total_amount,
assumeNotNull(payment_type) AS payment_type_,
assumeNotNull(trip_type) AS trip_type,
pickup AS pickup,
pickup AS dropoff,
CAST(assumeNotNull(cab_type)
AS Enum8('yellow' = 1, 'green' = 2))
AS cab_type,
precipitation AS precipitation,
snow_depth AS snow_depth,
snowfall AS snowfall,
max_temperature AS max_temperature,
min_temperature AS min_temperature,
average_wind_speed AS average_wind_speed,
pickup_nyct2010_gid AS pickup_nyct2010_gid,
pickup_ctlabel AS pickup_ctlabel,
pickup_borocode AS pickup_borocode,
pickup_boroname AS pickup_boroname,
pickup_ct2010 AS pickup_ct2010,
pickup_boroct2010 AS pickup_boroct2010,
pickup_cdeligibil AS pickup_cdeligibil,
pickup_ntacode AS pickup_ntacode,
pickup_ntaname AS pickup_ntaname,
pickup_puma AS pickup_puma,
dropoff_nyct2010_gid AS dropoff_nyct2010_gid,
dropoff_ctlabel AS dropoff_ctlabel,
dropoff_borocode AS dropoff_borocode,
dropoff_boroname AS dropoff_boroname,
dropoff_ct2010 AS dropoff_ct2010,
dropoff_boroct2010 AS dropoff_boroct2010,
dropoff_cdeligibil AS dropoff_cdeligibil,
dropoff_ntacode AS dropoff_ntacode,
dropoff_ntaname AS dropoff_ntaname,
dropoff_puma AS dropoff_puma
FROM trips;
æäœäžã®ã°ã©ã³ã¹åºåã¯æ¬¡ã®ããã«ãªããŸãã
ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws) Uptime: 1:06:09
CPU 10.3% nice: 0.0% LOAD 36-core MEM 16.1% active: 13.3G SWAP 0.0%
user: 7.9% irq: 0.0% 1 min: 1.87 total: 68.7G inactive: 52.8G total: 0
system: 1.6% iowait: 0.8% 5 min: 1.76 used: 11.1G buffers: 71.8M used: 0
idle: 89.7% steal: 0.0% 15 min: 1.95 free: 57.6G cached: 57.2G free: 0
NETWORK Rx/s Tx/s TASKS 367 (523 thr), 1 run, 366 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5 1Kb 8Kb
lo 2Kb 2Kb CPU% MEM% VIRT RES PID USER NI S TIME+ IOR/s IOW/s Command
241.9 12.8 20.7G 8.78G 8091 clickhous 0 S 30:36.73 34M 125M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
DISK I/O R/s W/s 2.6 0.0 90.4M 28.3M 9948 root 0 R 1:18.53 0 0 /usr/bin/python3 /usr/bin/glances
loop0 0 0 1.3 0.0 0 0 203 root 0 S 0:09.82 0 0 kswapd0
loop1 0 0 0.3 0.1 315M 61.3M 15701 ubuntu 0 S 0:00.40 0 0 clickhouse-client --host=0.0.0.0
nvme0n1 0 3K 0.3 0.0 0 0 7 root 0 S 0:00.83 0 0 rcu_sched
nvme0n1p1 0 3K 0.0 0.0 0 0 142 root 0 S 0:00.22 0 0 migration/27
nvme1n1 25.8M 330M 0.0 0.0 59.7M 1.79M 2764 ubuntu 0 S 0:00.00 0 0 (sd-pam)
æåŸã®ãã¹ãã§ã¯ãããã€ãã®åãå€æãããåèšç®ãããŸããã ãããã®é¢æ°ã®äžéšãããã®ããŒã¿ã»ããã§ã¯æåŸ ã©ããã«åäœããªããªã£ãŠããããšãããããŸããã ãã®åé¡ã解決ããããã«ãäžé©åãªé¢æ°ãåé€ãããã詳现ãªåã«å€æããã«ããŒã¿ãããŒãããŸããã
ã¯ã©ã¹ã¿ãŒå šäœã§ã®ããŒã¿ã®åæ£
XNUMX ã€ã®ã¯ã©ã¹ã¿ãŒ ããŒããã¹ãŠã«ããŒã¿ãåæ£ããŸãã ãŸãã以äžã§ XNUMX å°ã®ãã·ã³ãã¹ãŠã«ããŒãã«ãäœæããŸãã
$ clickhouse-client --host=0.0.0.0
CREATE TABLE trips_mergetree_third (
trip_id UInt32,
vendor_id String,
pickup_date Date,
pickup_datetime DateTime,
dropoff_date Date,
dropoff_datetime Nullable(DateTime),
store_and_fwd_flag Nullable(FixedString(1)),
rate_code_id Nullable(UInt8),
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count Nullable(UInt8),
trip_distance Nullable(Float64),
fare_amount Nullable(Float32),
extra Nullable(Float32),
mta_tax Nullable(Float32),
tip_amount Nullable(Float32),
tolls_amount Nullable(Float32),
ehail_fee Nullable(Float32),
improvement_surcharge Nullable(Float32),
total_amount Nullable(Float32),
payment_type Nullable(String),
trip_type Nullable(UInt8),
pickup Nullable(String),
dropoff Nullable(String),
cab_type Nullable(String),
precipitation Nullable(Int8),
snow_depth Nullable(Int8),
snowfall Nullable(Int8),
max_temperature Nullable(Int8),
min_temperature Nullable(Int8),
average_wind_speed Nullable(Int8),
pickup_nyct2010_gid Nullable(Int8),
pickup_ctlabel Nullable(String),
pickup_borocode Nullable(Int8),
pickup_boroname Nullable(String),
pickup_ct2010 Nullable(String),
pickup_boroct2010 Nullable(String),
pickup_cdeligibil Nullable(FixedString(1)),
pickup_ntacode Nullable(String),
pickup_ntaname Nullable(String),
pickup_puma Nullable(String),
dropoff_nyct2010_gid Nullable(UInt8),
dropoff_ctlabel Nullable(String),
dropoff_borocode Nullable(UInt8),
dropoff_boroname Nullable(String),
dropoff_ct2010 Nullable(String),
dropoff_boroct2010 Nullable(String),
dropoff_cdeligibil Nullable(String),
dropoff_ntacode Nullable(String),
dropoff_ntaname Nullable(String),
dropoff_puma Nullable(String)
) ENGINE = MergeTree(pickup_date, pickup_datetime, 8192);
次ã«ãæåã®ãµãŒããŒãã¯ã©ã¹ã¿ãŒå ã® XNUMX ã€ã®ããŒããã¹ãŠãèªèã§ããããšã確èªããŸãã
SELECT *
FROM system.clusters
WHERE cluster = 'perftest_3shards'
FORMAT Vertical;
Row 1:
ââââââ
cluster: perftest_3shards
shard_num: 1
shard_weight: 1
replica_num: 1
host_name: 172.30.2.192
host_address: 172.30.2.192
port: 9000
is_local: 1
user: default
default_database:
Row 2:
ââââââ
cluster: perftest_3shards
shard_num: 2
shard_weight: 1
replica_num: 1
host_name: 172.30.2.162
host_address: 172.30.2.162
port: 9000
is_local: 0
user: default
default_database:
Row 3:
ââââââ
cluster: perftest_3shards
shard_num: 3
shard_weight: 1
replica_num: 1
host_name: 172.30.2.36
host_address: 172.30.2.36
port: 9000
is_local: 0
user: default
default_database:
次ã«ãã¹ããŒãã«åºã¥ããŠæåã®ãµãŒããŒäžã«æ°ããããŒãã«ãå®çŸ©ããŸãã trips_mergetree_third
åæ£ãšã³ãžã³ã䜿çšããŸãã
CREATE TABLE trips_mergetree_x3
AS trips_mergetree_third
ENGINE = Distributed(perftest_3shards,
default,
trips_mergetree_third,
rand());
次ã«ãMergeTree ããŒã¹ã®ããŒãã«ãã 34 ã€ã®ãµãŒããŒãã¹ãŠã«ããŒã¿ãã³ããŒããŸãã 以äžã¯ 44 å XNUMX ç§ã§å®äºããŸããã
INSERT INTO trips_mergetree_x3
SELECT * FROM trips_mergetree;
äžèšã®æäœã®åŸãClickHouse ãæ倧ã¹ãã¬ãŒãž ã¬ãã«ã®ããŒã¯ããé¢ãããŸã§ 15 åãäžããŸããã ããŒã¿ ãã£ã¬ã¯ããªã¯ãæçµçã« 264 å°ã®åãµãŒããŒã§ãããã 34 GBã33 GBãXNUMX GB ã«ãªããŸããã
ClickHouse ã¯ã©ã¹ã¿ãŒã®ããã©ãŒãã³ã¹è©äŸ¡
次ã«ç¢ºèªããã®ã¯ãããŒãã«ã«å¯ŸããŠåã¯ãšãªãè€æ°åå®è¡ããå Žåã®æéæéã§ããã trips_mergetree_x3
.
$ clickhouse-client --host=0.0.0.0
以äžã¯ 2.449 ç§ã§å®äºããŸããã
SELECT cab_type, count(*)
FROM trips_mergetree_x3
GROUP BY cab_type;
以äžã¯ 0.691 ç§ã§å®äºããŸããã
SELECT passenger_count,
avg(total_amount)
FROM trips_mergetree_x3
GROUP BY passenger_count;
以äžã¯ 0 ç§ã§å®äºããŸããã
SELECT passenger_count,
toYear(pickup_date) AS year,
count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
year;
以äžã¯ 0.983 ç§ã§å®äºããŸããã
SELECT passenger_count,
toYear(pickup_date) AS year,
round(trip_distance) AS distance,
count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
year,
distance
ORDER BY year,
count(*) DESC;
æ¯èŒã®ããã«ãæåã®ãµãŒããŒã®ã¿ã«ååšãã MergeTree ããŒã¹ã®ããŒãã«ã«å¯ŸããŠåãã¯ãšãªãå®è¡ããŸããã
XNUMX ã€ã® ClickHouse ããŒãã®ããã©ãŒãã³ã¹è©äŸ¡
次ã«ç¢ºèªããã®ã¯ãããŒãã«ã«å¯ŸããŠåã¯ãšãªãè€æ°åå®è¡ããå Žåã®æéæéã§ããã trips_mergetree_x3
.
以äžã¯ 0.241 ç§ã§å®äºããŸããã
SELECT cab_type, count(*)
FROM trips_mergetree
GROUP BY cab_type;
以äžã¯ 0.826 ç§ã§å®äºããŸããã
SELECT passenger_count,
avg(total_amount)
FROM trips_mergetree
GROUP BY passenger_count;
以äžã¯ 1.209 ç§ã§å®äºããŸããã
SELECT passenger_count,
toYear(pickup_date) AS year,
count(*)
FROM trips_mergetree
GROUP BY passenger_count,
year;
以äžã¯ 1.781 ç§ã§å®äºããŸããã
SELECT passenger_count,
toYear(pickup_date) AS year,
round(trip_distance) AS distance,
count(*)
FROM trips_mergetree
GROUP BY passenger_count,
year,
distance
ORDER BY year,
count(*) DESC;
РазЌÑÑÐ»ÐµÐœÐžÑ ÐŸ ÑезÑлÑÑаÑаÑ
ç§ã®ãã¹ãã«ãããŠãç¡æã® CPU ããŒã¹ã®ããŒã¿ããŒã¹ã GPU ããŒã¹ã®ããŒã¿ããŒã¹ãäžåãããã©ãŒãã³ã¹ã瀺ããã®ã¯ãããåããŠã§ãã ãã以æ¥ãGPU ããŒã¹ã®ããŒã¿ããŒã¹ã¯ XNUMX åã®æ¹èšãçµãŸããããããã§ã ClickHouse ãåäžããŒãã§å®çŸããããã©ãŒãã³ã¹ã¯éåžžã«å°è±¡çã§ãã
åæã«ãåæ£ãšã³ãžã³ã§ã¯ãšãª 1 ãå®è¡ãããšããªãŒããŒããã ã³ã¹ãã 2 æ¡é«ããªããŸãã ã¯ã©ã¹ã¿ãŒã«ããŒããè¿œå ããã«ã€ããŠã¯ãšãªæéãççž®ãããã®ã¯çŽ æŽãããããšãªã®ã§ããã®æçš¿ã®èª¿æ»ã§äœããæ¬ ããŠããããšãé¡ã£ãŠããŸãã ãã ããä»ã®ã¯ãšãªãå®è¡ãããšããã©ãŒãã³ã¹ãçŽXNUMXåã«ãªã£ãã®ã¯çŽ æŽãããããšã§ãã
ClickHouse ãã¹ãã¬ãŒãžãšã³ã³ãã¥ãŒãã£ã³ã°ãåé¢ããŠãç¬ç«ããŠæ¡åŒµã§ããããã«é²åããã®ãèŠãã®ã¯çŽ æŽãããããšã§ãã æšå¹Žè¿œå ããã HDFS ãµããŒãã¯ãããã«åããäžæ©ãšãªãå¯èœæ§ããããŸãã ã³ã³ãã¥ãŒãã£ã³ã°ã«é¢ããŠèšãã°ãã¯ã©ã¹ã¿ãŒã«ããŒããè¿œå ããããšã§åäžã®ã¯ãšãªãé«éåã§ããã°ããã®ãœãããŠã§ã¢ã®å°æ¥ã¯éåžžã«æãããã®ã«ãªããŸãã
ãã®æçš¿ããèªã¿ããã ãããããšãããããŸãã ç§ã¯åç±³ãšãšãŒãããã®ã¯ã©ã€ã¢ã³ãã«ã³ã³ãµã«ãã£ã³ã°ãã¢ãŒããã¯ãã£ããã©ã¯ãã£ã¹éçºãµãŒãã¹ãæäŸããŠããŸãã ç§ã®ææ¡ãããªãã®ããžãã¹ã«ã©ã®ããã«åœ¹ç«ã€ãã«ã€ããŠè©±ãåãããå Žåã¯ã次ã®æ¹æ³ã§ç§ã«é£çµ¡ããŠãã ããã
åºæïŒ habr.com