1.1์–ต ํƒ์‹œ ํƒ€๊ธฐ: 108์ฝ”์–ด ClickHouse ํด๋Ÿฌ์Šคํ„ฐ

์ด ๊ธฐ์‚ฌ์˜ ๋ฒˆ์—ญ์€ ์ฝ”์Šค์˜ ํ•™์ƒ๋“ค์„ ์œ„ํ•ด ํŠน๋ณ„ํžˆ ์ค€๋น„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์—”์ง€๋‹ˆ์–ด.

1.1์–ต ํƒ์‹œ ํƒ€๊ธฐ: 108์ฝ”์–ด ClickHouse ํด๋Ÿฌ์Šคํ„ฐ

ํด๋ฆญ ํ•˜์šฐ์Šค ์˜คํ”ˆ ์†Œ์Šค ์ปฌ๋Ÿผํ˜• ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์ž…๋‹ˆ๋‹ค. ํ•˜๋ฃจ์— ์ˆ˜๋ฐฑ์–ต ๊ฐœ์˜ ์ƒˆ๋กœ์šด ๋ ˆ์ฝ”๋“œ๊ฐ€ ์ž…๋ ฅ๋˜๋Š” ๊ฒฝ์šฐ์—๋„ ์ˆ˜๋ฐฑ ๋ช…์˜ ๋ถ„์„๊ฐ€๊ฐ€ ์„ธ๋ถ€ ๋ฐ์ดํ„ฐ๋ฅผ ๋น ๋ฅด๊ฒŒ ์ฟผ๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ํ›Œ๋ฅญํ•œ ํ™˜๊ฒฝ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์‹œ์Šคํ…œ์„ ์ง€์›ํ•˜๊ธฐ ์œ„ํ•œ ์ธํ”„๋ผ ๋น„์šฉ์€ ์—ฐ๊ฐ„ ์ตœ๋Œ€ 100๋‹ฌ๋Ÿฌ์— ๋‹ฌํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ์‚ฌ์šฉ๋Ÿ‰์— ๋”ฐ๋ผ ๊ทธ ์ ˆ๋ฐ˜์ด ๋  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•œ๋•Œ Yandex Metrics์˜ ClickHouse ์„ค์น˜์—๋Š” 10์กฐ ๊ฐœ์˜ ๊ธฐ๋ก์ด ํฌํ•จ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Yandex ์™ธ์—๋„ ClickHouse๋Š” Bloomberg ๋ฐ Cloudflare์—์„œ๋„ ์„ฑ๊ณต์„ ๊ฑฐ๋‘์—ˆ์Šต๋‹ˆ๋‹ค.

XNUMX๋…„ ์ „์— ๋‚˜๋Š” ์ผ๋‹ค. ๋น„๊ต ๋ถ„์„ ํ•˜๋‚˜์˜ ๋จธ์‹ ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ฅผ ๊ตฌ์ถ•ํ–ˆ์œผ๋ฉฐ, ๊ฐ€์žฅ ๋น ๋ฅธ ๋‚ด๊ฐ€ ๋ณธ ์ ์ด ์žˆ๋Š” ๋ฌด๋ฃŒ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์†Œํ”„ํŠธ์›จ์–ด. ๊ทธ ์ดํ›„๋กœ ๊ฐœ๋ฐœ์ž๋“ค์€ Kafka, HDFS ๋ฐ ZStandard ์••์ถ• ์ง€์›์„ ํฌํ•จํ•œ ๊ธฐ๋Šฅ ์ถ”๊ฐ€๋ฅผ ์ค‘๋‹จํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ์ž‘๋…„์— ๊ณ„๋‹จ์‹ ์••์ถ• ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์ง€์›์„ ์ถ”๊ฐ€ํ–ˆ์œผ๋ฉฐ ๋ธํƒ€์—์„œ ๋ธํƒ€ ์ฝ”๋”ฉ์ด ๊ฐ€๋Šฅํ•ด์กŒ์Šต๋‹ˆ๋‹ค. ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๋ฅผ ์••์ถ•ํ•  ๋•Œ ๊ฒŒ์ด์ง€ ๊ฐ’์€ ๋ธํƒ€ ์ธ์ฝ”๋”ฉ์„ ์‚ฌ์šฉํ•˜๋ฉด ์ž˜ ์••์ถ•ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ์นด์šดํ„ฐ์˜ ๊ฒฝ์šฐ์—๋Š” ๋ธํƒ€๋ณ„ ์ธ์ฝ”๋”ฉ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๋” ์ข‹์Šต๋‹ˆ๋‹ค. ์ข‹์€ ์••์ถ•์ด ClickHouse ์„ฑ๋Šฅ์˜ ํ•ต์‹ฌ์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ClickHouse๋Š” ํƒ€์‚ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ œ์™ธํ•˜๊ณ  170๋งŒ ์ค„์˜ C++ ์ฝ”๋“œ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ ๊ฐ€์žฅ ์ž‘์€ ๋ถ„์‚ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์ฝ”๋“œ๋ฒ ์ด์Šค ์ค‘ ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค. ์ด์— ๋น„ํ•ด SQLite๋Š” ๋ฐฐํฌ๋ฅผ ์ง€์›ํ•˜์ง€ ์•Š์œผ๋ฉฐ 235์ค„์˜ C ์ฝ”๋“œ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, ์ด ๊ธ€์„ ์“ฐ๋Š” ์‹œ์ ์—์„œ 207๋ช…์˜ ์—”์ง€๋‹ˆ์–ด๊ฐ€ ClickHouse์— ๊ธฐ์—ฌํ–ˆ์œผ๋ฉฐ ์ตœ๊ทผ ์ปค๋ฐ‹์˜ ๊ฐ•๋„๊ฐ€ ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

ํด๋ฆญํ•˜์šฐ์Šค๋Š” 2017๋…„ XNUMX์›”๋ถ€ํ„ฐ ๋ณ€๊ฒฝ ๋กœ๊ทธ ๊ฐœ๋ฐœ ์ƒํ™ฉ์„ ์ถ”์ ํ•˜๋Š” ์‰ฌ์šด ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ ๋ชจ๋†€๋ฆฌ์‹ ๋ฌธ์„œ ํŒŒ์ผ์„ ๋งˆํฌ๋‹ค์šด ๊ธฐ๋ฐ˜ ํŒŒ์ผ ๊ณ„์ธต ๊ตฌ์กฐ๋กœ ๋ถ„ํ• ํ–ˆ์Šต๋‹ˆ๋‹ค. ๋ฌธ์ œ์™€ ๊ธฐ๋Šฅ์€ GitHub๋ฅผ ํ†ตํ•ด ์ถ”์ ๋˜๋ฉฐ ์ผ๋ฐ˜์ ์œผ๋กœ ์ง€๋‚œ ๋ช‡ ๋…„ ๋™์•ˆ ์†Œํ”„ํŠธ์›จ์–ด์— ๋Œ€ํ•œ ์ ‘๊ทผ์„ฑ์ด ํ›จ์”ฌ ๋” ๋†’์•„์กŒ์Šต๋‹ˆ๋‹ค.

์ด ๊ธฐ์‚ฌ์—์„œ๋Š” 2์ฝ”์–ด ํ”„๋กœ์„ธ์„œ์™€ NVMe ์Šคํ† ๋ฆฌ์ง€๋ฅผ ์‚ฌ์šฉํ•˜๋Š” AWS EC36์—์„œ ClickHouse ํด๋Ÿฌ์Šคํ„ฐ์˜ ์„ฑ๋Šฅ์„ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

์—…๋ฐ์ดํŠธ: ์ด ๊ฒŒ์‹œ๋ฌผ์„ ์ฒ˜์Œ ๊ฒŒ์‹œํ•œ ์ง€ ์ผ์ฃผ์ผ ํ›„, ๊ฐœ์„ ๋œ ๊ตฌ์„ฑ์œผ๋กœ ํ…Œ์ŠคํŠธ๋ฅผ ๋‹ค์‹œ ์‹คํ–‰ํ•˜์—ฌ ํ›จ์”ฌ ๋” ๋‚˜์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ฒŒ์‹œ๋ฌผ์€ ์ด๋Ÿฌํ•œ ๋ณ€๊ฒฝ ์‚ฌํ•ญ์„ ๋ฐ˜์˜ํ•˜์—ฌ ์—…๋ฐ์ดํŠธ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

AWS EC2 ํด๋Ÿฌ์Šคํ„ฐ ์‹œ์ž‘

์ด ๊ฒŒ์‹œ๋ฌผ์—์„œ๋Š” 5๊ฐœ์˜ c9d.2xlarge EC36 ์ธ์Šคํ„ด์Šค๋ฅผ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๊ฐ๊ฐ์—๋Š” 72๊ฐœ์˜ ๊ฐ€์ƒ CPU, 900GB RAM, 10GB NVMe SSD ์Šคํ† ๋ฆฌ์ง€๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ์œผ๋ฉฐ 1,962๊ธฐ๊ฐ€๋น„ํŠธ ๋„คํŠธ์›Œํฌ๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. ์˜จ๋””๋งจ๋“œ ์‹คํ–‰ ์‹œ eu-west-1 ๋ฆฌ์ „์—์„œ ์‹œ๊ฐ„๋‹น $16.04์˜ ๋น„์šฉ์ด ๋“ญ๋‹ˆ๋‹ค. ์šด์˜ ์ฒด์ œ๋กœ Ubuntu Server XNUMX LTS๋ฅผ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

๊ฐ ์‹œ์Šคํ…œ์ด ์ œํ•œ ์—†์ด ์„œ๋กœ ํ†ต์‹ ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ฐฉํ™”๋ฒฝ์ด ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ ํด๋Ÿฌ์Šคํ„ฐ์˜ SSH์—์„œ๋Š” ๋‚ด IPv4 ์ฃผ์†Œ๋งŒ ํ™”์ดํŠธ๋ฆฌ์ŠคํŠธ์— ์ถ”๊ฐ€๋ฉ๋‹ˆ๋‹ค.

์ž‘๋™ ์ค€๋น„ ์ƒํƒœ์˜ NVMe ๋“œ๋ผ์ด๋ธŒ

ClickHouse๊ฐ€ ์ž‘๋™ํ•˜๋ ค๋ฉด ๊ฐ ์„œ๋ฒ„์˜ NVMe ๋“œ๋ผ์ด๋ธŒ์— EXT4 ํ˜•์‹์˜ ํŒŒ์ผ ์‹œ์Šคํ…œ์„ ์ƒ์„ฑํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo mkfs -t ext4 /dev/nvme1n1
$ sudo mkdir /ch
$ sudo mount /dev/nvme1n1 /ch

๋ชจ๋“  ๊ฒƒ์ด ๊ตฌ์„ฑ๋˜๋ฉด ๊ฐ ์‹œ์Šคํ…œ์—์„œ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๋งˆ์šดํŠธ ์ง€์ ๊ณผ 783GB์˜ ๊ณต๊ฐ„์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

$ lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0  87.9M  1 loop /snap/core/5742
loop1         7:1    0  16.5M  1 loop /snap/amazon-ssm-agent/784
nvme0n1     259:1    0     8G  0 disk
โ””โ”€nvme0n1p1 259:2    0     8G  0 part /
nvme1n1     259:0    0 838.2G  0 disk /ch

$ df -h

Filesystem      Size  Used Avail Use% Mounted on
udev             35G     0   35G   0% /dev
tmpfs           6.9G  8.8M  6.9G   1% /run
/dev/nvme0n1p1  7.7G  967M  6.8G  13% /
tmpfs            35G     0   35G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            35G     0   35G   0% /sys/fs/cgroup
/dev/loop0       88M   88M     0 100% /snap/core/5742
/dev/loop1       17M   17M     0 100% /snap/amazon-ssm-agent/784
tmpfs           6.9G     0  6.9G   0% /run/user/1000
/dev/nvme1n1    825G   73M  783G   1% /ch

์ด ํ…Œ์ŠคํŠธ์—์„œ ์‚ฌ์šฉํ•  ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” 1.1๋…„ ๋™์•ˆ ๋‰ด์š•์‹œ์—์„œ XNUMX์–ต ๋ฒˆ์˜ ํƒ์‹œ ํƒ‘์Šน์„ ํ†ตํ•ด ์ƒ์„ฑํ•œ ๋ฐ์ดํ„ฐ ๋คํ”„์ž…๋‹ˆ๋‹ค. ๋ธ”๋กœ๊ทธ์—์„œ Redshift์—์„œ XNUMX์–ต ๊ฑด์˜ ํƒ์‹œ ์—ฌํ–‰ ์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์–ด๋–ป๊ฒŒ ์ˆ˜์ง‘ํ–ˆ๋Š”์ง€ ์ž์„ธํžˆ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” AWS S3์— ์ €์žฅ๋˜๋ฏ€๋กœ ์•ก์„ธ์Šค ํ‚ค์™€ ๋น„๋ฐ€ ํ‚ค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ AWS CLI๋ฅผ ๊ตฌ์„ฑํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo apt update
$ sudo apt install awscli
$ aws configure

ํŒŒ์ผ์ด ๊ธฐ๋ณธ ์„ค์ •๋ณด๋‹ค ๋น ๋ฅด๊ฒŒ ๋‹ค์šด๋กœ๋“œ๋˜๋„๋ก ํด๋ผ์ด์–ธํŠธ์˜ ๋™์‹œ ์š”์ฒญ ์ œํ•œ์„ 100์œผ๋กœ ์„ค์ •ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ aws configure set 
    default.s3.max_concurrent_requests 
    100

AWS S3์—์„œ ํƒ์‹œ ํƒ‘์Šน ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๋‹ค์šด๋กœ๋“œํ•˜์—ฌ ์ฒซ ๋ฒˆ์งธ ์„œ๋ฒ„์˜ NVMe ๋“œ๋ผ์ด๋ธŒ์— ์ €์žฅํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋Š” GZIP ์••์ถ• CSV ํ˜•์‹์œผ๋กœ ์ตœ๋Œ€ 104GB์ž…๋‹ˆ๋‹ค.

$ sudo mkdir -p /ch/csv
$ sudo chown -R ubuntu /ch/csv
$ aws s3 sync s3://<bucket>/csv /ch/csv

ํด๋ฆญํ•˜์šฐ์Šค ์„ค์น˜

์„ธ ์‹œ์Šคํ…œ ๋ชจ๋‘์—์„œ ClickHouse๋ฅผ ๋ถ„์‚ฐ ์„ค์น˜ํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ Apache ZooKeeper๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋ฐ ํ•„์š”ํ•˜๋ฏ€๋กœ Java 8์šฉ OpenJDK ๋ฐฐํฌํŒ์„ ์„ค์น˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo apt update
$ sudo apt install 
    openjdk-8-jre 
    openjdk-8-jdk-headless

๊ทธ๋Ÿฐ ๋‹ค์Œ ํ™˜๊ฒฝ ๋ณ€์ˆ˜๋ฅผ ์„ค์ •ํ–ˆ์Šต๋‹ˆ๋‹ค. JAVA_HOME.

$ sudo vi /etc/profile
 
export JAVA_HOME=/usr
 
$ source /etc/profile

๊ทธ๋Ÿฐ ๋‹ค์Œ Ubuntu์˜ ํŒจํ‚ค์ง€ ๊ด€๋ฆฌ ์‹œ์Šคํ…œ์„ ์‚ฌ์šฉํ•˜์—ฌ ์„ธ ์‹œ์Šคํ…œ ๋ชจ๋‘์— ClickHouse 18.16.1, glans ๋ฐ ZooKeeper๋ฅผ ์„ค์น˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo apt-key adv 
    --keyserver hkp://keyserver.ubuntu.com:80 
    --recv E0C56BD4
$ echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | 
    sudo tee /etc/apt/sources.list.d/clickhouse.list
$ sudo apt-get update

$ sudo apt install 
    clickhouse-client 
    clickhouse-server 
    glances 
    zookeeperd

ClickHouse์šฉ ๋””๋ ‰ํ„ฐ๋ฆฌ๋ฅผ ๋งŒ๋“ค๊ณ  ์„ธ ์„œ๋ฒ„ ๋ชจ๋‘์—์„œ ์ผ๋ถ€ ๊ตฌ์„ฑ ์žฌ์ •์˜๋„ ์ˆ˜ํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo mkdir /ch/clickhouse
$ sudo chown -R clickhouse /ch/clickhouse

$ sudo mkdir -p /etc/clickhouse-server/conf.d
$ sudo vi /etc/clickhouse-server/conf.d/taxis.conf

์ด๊ฒƒ์ด ์ œ๊ฐ€ ์‚ฌ์šฉํ•  ๊ตฌ์„ฑ ์žฌ์ •์˜์ž…๋‹ˆ๋‹ค.

<?xml version="1.0"?>
<yandex>
    <listen_host>0.0.0.0</listen_host>
    <path>/ch/clickhouse/</path>

 <remote_servers>
        <perftest_3shards>
            <shard>
                <replica>
                    <host>172.30.2.192</host>
                    <port>9000</port>
                 </replica>
            </shard>
            <shard>
                 <replica>
                    <host>172.30.2.162</host>
                    <port>9000</port>
                 </replica>
            </shard>
            <shard>
                 <replica>
                    <host>172.30.2.36</host>
                    <port>9000</port>
                 </replica>
            </shard>
        </perftest_3shards>
    </remote_servers>

  <zookeeper-servers>
        <node>
            <host>172.30.2.192</host>
            <port>2181</port>
        </node>
        <node>
            <host>172.30.2.162</host>
            <port>2181</port>
        </node>
        <node>
            <host>172.30.2.36</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

 <macros>
        <shard>03</shard>
        <replica>01</replica>
    </macros>
</yandex>

๊ทธ๋Ÿฐ ๋‹ค์Œ ์„ธ ์‹œ์Šคํ…œ ๋ชจ๋‘์—์„œ ZooKeeper์™€ ClickHouse ์„œ๋ฒ„๋ฅผ ์‹คํ–‰ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo /etc/init.d/zookeeper start
$ sudo service clickhouse-server start

ClickHouse์— ๋ฐ์ดํ„ฐ ์—…๋กœ๋“œ

์ฒซ ๋ฒˆ์งธ ์„œ๋ฒ„์—์„œ๋Š” ์—ฌํ–‰ ํ…Œ์ด๋ธ”(trips), ์ด๋Š” ๋กœ๊ทธ ์—”์ง„์„ ์‚ฌ์šฉํ•˜์—ฌ ํƒ์‹œ ์—ฌํ–‰ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.

$ clickhouse-client --host=0.0.0.0
 
CREATE TABLE trips (
    trip_id                 UInt32,
    vendor_id               String,

    pickup_datetime         DateTime,
    dropoff_datetime        Nullable(DateTime),

    store_and_fwd_flag      Nullable(FixedString(1)),
    rate_code_id            Nullable(UInt8),
    pickup_longitude        Nullable(Float64),
    pickup_latitude         Nullable(Float64),
    dropoff_longitude       Nullable(Float64),
    dropoff_latitude        Nullable(Float64),
    passenger_count         Nullable(UInt8),
    trip_distance           Nullable(Float64),
    fare_amount             Nullable(Float32),
    extra                   Nullable(Float32),
    mta_tax                 Nullable(Float32),
    tip_amount              Nullable(Float32),
    tolls_amount            Nullable(Float32),
    ehail_fee               Nullable(Float32),
    improvement_surcharge   Nullable(Float32),
    total_amount            Nullable(Float32),
    payment_type            Nullable(String),
    trip_type               Nullable(UInt8),
    pickup                  Nullable(String),
    dropoff                 Nullable(String),

    cab_type                Nullable(String),

    precipitation           Nullable(Int8),
    snow_depth              Nullable(Int8),
    snowfall                Nullable(Int8),
    max_temperature         Nullable(Int8),
    min_temperature         Nullable(Int8),
    average_wind_speed      Nullable(Int8),

    pickup_nyct2010_gid     Nullable(Int8),
    pickup_ctlabel          Nullable(String),
    pickup_borocode         Nullable(Int8),
    pickup_boroname         Nullable(String),
    pickup_ct2010           Nullable(String),
    pickup_boroct2010       Nullable(String),
    pickup_cdeligibil       Nullable(FixedString(1)),
    pickup_ntacode          Nullable(String),
    pickup_ntaname          Nullable(String),
    pickup_puma             Nullable(String),

    dropoff_nyct2010_gid    Nullable(UInt8),
    dropoff_ctlabel         Nullable(String),
    dropoff_borocode        Nullable(UInt8),
    dropoff_boroname        Nullable(String),
    dropoff_ct2010          Nullable(String),
    dropoff_boroct2010      Nullable(String),
    dropoff_cdeligibil      Nullable(String),
    dropoff_ntacode         Nullable(String),
    dropoff_ntaname         Nullable(String),
    dropoff_puma            Nullable(String)
) ENGINE = Log;

๊ทธ๋Ÿฐ ๋‹ค์Œ ๊ฐ CSV ํŒŒ์ผ์„ ์ถ”์ถœํ•˜์—ฌ ์—ฌํ–‰ ํ…Œ์ด๋ธ”(trips). ๋‹ค์Œ ์ž‘์—…์€ 55๋ถ„ 10์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ์ž‘์—… ํ›„ ๋ฐ์ดํ„ฐ ๋””๋ ‰ํ„ฐ๋ฆฌ์˜ ํฌ๊ธฐ๋Š” 134GB์˜€์Šต๋‹ˆ๋‹ค.

$ time (for FILENAME in /ch/csv/trips_x*.csv.gz; do
            echo $FILENAME
            gunzip -c $FILENAME | 
                clickhouse-client 
                    --host=0.0.0.0 
                    --query="INSERT INTO trips FORMAT CSV"
        done)

๊ฐ€์ ธ์˜ค๊ธฐ ์†๋„๋Š” ์ดˆ๋‹น 155MB์˜ ๋น„์••์ถ• CSV ์ฝ˜ํ…์ธ ์˜€์Šต๋‹ˆ๋‹ค. ๋‚˜๋Š” ์ด๊ฒƒ์ด GZIP ์••์ถ• ํ•ด์ œ์˜ ๋ณ‘๋ชฉ ํ˜„์ƒ ๋•Œ๋ฌธ์ด๋ผ๊ณ  ์ƒ๊ฐํ•ฉ๋‹ˆ๋‹ค. xargs๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋“  gzip ํŒŒ์ผ์˜ ์••์ถ•์„ ๋ณ‘๋ ฌ๋กœ ํ’€๊ณ  ์••์ถ•์ด ํ’€๋ฆฐ ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋“œํ•˜๋Š” ๊ฒƒ์ด ๋” ๋น ๋ฅผ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์€ CSV ๊ฐ€์ ธ์˜ค๊ธฐ ๊ณผ์ •์—์„œ ๋ณด๊ณ ๋œ ๋‚ด์šฉ์— ๋Œ€ํ•œ ์„ค๋ช…์ž…๋‹ˆ๋‹ค.

$ sudo glances

ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws)                                                                                                 Uptime: 0:11:42
CPU       8.2%  nice:     0.0%                           LOAD    36-core                           MEM      9.8%  active:    5.20G                           SWAP      0.0%
user:     6.0%  irq:      0.0%                           1 min:    2.24                            total:  68.7G  inactive:  61.0G                           total:       0
system:   0.9%  iowait:   1.3%                           5 min:    1.83                            used:   6.71G  buffers:   66.4M                           used:        0
idle:    91.8%  steal:    0.0%                           15 min:   1.01                            free:   62.0G  cached:    61.6G                           free:        0

NETWORK     Rx/s   Tx/s   TASKS 370 (507 thr), 2 run, 368 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5        136b    2Kb
lo         343Mb  343Mb     CPU%  MEM%  VIRT   RES   PID USER        NI S    TIME+ IOR/s IOW/s Command
                           100.4   1.5 1.65G 1.06G  9909 ubuntu       0 S  1:01.33     0     0 clickhouse-client --host=0.0.0.0 --query=INSERT INTO trips FORMAT CSV
DISK I/O     R/s    W/s     85.1   0.0 4.65M  708K  9908 ubuntu       0 R  0:50.60   32M     0 gzip -d -c /ch/csv/trips_xac.csv.gz
loop0          0      0     54.9   5.1 8.14G 3.49G  8091 clickhous    0 S  1:44.23     0   45M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
loop1          0      0      4.5   0.0     0     0   319 root         0 S  0:07.50    1K     0 kworker/u72:2
nvme0n1        0     3K      2.3   0.0 91.1M 28.9M  9912 root         0 R  0:01.56     0     0 /usr/bin/python3 /usr/bin/glances
nvme0n1p1      0     3K      0.3   0.0     0     0   960 root       -20 S  0:00.10     0     0 kworker/28:1H
nvme1n1    32.1M   495M      0.3   0.0     0     0  1058 root       -20 S  0:00.90     0     0 kworker/23:1H

๊ณ„์†ํ•˜๊ธฐ ์ „์— ์›๋ณธ CSV ํŒŒ์ผ์„ ์‚ญ์ œํ•˜์—ฌ NVMe ๋“œ๋ผ์ด๋ธŒ์˜ ๊ณต๊ฐ„์„ ํ™•๋ณดํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ sudo rm -fr /ch/csv

์—ด ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜

Log ClickHouse ์—”์ง„์€ ํ–‰ ๊ธฐ๋ฐ˜ ํ˜•์‹์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ ๋” ๋น ๋ฅด๊ฒŒ ์ฟผ๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด MergeTree ์—”์ง„์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ์—ด ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.

$ clickhouse-client --host=0.0.0.0

๋‹ค์Œ ์ž‘์—…์€ 34๋ถ„ 50์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด ์ž‘์—… ํ›„ ๋ฐ์ดํ„ฐ ๋””๋ ‰ํ„ฐ๋ฆฌ์˜ ํฌ๊ธฐ๋Š” 237GB์˜€์Šต๋‹ˆ๋‹ค.

CREATE TABLE trips_mergetree
    ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
    AS SELECT
        trip_id,
        CAST(vendor_id AS Enum8('1' = 1,
                                '2' = 2,
                                'CMT' = 3,
                                'VTS' = 4,
                                'DDS' = 5,
                                'B02512' = 10,
                                'B02598' = 11,
                                'B02617' = 12,
                                'B02682' = 13,
                                'B02764' = 14)) AS vendor_id,
        toDate(pickup_datetime)                 AS pickup_date,
        ifNull(pickup_datetime, toDateTime(0))  AS pickup_datetime,
        toDate(dropoff_datetime)                AS dropoff_date,
        ifNull(dropoff_datetime, toDateTime(0)) AS dropoff_datetime,
        assumeNotNull(store_and_fwd_flag)       AS store_and_fwd_flag,
        assumeNotNull(rate_code_id)             AS rate_code_id,

        assumeNotNull(pickup_longitude)         AS pickup_longitude,
        assumeNotNull(pickup_latitude)          AS pickup_latitude,
        assumeNotNull(dropoff_longitude)        AS dropoff_longitude,
        assumeNotNull(dropoff_latitude)         AS dropoff_latitude,
        assumeNotNull(passenger_count)          AS passenger_count,
        assumeNotNull(trip_distance)            AS trip_distance,
        assumeNotNull(fare_amount)              AS fare_amount,
        assumeNotNull(extra)                    AS extra,
        assumeNotNull(mta_tax)                  AS mta_tax,
        assumeNotNull(tip_amount)               AS tip_amount,
        assumeNotNull(tolls_amount)             AS tolls_amount,
        assumeNotNull(ehail_fee)                AS ehail_fee,
        assumeNotNull(improvement_surcharge)    AS improvement_surcharge,
        assumeNotNull(total_amount)             AS total_amount,
        assumeNotNull(payment_type)             AS payment_type_,
        assumeNotNull(trip_type)                AS trip_type,

        pickup AS pickup,
        pickup AS dropoff,

        CAST(assumeNotNull(cab_type)
            AS Enum8('yellow' = 1, 'green' = 2))
                                AS cab_type,

        precipitation           AS precipitation,
        snow_depth              AS snow_depth,
        snowfall                AS snowfall,
        max_temperature         AS max_temperature,
        min_temperature         AS min_temperature,
        average_wind_speed      AS average_wind_speed,

        pickup_nyct2010_gid     AS pickup_nyct2010_gid,
        pickup_ctlabel          AS pickup_ctlabel,
        pickup_borocode         AS pickup_borocode,
        pickup_boroname         AS pickup_boroname,
        pickup_ct2010           AS pickup_ct2010,
        pickup_boroct2010       AS pickup_boroct2010,
        pickup_cdeligibil       AS pickup_cdeligibil,
        pickup_ntacode          AS pickup_ntacode,
        pickup_ntaname          AS pickup_ntaname,
        pickup_puma             AS pickup_puma,

        dropoff_nyct2010_gid    AS dropoff_nyct2010_gid,
        dropoff_ctlabel         AS dropoff_ctlabel,
        dropoff_borocode        AS dropoff_borocode,
        dropoff_boroname        AS dropoff_boroname,
        dropoff_ct2010          AS dropoff_ct2010,
        dropoff_boroct2010      AS dropoff_boroct2010,
        dropoff_cdeligibil      AS dropoff_cdeligibil,
        dropoff_ntacode         AS dropoff_ntacode,
        dropoff_ntaname         AS dropoff_ntaname,
        dropoff_puma            AS dropoff_puma
    FROM trips;

์ž‘์—… ์ค‘ Glance ์ถœ๋ ฅ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws)                                                                                                 Uptime: 1:06:09
CPU      10.3%  nice:     0.0%                           LOAD    36-core                           MEM     16.1%  active:    13.3G                           SWAP      0.0%
user:     7.9%  irq:      0.0%                           1 min:    1.87                            total:  68.7G  inactive:  52.8G                           total:       0
system:   1.6%  iowait:   0.8%                           5 min:    1.76                            used:   11.1G  buffers:   71.8M                           used:        0
idle:    89.7%  steal:    0.0%                           15 min:   1.95                            free:   57.6G  cached:    57.2G                           free:        0

NETWORK     Rx/s   Tx/s   TASKS 367 (523 thr), 1 run, 366 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5         1Kb    8Kb
lo           2Kb    2Kb     CPU%  MEM%  VIRT   RES   PID USER        NI S    TIME+ IOR/s IOW/s Command
                           241.9  12.8 20.7G 8.78G  8091 clickhous    0 S 30:36.73   34M  125M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
DISK I/O     R/s    W/s      2.6   0.0 90.4M 28.3M  9948 root         0 R  1:18.53     0     0 /usr/bin/python3 /usr/bin/glances
loop0          0      0      1.3   0.0     0     0   203 root         0 S  0:09.82     0     0 kswapd0
loop1          0      0      0.3   0.1  315M 61.3M 15701 ubuntu       0 S  0:00.40     0     0 clickhouse-client --host=0.0.0.0
nvme0n1        0     3K      0.3   0.0     0     0     7 root         0 S  0:00.83     0     0 rcu_sched
nvme0n1p1      0     3K      0.0   0.0     0     0   142 root         0 S  0:00.22     0     0 migration/27
nvme1n1    25.8M   330M      0.0   0.0 59.7M 1.79M  2764 ubuntu       0 S  0:00.00     0     0 (sd-pam)

๋งˆ์ง€๋ง‰ ํ…Œ์ŠคํŠธ์—์„œ๋Š” ์—ฌ๋Ÿฌ ์—ด์ด ๋ณ€ํ™˜๋˜์–ด ๋‹ค์‹œ ๊ณ„์‚ฐ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ธฐ๋Šฅ ์ค‘ ์ผ๋ถ€๊ฐ€ ์ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ๋” ์ด์ƒ ์˜ˆ์ƒ๋Œ€๋กœ ์ž‘๋™ํ•˜์ง€ ์•Š๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ์Šต๋‹ˆ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ถ€์ ํ•ฉํ•œ ๊ธฐ๋Šฅ์„ ์ œ๊ฑฐํ•˜๊ณ  ๋” ์„ธ๋ถ„ํ™”๋œ ์œ ํ˜•์œผ๋กœ ๋ณ€ํ™˜ํ•˜์ง€ ์•Š๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋“œํ–ˆ์Šต๋‹ˆ๋‹ค.

ํด๋Ÿฌ์Šคํ„ฐ ์ „์ฒด์— ๋ฐ์ดํ„ฐ ๋ฐฐํฌ

์„ธ ๊ฐœ์˜ ํด๋Ÿฌ์Šคํ„ฐ ๋…ธ๋“œ ๋ชจ๋‘์— ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐฐํฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์‹œ์ž‘ํ•˜๋ ค๋ฉด ์•„๋ž˜์—์„œ ์„ธ ๋จธ์‹  ๋ชจ๋‘์— ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

$ clickhouse-client --host=0.0.0.0

CREATE TABLE trips_mergetree_third (
    trip_id                 UInt32,
    vendor_id               String,
    pickup_date             Date,
    pickup_datetime         DateTime,
    dropoff_date            Date,
    dropoff_datetime        Nullable(DateTime),
    store_and_fwd_flag      Nullable(FixedString(1)),
    rate_code_id            Nullable(UInt8),
    pickup_longitude        Nullable(Float64),
    pickup_latitude         Nullable(Float64),
    dropoff_longitude       Nullable(Float64),
    dropoff_latitude        Nullable(Float64),
    passenger_count         Nullable(UInt8),
    trip_distance           Nullable(Float64),
    fare_amount             Nullable(Float32),
    extra                   Nullable(Float32),
    mta_tax                 Nullable(Float32),
    tip_amount              Nullable(Float32),
    tolls_amount            Nullable(Float32),
    ehail_fee               Nullable(Float32),
    improvement_surcharge   Nullable(Float32),
    total_amount            Nullable(Float32),
    payment_type            Nullable(String),
    trip_type               Nullable(UInt8),
    pickup                  Nullable(String),
    dropoff                 Nullable(String),

    cab_type                Nullable(String),

    precipitation           Nullable(Int8),
    snow_depth              Nullable(Int8),
    snowfall                Nullable(Int8),
    max_temperature         Nullable(Int8),
    min_temperature         Nullable(Int8),
    average_wind_speed      Nullable(Int8),

    pickup_nyct2010_gid     Nullable(Int8),
    pickup_ctlabel          Nullable(String),
    pickup_borocode         Nullable(Int8),
    pickup_boroname         Nullable(String),
    pickup_ct2010           Nullable(String),
    pickup_boroct2010       Nullable(String),
    pickup_cdeligibil       Nullable(FixedString(1)),
    pickup_ntacode          Nullable(String),
    pickup_ntaname          Nullable(String),
    pickup_puma             Nullable(String),

    dropoff_nyct2010_gid    Nullable(UInt8),
    dropoff_ctlabel         Nullable(String),
    dropoff_borocode        Nullable(UInt8),
    dropoff_boroname        Nullable(String),
    dropoff_ct2010          Nullable(String),
    dropoff_boroct2010      Nullable(String),
    dropoff_cdeligibil      Nullable(String),
    dropoff_ntacode         Nullable(String),
    dropoff_ntaname         Nullable(String),
    dropoff_puma            Nullable(String)
) ENGINE = MergeTree(pickup_date, pickup_datetime, 8192);

๊ทธ๋Ÿฐ ๋‹ค์Œ ์ฒซ ๋ฒˆ์งธ ์„œ๋ฒ„๊ฐ€ ํด๋Ÿฌ์Šคํ„ฐ์˜ ์„ธ ๋…ธ๋“œ๋ฅผ ๋ชจ๋‘ ๋ณผ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

SELECT *
FROM system.clusters
WHERE cluster = 'perftest_3shards'
FORMAT Vertical;
Row 1:
โ”€โ”€โ”€โ”€โ”€โ”€
cluster:          perftest_3shards
shard_num:        1
shard_weight:     1
replica_num:      1
host_name:        172.30.2.192
host_address:     172.30.2.192
port:             9000
is_local:         1
user:             default
default_database:
Row 2:
โ”€โ”€โ”€โ”€โ”€โ”€
cluster:          perftest_3shards
shard_num:        2
shard_weight:     1
replica_num:      1
host_name:        172.30.2.162
host_address:     172.30.2.162
port:             9000
is_local:         0
user:             default
default_database:

Row 3:
โ”€โ”€โ”€โ”€โ”€โ”€
cluster:          perftest_3shards
shard_num:        3
shard_weight:     1
replica_num:      1
host_name:        172.30.2.36
host_address:     172.30.2.36
port:             9000
is_local:         0
user:             default
default_database:

๊ทธ๋Ÿฐ ๋‹ค์Œ ์Šคํ‚ค๋งˆ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋Š” ์ฒซ ๋ฒˆ์งธ ์„œ๋ฒ„์— ์ƒˆ ํ…Œ์ด๋ธ”์„ ์ •์˜ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. trips_mergetree_third ๊ทธ๋ฆฌ๊ณ  ๋ถ„์‚ฐ ์—”์ง„์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

CREATE TABLE trips_mergetree_x3
    AS trips_mergetree_third
    ENGINE = Distributed(perftest_3shards,
                         default,
                         trips_mergetree_third,
                         rand());

๊ทธ๋Ÿฐ ๋‹ค์Œ MergeTree ๊ธฐ๋ฐ˜ ํ…Œ์ด๋ธ”์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์„ธ ์„œ๋ฒ„ ๋ชจ๋‘์— ๋ณต์‚ฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ ์ž‘์—…์€ 34๋ถ„ 44์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

INSERT INTO trips_mergetree_x3
    SELECT * FROM trips_mergetree;

์œ„ ์ž‘์—… ํ›„ ClickHouse์— ์ตœ๋Œ€ ์ €์žฅ ๋ ˆ๋ฒจ ํ‘œ์‹œ์—์„œ ๋ฒ—์–ด๋‚˜๋„๋ก 15๋ถ„์˜ ์‹œ๊ฐ„์„ ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ๋””๋ ‰ํ„ฐ๋ฆฌ๋Š” ์„ธ ์„œ๋ฒ„ ๊ฐ๊ฐ์—์„œ ๊ฐ๊ฐ 264GB, 34GB, 33GB๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ClickHouse ํด๋Ÿฌ์Šคํ„ฐ ์„ฑ๋Šฅ ํ‰๊ฐ€

๋‹ค์Œ์— ๋ณธ ๊ฒƒ์€ ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ์ฟผ๋ฆฌ๋ฅผ ์—ฌ๋Ÿฌ ๋ฒˆ ์‹คํ–‰ํ•œ ๊ฒƒ ์ค‘ ๊ฐ€์žฅ ๋น ๋ฅธ ์‹œ๊ฐ„์ด์—ˆ์Šต๋‹ˆ๋‹ค. trips_mergetree_x3.

$ clickhouse-client --host=0.0.0.0

๋‹ค์Œ์€ 2.449์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT cab_type, count(*)
FROM trips_mergetree_x3
GROUP BY cab_type;

๋‹ค์Œ์€ 0.691์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       avg(total_amount)
FROM trips_mergetree_x3
GROUP BY passenger_count;

๋‹ค์Œ์€ 0์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       toYear(pickup_date) AS year,
       count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
         year;

๋‹ค์Œ์€ 0.983์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       toYear(pickup_date) AS year,
       round(trip_distance) AS distance,
       count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
         year,
         distance
ORDER BY year,
         count(*) DESC;

๋น„๊ต๋ฅผ ์œ„ํ•ด ์ฒซ ๋ฒˆ์งธ ์„œ๋ฒ„์—๋งŒ ์žˆ๋Š” MergeTree ๊ธฐ๋ฐ˜ ํ…Œ์ด๋ธ”์—์„œ ๋™์ผํ•œ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.

ClickHouse ๋…ธ๋“œ ํ•˜๋‚˜์˜ ์„ฑ๋Šฅ ํ‰๊ฐ€

๋‹ค์Œ์— ๋ณธ ๊ฒƒ์€ ํ…Œ์ด๋ธ”์—์„œ ๊ฐ ์ฟผ๋ฆฌ๋ฅผ ์—ฌ๋Ÿฌ ๋ฒˆ ์‹คํ–‰ํ•œ ๊ฒƒ ์ค‘ ๊ฐ€์žฅ ๋น ๋ฅธ ์‹œ๊ฐ„์ด์—ˆ์Šต๋‹ˆ๋‹ค. trips_mergetree_x3.

๋‹ค์Œ์€ 0.241์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT cab_type, count(*)
FROM trips_mergetree
GROUP BY cab_type;

๋‹ค์Œ์€ 0.826์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       avg(total_amount)
FROM trips_mergetree
GROUP BY passenger_count;

๋‹ค์Œ์€ 1.209์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       toYear(pickup_date) AS year,
       count(*)
FROM trips_mergetree
GROUP BY passenger_count,
         year;

๋‹ค์Œ์€ 1.781์ดˆ ๋งŒ์— ์™„๋ฃŒ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

SELECT passenger_count,
       toYear(pickup_date) AS year,
       round(trip_distance) AS distance,
       count(*)
FROM trips_mergetree
GROUP BY passenger_count,
         year,
         distance
ORDER BY year,
         count(*) DESC;

๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ๊ณ ์ฐฐ

ํ…Œ์ŠคํŠธ์—์„œ ๋ฌด๋ฃŒ CPU ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๊ฐ€ GPU ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋ณด๋‹ค ์„ฑ๋Šฅ์ด ๋›ฐ์–ด๋‚œ ๊ฒƒ์€ ์ด๋ฒˆ์ด ์ฒ˜์Œ์ž…๋‹ˆ๋‹ค. GPU ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค๋Š” ๊ทธ ์ดํ›„๋กœ ๋‘ ๋ฒˆ์˜ ๊ฐœ์ •์„ ๊ฑฐ์ณค์ง€๋งŒ ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ClickHouse๊ฐ€ ๋‹จ์ผ ๋…ธ๋“œ์—์„œ ์ œ๊ณตํ•œ ์„ฑ๋Šฅ์€ ๋งค์šฐ ์ธ์ƒ์ ์ž…๋‹ˆ๋‹ค.

๋™์‹œ์— ๋ถ„์‚ฐ ์—”์ง„์—์„œ ์ฟผ๋ฆฌ 1์„ ์‹คํ–‰ํ•  ๋•Œ ์˜ค๋ฒ„ํ—ค๋“œ ๋น„์šฉ์€ ํ›จ์”ฌ ๋” ๋†’์Šต๋‹ˆ๋‹ค. ํด๋Ÿฌ์Šคํ„ฐ์— ๋” ๋งŽ์€ ๋…ธ๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด ์ฟผ๋ฆฌ ์‹œ๊ฐ„์ด ์ค„์–ด๋“ค๋ฉด ์ข‹์„ ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์ด ๊ฒŒ์‹œ๋ฌผ์— ๋Œ€ํ•œ ์กฐ์‚ฌ์—์„œ ๋ญ”๊ฐ€๋ฅผ ๋†“์ณค๊ธฐ๋ฅผ ๋ฐ”๋ž๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ ๋‹ค๋ฅธ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•  ๋•Œ ์„ฑ๋Šฅ์ด 2๋ฐฐ ์ •๋„ ํ–ฅ์ƒ๋˜์—ˆ๋‹ค๋Š” ์ ์€ ์ •๋ง ๋Œ€๋‹จํ•ฉ๋‹ˆ๋‹ค.

ClickHouse๊ฐ€ ์Šคํ† ๋ฆฌ์ง€์™€ ์ปดํ“จํŒ…์„ ๋ถ„๋ฆฌํ•˜์—ฌ ๋…๋ฆฝ์ ์œผ๋กœ ํ™•์žฅํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ฐœ์ „ํ•˜๋Š” ๊ฒƒ์„ ๋ณด๋ฉด ์ข‹์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ž‘๋…„์— ์ถ”๊ฐ€๋œ HDFS ์ง€์›์€ ์ด๋ฅผ ํ–ฅํ•œ ํ•œ ๊ฑธ์Œ์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ปดํ“จํŒ… ์ธก๋ฉด์—์„œ ํด๋Ÿฌ์Šคํ„ฐ์— ๋” ๋งŽ์€ ๋…ธ๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ๋‹จ์ผ ์ฟผ๋ฆฌ๋ฅผ ๊ฐ€์†ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ์ด ์†Œํ”„ํŠธ์›จ์–ด์˜ ๋ฏธ๋ž˜๋Š” ๋งค์šฐ ๋ฐ์Šต๋‹ˆ๋‹ค.

์‹œ๊ฐ„์„ ๋‚ด์–ด ์ด ๊ฒŒ์‹œ๋ฌผ์„ ์ฝ์–ด์ฃผ์…”์„œ ๊ฐ์‚ฌํ•ฉ๋‹ˆ๋‹ค. ์ €๋Š” ๋ถ๋ฏธ์™€ ์œ ๋Ÿฝ์˜ ๊ณ ๊ฐ์—๊ฒŒ ์ปจ์„คํŒ…, ์•„ํ‚คํ…์ฒ˜ ๋ฐ ์‹ค๋ฌด ๊ฐœ๋ฐœ ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋‚ด ์ œ์•ˆ์ด ๊ท€ํ•˜์˜ ๋น„์ฆˆ๋‹ˆ์Šค์— ์–ด๋–ป๊ฒŒ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ๋Š”์ง€ ๋…ผ์˜ํ•˜๊ณ  ์‹ถ์œผ์‹œ๋ฉด ๋‹ค์Œ ์ฃผ์†Œ๋กœ ์ €์—๊ฒŒ ์—ฐ๋ฝํ•ด ์ฃผ์„ธ์š”. ๋งํฌ๋“œ์ธ.

์ถœ์ฒ˜ : habr.com

์ฝ”๋ฉ˜ํŠธ๋ฅผ ์ถ”๊ฐ€