Two years ago I …
ClickHouse is made up of 170 thousand lines of C++ code when excluding third-party libraries, and is one of the smaller distributed database codebases. By comparison, SQLite doesn't support distribution and consists of 235 thousand lines of C code. As of this writing, 207 engineers have contributed to ClickHouse, and the rate of commits has been accelerating recently.
In March 2017, ClickHouse began maintaining a changelog as an easy way to keep track of development.
In this post I'm going to look at the performance of a ClickHouse cluster on AWS EC2 using 36-core CPUs and NVMe storage.

UPDATE: A week after this post was originally published, I re-ran the benchmark with an improved configuration and achieved much better results. This post has been updated to reflect those changes.
Launching an AWS EC2 Cluster
I'll be using three c5d.9xlarge EC2 instances for this post. Each of them contains 36 virtual CPUs, 72 GB of RAM, 900 GB of NVMe SSD storage, and supports 10 Gigabit networking. They cost $1.962 / hour each in the eu-west-1 region when launched on-demand. I'll be using Ubuntu Server 16.04 LTS as the operating system.
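For reference, taking the eu-west-1 on-demand price of $1.962 per instance-hour (note the decimal point), the combined cost of the three-node cluster works out as follows. This is just a quick sanity check, not an authoritative price quote:

```shell
# Combined on-demand cost of three c5d.9xlarge instances in eu-west-1,
# at $1.962 per instance-hour.
awk 'BEGIN { printf "$%.3f / hour\n", 3 * 1.962 }'
# prints: $5.886 / hour
```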
The firewall is configured so that each machine can communicate with the others without restriction, and only my IPv4 address is whitelisted for SSH access into the cluster.
Getting the NVMe Drive into a State of Operational Readiness
For ClickHouse to work well, I'll create a filesystem in EXT4 format on the NVMe drive on each of the servers.
$ sudo mkfs -t ext4 /dev/nvme1n1
$ sudo mkdir /ch
$ sudo mount /dev/nvme1n1 /ch
Once everything is set up, you can see the mount point and the 783 GB of space available on each of the systems.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 87.9M 1 loop /snap/core/5742
loop1 7:1 0 16.5M 1 loop /snap/amazon-ssm-agent/784
nvme0n1 259:1 0 8G 0 disk
└─nvme0n1p1 259:2 0 8G 0 part /
nvme1n1 259:0 0 838.2G 0 disk /ch
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 35G 0 35G 0% /dev
tmpfs 6.9G 8.8M 6.9G 1% /run
/dev/nvme0n1p1 7.7G 967M 6.8G 13% /
tmpfs 35G 0 35G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 35G 0 35G 0% /sys/fs/cgroup
/dev/loop0 88M 88M 0 100% /snap/core/5742
/dev/loop1 17M 17M 0 100% /snap/amazon-ssm-agent/784
tmpfs 6.9G 0 6.9G 0% /run/user/1000
/dev/nvme1n1 825G 73M 783G 1% /ch
The dataset I'll be using in this benchmark is a data dump I've generated from the 1.1 billion taxi trips taken in New York City over six years. The blog post … goes into detail on how that dataset was put together. The data is stored on AWS S3, so I'll install and configure the AWS CLI with my access keys.
$ sudo apt update
$ sudo apt install awscli
$ aws configure
I'll set the client's concurrent request limit to 100 so the files download faster than they would with the default settings.
$ aws configure set \
    default.s3.max_concurrent_requests \
    100
I'll download the taxi ride dataset from AWS S3 and store it on the NVMe drive of the first server. The dataset is ~104 GB of GZIP-compressed CSV.
$ sudo mkdir -p /ch/csv
$ sudo chown -R ubuntu /ch/csv
$ aws s3 sync s3://<bucket>/csv /ch/csv
Installing ClickHouse
I'll install the OpenJDK distribution of Java 8, which is needed to run Apache ZooKeeper; ZooKeeper is required for the distributed ClickHouse installation across all three machines.
$ sudo apt update
$ sudo apt install \
    openjdk-8-jre \
    openjdk-8-jdk-headless
Then I'll set the JAVA_HOME environment variable.
$ sudo vi /etc/profile
export JAVA_HOME=/usr
$ source /etc/profile
I'll use Ubuntu's package management system to install ClickHouse 18.16.1, glances and ZooKeeper on all three machines.
$ sudo apt-key adv \
    --keyserver hkp://keyserver.ubuntu.com:80 \
    --recv E0C56BD4
$ echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" |
sudo tee /etc/apt/sources.list.d/clickhouse.list
$ sudo apt-get update
$ sudo apt install \
    clickhouse-client \
    clickhouse-server \
    glances \
    zookeeperd
I'll create a data directory for ClickHouse and apply some configuration overrides on all three servers.
$ sudo mkdir /ch/clickhouse
$ sudo chown -R clickhouse /ch/clickhouse
$ sudo mkdir -p /etc/clickhouse-server/conf.d
$ sudo vi /etc/clickhouse-server/conf.d/taxis.conf
These are the configuration overrides I'll be using.
<?xml version="1.0"?>
<yandex>
    <listen_host>0.0.0.0</listen_host>
    <path>/ch/clickhouse/</path>

    <remote_servers>
        <perftest_3shards>
            <shard>
                <replica>
                    <host>172.30.2.192</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>172.30.2.162</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>172.30.2.36</host>
                    <port>9000</port>
                </replica>
            </shard>
        </perftest_3shards>
    </remote_servers>

    <zookeeper-servers>
        <node>
            <host>172.30.2.192</host>
            <port>2181</port>
        </node>
        <node>
            <host>172.30.2.162</host>
            <port>2181</port>
        </node>
        <node>
            <host>172.30.2.36</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>

    <macros>
        <shard>03</shard>
        <replica>01</replica>
    </macros>
</yandex>
I'll then run ZooKeeper and the ClickHouse server on all three machines.
$ sudo /etc/init.d/zookeeper start
$ sudo service clickhouse-server start
Loading Data into ClickHouse
On the first server I'll create a trips table that will hold the taxi trip records, using the Log engine.
$ clickhouse-client --host=0.0.0.0
CREATE TABLE trips (
trip_id UInt32,
vendor_id String,
pickup_datetime DateTime,
dropoff_datetime Nullable(DateTime),
store_and_fwd_flag Nullable(FixedString(1)),
rate_code_id Nullable(UInt8),
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count Nullable(UInt8),
trip_distance Nullable(Float64),
fare_amount Nullable(Float32),
extra Nullable(Float32),
mta_tax Nullable(Float32),
tip_amount Nullable(Float32),
tolls_amount Nullable(Float32),
ehail_fee Nullable(Float32),
improvement_surcharge Nullable(Float32),
total_amount Nullable(Float32),
payment_type Nullable(String),
trip_type Nullable(UInt8),
pickup Nullable(String),
dropoff Nullable(String),
cab_type Nullable(String),
precipitation Nullable(Int8),
snow_depth Nullable(Int8),
snowfall Nullable(Int8),
max_temperature Nullable(Int8),
min_temperature Nullable(Int8),
average_wind_speed Nullable(Int8),
pickup_nyct2010_gid Nullable(Int8),
pickup_ctlabel Nullable(String),
pickup_borocode Nullable(Int8),
pickup_boroname Nullable(String),
pickup_ct2010 Nullable(String),
pickup_boroct2010 Nullable(String),
pickup_cdeligibil Nullable(FixedString(1)),
pickup_ntacode Nullable(String),
pickup_ntaname Nullable(String),
pickup_puma Nullable(String),
dropoff_nyct2010_gid Nullable(UInt8),
dropoff_ctlabel Nullable(String),
dropoff_borocode Nullable(UInt8),
dropoff_boroname Nullable(String),
dropoff_ct2010 Nullable(String),
dropoff_boroct2010 Nullable(String),
dropoff_cdeligibil Nullable(String),
dropoff_ntacode Nullable(String),
dropoff_ntaname Nullable(String),
dropoff_puma Nullable(String)
) ENGINE = Log;
I'll then decompress and load each of the CSV files into the trips table. The following completed in 55 minutes and 10 seconds. After this operation, the data directory was 134 GB in size.
$ time (for FILENAME in /ch/csv/trips_x*.csv.gz; do
            echo $FILENAME
            gunzip -c $FILENAME |
                clickhouse-client \
                    --host=0.0.0.0 \
                    --query="INSERT INTO trips FORMAT CSV"
        done)
The import speed was 155 MB of uncompressed CSV content per second. I suspect this was due to a bottleneck in GZIP decompression. It might have been quicker to decompress all the gzipped files in parallel using xargs and then load the decompressed data. Below is what glances reported during the CSV import process.
$ sudo glances
ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws) Uptime: 0:11:42
CPU 8.2% nice: 0.0% LOAD 36-core MEM 9.8% active: 5.20G SWAP 0.0%
user: 6.0% irq: 0.0% 1 min: 2.24 total: 68.7G inactive: 61.0G total: 0
system: 0.9% iowait: 1.3% 5 min: 1.83 used: 6.71G buffers: 66.4M used: 0
idle: 91.8% steal: 0.0% 15 min: 1.01 free: 62.0G cached: 61.6G free: 0
NETWORK Rx/s Tx/s TASKS 370 (507 thr), 2 run, 368 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5 136b 2Kb
lo 343Mb 343Mb CPU% MEM% VIRT RES PID USER NI S TIME+ IOR/s IOW/s Command
100.4 1.5 1.65G 1.06G 9909 ubuntu 0 S 1:01.33 0 0 clickhouse-client --host=0.0.0.0 --query=INSERT INTO trips FORMAT CSV
DISK I/O R/s W/s 85.1 0.0 4.65M 708K 9908 ubuntu 0 R 0:50.60 32M 0 gzip -d -c /ch/csv/trips_xac.csv.gz
loop0 0 0 54.9 5.1 8.14G 3.49G 8091 clickhous 0 S 1:44.23 0 45M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
loop1 0 0 4.5 0.0 0 0 319 root 0 S 0:07.50 1K 0 kworker/u72:2
nvme0n1 0 3K 2.3 0.0 91.1M 28.9M 9912 root 0 R 0:01.56 0 0 /usr/bin/python3 /usr/bin/glances
nvme0n1p1 0 3K 0.3 0.0 0 0 960 root -20 S 0:00.10 0 0 kworker/28:1H
nvme1n1 32.1M 495M 0.3 0.0 0 0 1058 root -20 S 0:00.90 0 0 kworker/23:1H
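The xargs-based alternative suggested above could be sketched roughly as follows. This is an untested sketch, demonstrated here on two tiny generated files rather than the real dataset; the clickhouse-client load step is left commented out since it needs a running server and the real decompressed CSVs would need substantially more free disk space than the 104 GB of gzipped input.

```shell
# Decompress gzipped CSVs in parallel (up to 16 gunzip processes),
# then load the plain CSVs sequentially. Demonstrated on dummy files.
mkdir -p /tmp/chdemo
printf '1,yellow\n' > /tmp/chdemo/trips_xaa.csv
printf '2,green\n'  > /tmp/chdemo/trips_xab.csv
gzip -f /tmp/chdemo/trips_x*.csv

# Parallel decompression step:
ls /tmp/chdemo/trips_x*.csv.gz | xargs -n 1 -P 16 gunzip -f

# The load loop then becomes pure I/O into ClickHouse:
# for FILENAME in /tmp/chdemo/trips_x*.csv; do
#     clickhouse-client --host=0.0.0.0 \
#         --query="INSERT INTO trips FORMAT CSV" < $FILENAME
# done
```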
I'll free up space on the NVMe drive by deleting the original CSV files before continuing.
$ sudo rm -fr /ch/csv
Converting to a Columnar Format
ClickHouse's Log engine stores data in a row-oriented format. To query the data faster, I'll convert it to a column-oriented format using the MergeTree engine.
$ clickhouse-client --host=0.0.0.0
The following completed in 34 minutes and 50 seconds. After this operation, the data directory was 237 GB in size.
CREATE TABLE trips_mergetree
ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
AS SELECT
trip_id,
CAST(vendor_id AS Enum8('1' = 1,
'2' = 2,
'CMT' = 3,
'VTS' = 4,
'DDS' = 5,
'B02512' = 10,
'B02598' = 11,
'B02617' = 12,
'B02682' = 13,
'B02764' = 14)) AS vendor_id,
toDate(pickup_datetime) AS pickup_date,
ifNull(pickup_datetime, toDateTime(0)) AS pickup_datetime,
toDate(dropoff_datetime) AS dropoff_date,
ifNull(dropoff_datetime, toDateTime(0)) AS dropoff_datetime,
assumeNotNull(store_and_fwd_flag) AS store_and_fwd_flag,
assumeNotNull(rate_code_id) AS rate_code_id,
assumeNotNull(pickup_longitude) AS pickup_longitude,
assumeNotNull(pickup_latitude) AS pickup_latitude,
assumeNotNull(dropoff_longitude) AS dropoff_longitude,
assumeNotNull(dropoff_latitude) AS dropoff_latitude,
assumeNotNull(passenger_count) AS passenger_count,
assumeNotNull(trip_distance) AS trip_distance,
assumeNotNull(fare_amount) AS fare_amount,
assumeNotNull(extra) AS extra,
assumeNotNull(mta_tax) AS mta_tax,
assumeNotNull(tip_amount) AS tip_amount,
assumeNotNull(tolls_amount) AS tolls_amount,
assumeNotNull(ehail_fee) AS ehail_fee,
assumeNotNull(improvement_surcharge) AS improvement_surcharge,
assumeNotNull(total_amount) AS total_amount,
assumeNotNull(payment_type) AS payment_type_,
assumeNotNull(trip_type) AS trip_type,
pickup AS pickup,
dropoff AS dropoff,
CAST(assumeNotNull(cab_type)
AS Enum8('yellow' = 1, 'green' = 2))
AS cab_type,
precipitation AS precipitation,
snow_depth AS snow_depth,
snowfall AS snowfall,
max_temperature AS max_temperature,
min_temperature AS min_temperature,
average_wind_speed AS average_wind_speed,
pickup_nyct2010_gid AS pickup_nyct2010_gid,
pickup_ctlabel AS pickup_ctlabel,
pickup_borocode AS pickup_borocode,
pickup_boroname AS pickup_boroname,
pickup_ct2010 AS pickup_ct2010,
pickup_boroct2010 AS pickup_boroct2010,
pickup_cdeligibil AS pickup_cdeligibil,
pickup_ntacode AS pickup_ntacode,
pickup_ntaname AS pickup_ntaname,
pickup_puma AS pickup_puma,
dropoff_nyct2010_gid AS dropoff_nyct2010_gid,
dropoff_ctlabel AS dropoff_ctlabel,
dropoff_borocode AS dropoff_borocode,
dropoff_boroname AS dropoff_boroname,
dropoff_ct2010 AS dropoff_ct2010,
dropoff_boroct2010 AS dropoff_boroct2010,
dropoff_cdeligibil AS dropoff_cdeligibil,
dropoff_ntacode AS dropoff_ntacode,
dropoff_ntaname AS dropoff_ntaname,
dropoff_puma AS dropoff_puma
FROM trips;
This is what the glances output looked like during the operation:
ip-172-30-2-200 (Ubuntu 16.04 64bit / Linux 4.4.0-1072-aws) Uptime: 1:06:09
CPU 10.3% nice: 0.0% LOAD 36-core MEM 16.1% active: 13.3G SWAP 0.0%
user: 7.9% irq: 0.0% 1 min: 1.87 total: 68.7G inactive: 52.8G total: 0
system: 1.6% iowait: 0.8% 5 min: 1.76 used: 11.1G buffers: 71.8M used: 0
idle: 89.7% steal: 0.0% 15 min: 1.95 free: 57.6G cached: 57.2G free: 0
NETWORK Rx/s Tx/s TASKS 367 (523 thr), 1 run, 366 slp, 0 oth sorted automatically by cpu_percent, flat view
ens5 1Kb 8Kb
lo 2Kb 2Kb CPU% MEM% VIRT RES PID USER NI S TIME+ IOR/s IOW/s Command
241.9 12.8 20.7G 8.78G 8091 clickhous 0 S 30:36.73 34M 125M /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
DISK I/O R/s W/s 2.6 0.0 90.4M 28.3M 9948 root 0 R 1:18.53 0 0 /usr/bin/python3 /usr/bin/glances
loop0 0 0 1.3 0.0 0 0 203 root 0 S 0:09.82 0 0 kswapd0
loop1 0 0 0.3 0.1 315M 61.3M 15701 ubuntu 0 S 0:00.40 0 0 clickhouse-client --host=0.0.0.0
nvme0n1 0 3K 0.3 0.0 0 0 7 root 0 S 0:00.83 0 0 rcu_sched
nvme0n1p1 0 3K 0.0 0.0 0 0 142 root 0 S 0:00.22 0 0 migration/27
nvme1n1 25.8M 330M 0.0 0.0 59.7M 1.79M 2764 ubuntu 0 S 0:00.00 0 0 (sd-pam)
In one of my last tests, several columns were converted and re-cast. I found that some of those functions no longer worked as expected on this dataset. To get around this, I dropped the offending functions and loaded the data without converting to more granular data types.
Distributing Data Across the Cluster
I'll distribute the data across all three nodes of the cluster. To start, below I'll create a table on all three machines.
$ clickhouse-client --host=0.0.0.0
CREATE TABLE trips_mergetree_third (
trip_id UInt32,
vendor_id String,
pickup_date Date,
pickup_datetime DateTime,
dropoff_date Date,
dropoff_datetime Nullable(DateTime),
store_and_fwd_flag Nullable(FixedString(1)),
rate_code_id Nullable(UInt8),
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count Nullable(UInt8),
trip_distance Nullable(Float64),
fare_amount Nullable(Float32),
extra Nullable(Float32),
mta_tax Nullable(Float32),
tip_amount Nullable(Float32),
tolls_amount Nullable(Float32),
ehail_fee Nullable(Float32),
improvement_surcharge Nullable(Float32),
total_amount Nullable(Float32),
payment_type Nullable(String),
trip_type Nullable(UInt8),
pickup Nullable(String),
dropoff Nullable(String),
cab_type Nullable(String),
precipitation Nullable(Int8),
snow_depth Nullable(Int8),
snowfall Nullable(Int8),
max_temperature Nullable(Int8),
min_temperature Nullable(Int8),
average_wind_speed Nullable(Int8),
pickup_nyct2010_gid Nullable(Int8),
pickup_ctlabel Nullable(String),
pickup_borocode Nullable(Int8),
pickup_boroname Nullable(String),
pickup_ct2010 Nullable(String),
pickup_boroct2010 Nullable(String),
pickup_cdeligibil Nullable(FixedString(1)),
pickup_ntacode Nullable(String),
pickup_ntaname Nullable(String),
pickup_puma Nullable(String),
dropoff_nyct2010_gid Nullable(UInt8),
dropoff_ctlabel Nullable(String),
dropoff_borocode Nullable(UInt8),
dropoff_boroname Nullable(String),
dropoff_ct2010 Nullable(String),
dropoff_boroct2010 Nullable(String),
dropoff_cdeligibil Nullable(String),
dropoff_ntacode Nullable(String),
dropoff_ntaname Nullable(String),
dropoff_puma Nullable(String)
) ENGINE = MergeTree(pickup_date, pickup_datetime, 8192);
Then I'll make sure the first server can see all three nodes in the cluster.
SELECT *
FROM system.clusters
WHERE cluster = 'perftest_3shards'
FORMAT Vertical;
Row 1:
──────
cluster: perftest_3shards
shard_num: 1
shard_weight: 1
replica_num: 1
host_name: 172.30.2.192
host_address: 172.30.2.192
port: 9000
is_local: 1
user: default
default_database:
Row 2:
──────
cluster: perftest_3shards
shard_num: 2
shard_weight: 1
replica_num: 1
host_name: 172.30.2.162
host_address: 172.30.2.162
port: 9000
is_local: 0
user: default
default_database:
Row 3:
──────
cluster: perftest_3shards
shard_num: 3
shard_weight: 1
replica_num: 1
host_name: 172.30.2.36
host_address: 172.30.2.36
port: 9000
is_local: 0
user: default
default_database:
I'll then define a new table on the first server that is based on the trips_mergetree_third schema and uses the Distributed engine.
CREATE TABLE trips_mergetree_x3
AS trips_mergetree_third
ENGINE = Distributed(perftest_3shards,
default,
trips_mergetree_third,
rand());
I'll then copy the data out of the MergeTree-based table and onto all three servers. The following completed in 34 minutes and 44 seconds.
INSERT INTO trips_mergetree_x3
SELECT * FROM trips_mergetree;
After the above operation, I gave ClickHouse 15 minutes to come down from its maximum storage-level high-water mark. The data directories ended up being 264 GB, 34 GB and 33 GB respectively on the three servers.
Benchmarking the ClickHouse Cluster
The following were the fastest times I saw after running each query multiple times on the trips_mergetree_x3 table.
$ clickhouse-client --host=0.0.0.0
The following completed in 2.449 seconds.
SELECT cab_type, count(*)
FROM trips_mergetree_x3
GROUP BY cab_type;
The following completed in 0.691 seconds.
SELECT passenger_count,
avg(total_amount)
FROM trips_mergetree_x3
GROUP BY passenger_count;
The following completed in 0.… seconds.
SELECT passenger_count,
toYear(pickup_date) AS year,
count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
year;
The following completed in 0.983 seconds.
SELECT passenger_count,
toYear(pickup_date) AS year,
round(trip_distance) AS distance,
count(*)
FROM trips_mergetree_x3
GROUP BY passenger_count,
year,
distance
ORDER BY year,
count(*) DESC;
For comparison, I ran the same queries on the MergeTree-based table that sits solely on the first server.
Benchmarking a Single ClickHouse Node
The following were the fastest times I saw after running each query multiple times on the trips_mergetree table.
The following completed in 0.241 seconds.
SELECT cab_type, count(*)
FROM trips_mergetree
GROUP BY cab_type;
The following completed in 0.826 seconds.
SELECT passenger_count,
avg(total_amount)
FROM trips_mergetree
GROUP BY passenger_count;
The following completed in 1.209 seconds.
SELECT passenger_count,
toYear(pickup_date) AS year,
count(*)
FROM trips_mergetree
GROUP BY passenger_count,
year;
The following completed in 1.781 seconds.
SELECT passenger_count,
toYear(pickup_date) AS year,
round(trip_distance) AS distance,
count(*)
FROM trips_mergetree
GROUP BY passenger_count,
year,
distance
ORDER BY year,
count(*) DESC;
Reflections on the Results
This is the first time a free, CPU-based database has managed to outperform a GPU-based database in my benchmarks. That GPU-based database has gone through two revisions since, but nonetheless the performance ClickHouse delivered on a single node is very impressive.
At the same time, when executing Query 1 on the distributed engine, the overhead costs are an order of magnitude higher. I'm hoping I missed something in my research for this post, because it would be nice to see query times drop as more nodes are added to the cluster. It is great, however, that when executing the other queries, performance increased by around 2x.
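Those ratios can be checked directly from the fastest times reported above (cluster vs. single node, in seconds; the cluster time for Query 3 is not available, so it is omitted here):

```shell
# Q1: 2.449 s (cluster) vs 0.241 s (single node) - cluster ~10x slower.
# Q2: 0.691 s vs 0.826 s and Q4: 0.983 s vs 1.781 s - cluster faster.
awk 'BEGIN { printf "Q1 %.1fx slower, Q2 %.2fx faster, Q4 %.2fx faster\n",
             2.449 / 0.241, 0.826 / 0.691, 1.781 / 0.983 }'
# prints: Q1 10.2x slower, Q2 1.20x faster, Q4 1.81x faster
```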
It would be nice to see ClickHouse evolve towards separating storage and compute so that they can scale independently. The HDFS support added last year could be a step towards this. As for compute: if a single query can be sped up by adding more nodes to the cluster, then the future of this software is very bright.
Thank you for taking the time to read this post. I offer consulting, architecture and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings could help your business, please contact me via …
source: www.habr.com