Ushbu tadqiqotda men PostgreSQL emas, balki ClickHouse ma'lumotlar manbasidan foydalanish orqali qanday ishlash yaxshilanishiga erishish mumkinligini ko'rmoqchi edim. Men ClickHouse-dan foydalanishdan olingan samaradorlikni bilaman. Chet el ma'lumotlarini o'rash vositasi (FDW) yordamida PostgreSQL-dan ClickHouse-ga kirsam, bu imtiyozlar davom etadimi?
O'rganilgan ma'lumotlar bazasi muhitlari PostgreSQL v11, clickhousedb_fdw va ClickHouse ma'lumotlar bazasi. Oxir-oqibat, PostgreSQL v11 dan biz clickhousedb_fdw orqali ClickHouse ma'lumotlar bazasiga yo'naltirilgan turli xil SQL so'rovlarini bajaramiz. Keyin biz FDW ning ishlashi mahalliy PostgreSQL va mahalliy ClickHouse-da ishlaydigan bir xil so'rovlar bilan qanday solishtirilishini ko'rib chiqamiz.
Clickhouse ma'lumotlar bazasi
ClickHouse ochiq manbali ustunli ma'lumotlar bazasini boshqarish tizimi bo'lib, unumdorlikka an'anaviy ma'lumotlar bazasi yondashuvlariga qaraganda 100-1000 marta tezroq erisha oladi va bir soniyadan kamroq vaqt ichida milliarddan ortiq qatorlarni qayta ishlashga qodir.
Clickhousedb_fdw
clickhousedb_fdw - ClickHouse ma'lumotlar bazasi yoki FDW uchun tashqi ma'lumotlar o'rami Percona'dan ochiq manbali loyihadir.
Ko'rib turganingizdek, bu PostgreSQL v11 serveridan ClickHouse ma'lumotlar bazasidan SELECT va INSERT INTO imkonini beruvchi ClickHouse uchun FDW beradi.
FDW birlashtirish va qo'shilish kabi ilg'or xususiyatlarni qo'llab-quvvatlaydi. Bu resurs talab qiladigan operatsiyalar uchun masofaviy server resurslaridan foydalanish orqali unumdorlikni sezilarli darajada yaxshilaydi.
Benchmark muhiti
- Supermicro server:
- Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
- 2 ta rozetka / 28 yadro / 56 ta ip
- Xotira: 256 GB operativ xotira
- Saqlash: Samsung SM863 1.9TB Enterprise SSD
- Fayl tizimi: ext4/xfs
- OT: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
- PostgreSQL: 11-versiya
Benchmark testlari
Ushbu test uchun ba'zi bir mashina tomonidan yaratilgan ma'lumotlar to'plamidan foydalanish o'rniga, biz 1987 yildan 2018 yilgacha "Operator vaqti bo'yicha hisobot qilingan mahsuldorlik" ma'lumotlaridan foydalandik. Siz ma'lumotlarga kirishingiz mumkin
Ma'lumotlar bazasi hajmi 85 GB bo'lib, 109 ustundan iborat bitta jadvalni ta'minlaydi.
Benchmark so'rovlari
Mana men ClickHouse, clickhousedb_fdw va PostgreSQLni solishtirish uchun foydalangan so'rovlar.
Q#
So'rovda agregatlar va guruhlash bo'yicha
Q1
DayOfWeek, hisoblash(*) AS c AS o'z vaqtida Yil >= 2000 VA Yil <= 2008 Haftalik kun BO'YICHA GURUHGA TARTIB C TARTIBI BO'YICHA;
Q2
DayOfWeek TANLASH, (*) AS C AS QAYERDA DepDelay>10 VA Yil >= 2000 VA Yil <= 2008 Haftalik DayOf BY TARTIBI BO'YICHA GURUHLASH;
Q3
Kelib chiqishini tanlang, hisoblang(*) AS c FROM o'z vaqtida QAYERDA DepDelay>10 VA Yil >= 2000 VA YIL <= 2008 Kelib chiqishi bo'yicha GRUP TARTIB BY c TASHLAM LIMIT 10;
Q4
Tashuvchini tanlang, hisoblash() FROM o'z vaqtida QAYER DepDelay>10 VA Yil = 2007 Guruh tashuvchi bo'yicha TARTIB BY RO'YXAT() DESC;
Q5
SELECT a.Tashuvchi, c, c2, c1000/c2 sifatida c3 FROM (tashuvchini tanlang, hisoblash() AS c o'z vaqtida QAYERDA DepDelay>10 VA Yil=2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier,count(*) AS c2 FROM FROM on time WHERE Year=2007 GROUP BY Carrier)b on a.Carrier=b.Carrier ORDER BY c3 DESC;
Q6
SELECT a.Tashuvchi, c, c2, c1000/c2 sifatida c3 FROM (tashuvchini tanlang, hisoblash() AS c o'z vaqtida QAYERDA DepDelay>10 VA Yil >= 2000 VA YIL <= 2008 GROUP BY Carrier) a INTER JOIN ( Operatorni tanlang, (*) AS C2 O'Z vaqtidan QAYER YIL >= 2000 VA YIL <= 2008 GRO UP Carrier ) b bo'yicha a.Carrier=b.Carrier ORDER BY c3 DESC;
Q7
Tashuvchini tanlang, avg(DepDelay) * 1000 AS c3 o'z vaqtida QAYERDA Yil >= 2000 VA YIL <= 2008 Tashuvchi bo'yicha GURUH;
Q8
Yilni SELECT, oʻrtacha(DepDelay) Oʻz vaqtida GROUP BY YILDAN;
Q9
Yilni tanlang, Yil bo'yicha vaqt guruhidan c1 sifatida (*) hisoblang;
Q10
SELECT avg(cnt) FROM (Yil,Oy,count(*) AS cnt FROM o'z vaqtida TANLASH QERDA DepDel15=1 GROUP YIL,OY) a;
Q11
dan avg(c1) ni tanlang (Yil,Oy,son(*) ni c1 sifatida Yil,Oy bo‘yicha ish vaqti guruhidan tanlang) a;
Q12
OriginCityName, DestCityName, count(*) AS C FROM OriginCityName, DestCityName TARTIBI BO‘YICHA TARTIBI 10;
Q13
OriginCityName ni tanlang, hisoblang (*) c AS C FROM OriginCityName BO'YICHA GROUP BY C DESC LIMIT 10;
So'rov qo'shilishlarni o'z ichiga oladi
Q14
a.Yil, c1/c2 FROM ni tanlang (Yil, hisobni tanlang()1000 dan c1 sifatida QAYERDA DepDelay>10 GROUP BY YIL) a INNER JOIN (Yilni tanlang, hisoblash(*) c2 sifatida o'z vaqtida GROUP BY YIL ) b on a.Year=b.Year ORD BY a.Year;
Q15
a.”Yil”, c1/c2 FROM ( “Yil”ni tanlang, hisoblash()1000 as c1 fontime FROM WHERE “DepDelay”>10 GROUP BY “Year”) a INNER JOIN (“Yil” ni tanlang, c2 sifatida hisoblash(*) FROM fontime GROUP BY “Year” ) b on a.”Year”=b. "Yil";
Jadval-1: Benchmarkda foydalanilgan so'rovlar
So'rovlarni bajarish
Turli ma'lumotlar bazasi sozlamalarida ishga tushirilganda so'rovlarning har birining natijalari quyida keltirilgan: indeksli va indekssiz PostgreSQL, mahalliy ClickHouse va clickhousedb_fdw. Vaqt millisekundlarda ko'rsatilgan.
Q#
PostgreSQL
PostgreSQL (indekslangan)
ClickHouse
clickhousedb_fdw
Q1
27920
19634
23
57
Q2
35124
17301
50
80
Q3
34046
15618
67
115
Q4
31632
7667
25
37
Q5
47220
8976
27
60
Q6
58233
24368
55
153
Q7
30566
13256
52
91
Q8
38309
60511
112
179
Q9
20674
37979
31
81
Q10
34990
20102
56
148
Q11
30489
51658
37
155
Q12
39357
33742
186
1333
Q13
29912
30709
101
384
Q14
54126
39913
124
1364212
Q15
97258
30211
245
259
Jadval-1: Benchmarkda foydalanilgan so'rovlarni bajarish uchun sarflangan vaqt
Natijalarni ko'rish
Grafik so'rovni bajarish vaqtini millisekundlarda, X o'qi yuqoridagi jadvallardagi so'rovlar raqamini, Y o'qi esa millisekundlarda bajarilish vaqtini ko'rsatadi. ClickHouse natijalari va postgresdan clickhousedb_fdw yordamida olingan ma'lumotlar ko'rsatilgan. Jadvaldan PostgreSQL va ClickHouse o'rtasida katta farq borligini ko'rishingiz mumkin, ammo ClickHouse va clickhousedb_fdw o'rtasida minimal farq bor.
Ushbu grafik ClickhouseDB va clickhousedb_fdw o'rtasidagi farqni ko'rsatadi. Ko'pgina so'rovlarda FDW qo'shimcha xarajatlari unchalik yuqori emas va 12-chorakdan tashqari deyarli ahamiyatli emas. Bu soʻrov birlashmalarni va ORDER BY bandini oʻz ichiga oladi. ORDER BY GROUP/BY bandi tufayli ORDER BY tugmasi ClickHouse-ga tushmaydi.
2-jadvalda biz Q12 va Q13 so'rovlarida vaqt sakrashini ko'ramiz. Shunga qaramay, bu ORDER BY bandidan kelib chiqadi. Buni tasdiqlash uchun men Q-14 va Q-15 so'rovlarini ORDER BY bandi bilan va holda bajardim. ORDER BY bandisiz bajarish vaqti 259ms, ORDER BY bandida esa 1364212 ga teng.
15-savol: ORDER BY bandisiz
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2
FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";
15-savol: ORDER BY bandisiz so'rov
QUERY PLAN
Hash Join (cost=2250.00..128516.06 rows=50000000 width=12)
Output: fontime."Year", (((count(*) * 1000)) / b.c2)
Inner Unique: true Hash Cond: (fontime."Year" = b."Year")
-> Foreign Scan (cost=1.00..-1.00 rows=100000 width=12)
Output: fontime."Year", ((count(*) * 1000))
Relations: Aggregate on (fontime)
Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
-> Hash (cost=999.00..999.00 rows=100000 width=12)
Output: b.c2, b."Year"
-> Subquery Scan on b (cost=1.00..999.00 rows=100000 width=12)
Output: b.c2, b."Year"
-> Foreign Scan (cost=1.00..-1.00 rows=100000 width=12)
Output: fontime_1."Year", (count(*))
Relations: Aggregate on (fontime)
Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"(16 rows)
14-savol: ORDER BY bandi bilan so‘rov
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b ON a."Year"= b."Year"
ORDER BY a."Year";
14-savol: ORDER BY bandi bilan so'rovlar rejasi
QUERY PLAN
Merge Join (cost=2.00..628498.02 rows=50000000 width=12)
Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
Inner Unique: true Merge Cond: (fontime."Year" = fontime_1."Year")
-> GroupAggregate (cost=1.00..499.01 rows=1 width=12)
Output: fontime."Year", (count(*) * 1000)
Group Key: fontime."Year"
-> Foreign Scan on public.fontime (cost=1.00..-1.00 rows=100000 width=4)
Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10))
ORDER BY "Year" ASC
-> GroupAggregate (cost=1.00..499.01 rows=1 width=12)
Output: fontime_1."Year", count(*) Group Key: fontime_1."Year"
-> Foreign Scan on public.fontime fontime_1 (cost=1.00..-1.00 rows=100000 width=4)
Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC(16 rows)
xulosa
Ushbu tajribalar natijalari shuni ko'rsatadiki, ClickHouse haqiqatan ham yaxshi ishlashni taklif qiladi va clickhousedb_fdw PostgreSQL-dan ClickHouse-ning ishlash afzalliklarini taklif qiladi. Clickhousedb_fdw-dan foydalanishda biroz qo'shimcha xarajatlar mavjud bo'lsa-da, bu ahamiyatsiz va ClickHouse ma'lumotlar bazasida mahalliy ishlash orqali erishilgan samaradorlik bilan solishtirish mumkin. Bu, shuningdek, PostgreSQL-dagi fdw ajoyib natijalarni taqdim etishini tasdiqlaydi.
Clickhouse orqali Telegram suhbati
PostgreSQL yordamida Telegram suhbati
Manba: www.habr.com