PostgreSQL, ClickHouse va clickhousedb_fdw (PostgreSQL) da analitik so'rovlarning ishlashini sinab ko'rish

Ushbu tadqiqotda men PostgreSQL emas, balki ClickHouse ma'lumotlar manbasidan foydalanish orqali qanday ishlash yaxshilanishiga erishish mumkinligini ko'rmoqchi edim. Men ClickHouse-dan foydalanishdan olingan samaradorlikni bilaman. Chet el ma'lumotlarini o'rash vositasi (FDW) yordamida PostgreSQL-dan ClickHouse-ga kirsam, bu imtiyozlar davom etadimi?

O'rganilgan ma'lumotlar bazasi muhitlari PostgreSQL v11, clickhousedb_fdw va ClickHouse ma'lumotlar bazasi. Oxir-oqibat, PostgreSQL v11 dan biz clickhousedb_fdw orqali ClickHouse ma'lumotlar bazasiga yo'naltirilgan turli xil SQL so'rovlarini bajaramiz. Keyin biz FDW ning ishlashi mahalliy PostgreSQL va mahalliy ClickHouse-da ishlaydigan bir xil so'rovlar bilan qanday solishtirilishini ko'rib chiqamiz.

Clickhouse ma'lumotlar bazasi

ClickHouse ochiq manbali ustunli ma'lumotlar bazasini boshqarish tizimi bo'lib, unumdorlikka an'anaviy ma'lumotlar bazasi yondashuvlariga qaraganda 100-1000 marta tezroq erisha oladi va bir soniyadan kamroq vaqt ichida milliarddan ortiq qatorlarni qayta ishlashga qodir.

Clickhousedb_fdw

clickhousedb_fdw - ClickHouse ma'lumotlar bazasi yoki FDW uchun tashqi ma'lumotlar o'rami Percona'dan ochiq manbali loyihadir. Bu yerda loyihaning GitHub omboriga havola.

Mart oyida men FDW haqida ko'proq ma'lumot beradigan blog yozdim.

Ko'rib turganingizdek, bu PostgreSQL v11 serveridan ClickHouse ma'lumotlar bazasidan SELECT va INSERT INTO imkonini beruvchi ClickHouse uchun FDW beradi.

FDW birlashtirish va qo'shilish kabi ilg'or xususiyatlarni qo'llab-quvvatlaydi. Bu resurs talab qiladigan operatsiyalar uchun masofaviy server resurslaridan foydalanish orqali unumdorlikni sezilarli darajada yaxshilaydi.

Benchmark muhiti

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 ta rozetka / 28 yadro / 56 ta ip
    • Xotira: 256 GB operativ xotira
    • Saqlash: Samsung SM863 1.9TB Enterprise SSD
    • Fayl tizimi: ext4/xfs
  • OT: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: 11-versiya

Benchmark testlari

Ushbu test uchun ba'zi bir mashina tomonidan yaratilgan ma'lumotlar to'plamidan foydalanish o'rniga, biz 1987 yildan 2018 yilgacha "Operator vaqti bo'yicha hisobot qilingan mahsuldorlik" ma'lumotlaridan foydalandik. Siz ma'lumotlarga kirishingiz mumkin bu yerda mavjud bo'lgan skriptimizdan foydalaning.

Ma'lumotlar bazasi hajmi 85 GB bo'lib, 109 ustundan iborat bitta jadvalni ta'minlaydi.

Benchmark so'rovlari

Mana men ClickHouse, clickhousedb_fdw va PostgreSQLni solishtirish uchun foydalangan so'rovlar.

Q#
So'rovda agregatlar va guruhlash bo'yicha

Q1
DayOfWeek, hisoblash(*) AS c AS o'z vaqtida Yil >= 2000 VA Yil <= 2008 Haftalik kun BO'YICHA GURUHGA TARTIB C TARTIBI BO'YICHA;

Q2
DayOfWeek TANLASH, (*) AS C AS QAYERDA DepDelay>10 VA Yil >= 2000 VA Yil <= 2008 Haftalik DayOf BY TARTIBI BO'YICHA GURUHLASH;

Q3
Kelib chiqishini tanlang, hisoblang(*) AS c FROM o'z vaqtida QAYERDA DepDelay>10 VA Yil >= 2000 VA YIL <= 2008 Kelib chiqishi bo'yicha GRUP TARTIB BY c TASHLAM LIMIT 10;

Q4
Tashuvchini tanlang, hisoblash() FROM o'z vaqtida QAYER DepDelay>10 VA Yil = 2007 Guruh tashuvchi bo'yicha TARTIB BY RO'YXAT() DESC;

Q5
SELECT a.Tashuvchi, c, c2, c1000/c2 sifatida c3 FROM (tashuvchini tanlang, hisoblash() AS c o'z vaqtida QAYERDA DepDelay>10 VA Yil=2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier,count(*) AS c2 FROM FROM on time WHERE Year=2007 GROUP BY Carrier)b on a.Carrier=b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Tashuvchi, c, c2, c1000/c2 sifatida c3 FROM (tashuvchini tanlang, hisoblash() AS c o'z vaqtida QAYERDA DepDelay>10 VA Yil >= 2000 VA YIL <= 2008 GROUP BY Carrier) a INTER JOIN ( Operatorni tanlang, (*) AS C2 O'Z vaqtidan QAYER YIL >= 2000 VA YIL <= 2008 GRO UP Carrier ) b bo'yicha a.Carrier=b.Carrier ORDER BY c3 DESC;

Q7
Tashuvchini tanlang, avg(DepDelay) * 1000 AS c3 o'z vaqtida QAYERDA Yil >= 2000 VA YIL <= 2008 Tashuvchi bo'yicha GURUH;

Q8
Yilni SELECT, oʻrtacha(DepDelay) Oʻz vaqtida GROUP BY YILDAN;

Q9
Yilni tanlang, Yil bo'yicha vaqt guruhidan c1 sifatida (*) hisoblang;

Q10
SELECT avg(cnt) FROM (Yil,Oy,count(*) AS cnt FROM o'z vaqtida TANLASH QERDA DepDel15=1 GROUP YIL,OY) a;

Q11
dan avg(c1) ni tanlang (Yil,Oy,son(*) ni c1 sifatida Yil,Oy bo‘yicha ish vaqti guruhidan tanlang) a;

Q12
OriginCityName, DestCityName, count(*) AS C FROM OriginCityName, DestCityName TARTIBI BO‘YICHA TARTIBI 10;

Q13
OriginCityName ni tanlang, hisoblang (*) c AS C FROM OriginCityName BO'YICHA GROUP BY C DESC LIMIT 10;

So'rov qo'shilishlarni o'z ichiga oladi

Q14
a.Yil, c1/c2 FROM ni tanlang (Yil, hisobni tanlang()1000 dan c1 sifatida QAYERDA DepDelay>10 GROUP BY YIL) a INNER JOIN (Yilni tanlang, hisoblash(*) c2 sifatida o'z vaqtida GROUP BY YIL ) b on a.Year=b.Year ORD BY a.Year;

Q15
a.”Yil”, c1/c2 FROM ( “Yil”ni tanlang, hisoblash()1000 as c1 fontime FROM WHERE “DepDelay”>10 GROUP BY “Year”) a INNER JOIN (“Yil” ni tanlang, c2 sifatida hisoblash(*) FROM fontime GROUP BY “Year” ) b on a.”Year”=b. "Yil";

Jadval-1: Benchmarkda foydalanilgan so'rovlar

So'rovlarni bajarish

Turli ma'lumotlar bazasi sozlamalarida ishga tushirilganda so'rovlarning har birining natijalari quyida keltirilgan: indeksli va indekssiz PostgreSQL, mahalliy ClickHouse va clickhousedb_fdw. Vaqt millisekundlarda ko'rsatilgan.

Q#
PostgreSQL
PostgreSQL (indekslangan)
ClickHouse
clickhousedb_fdw

Q1
27920
19634
23
57

Q2
35124
17301
50
80

Q3
34046
15618
67
115

Q4
31632
7667
25
37

Q5
47220
8976
27
60

Q6
58233
24368
55
153

Q7
30566
13256
52
91

Q8
38309
60511
112
179

Q9
20674
37979
31
81

Q10
34990
20102
56
148

Q11
30489
51658
37
155

Q12
39357
33742
186
1333

Q13
29912
30709
101
384

Q14
54126
39913
124
1364212

Q15
97258
30211
245
259

Jadval-1: Benchmarkda foydalanilgan so'rovlarni bajarish uchun sarflangan vaqt

Natijalarni ko'rish

Grafik so'rovni bajarish vaqtini millisekundlarda, X o'qi yuqoridagi jadvallardagi so'rovlar raqamini, Y o'qi esa millisekundlarda bajarilish vaqtini ko'rsatadi. ClickHouse natijalari va postgresdan clickhousedb_fdw yordamida olingan ma'lumotlar ko'rsatilgan. Jadvaldan PostgreSQL va ClickHouse o'rtasida katta farq borligini ko'rishingiz mumkin, ammo ClickHouse va clickhousedb_fdw o'rtasida minimal farq bor.

PostgreSQL, ClickHouse va clickhousedb_fdw (PostgreSQL) da analitik so'rovlarning ishlashini sinab ko'rish

Ushbu grafik ClickhouseDB va clickhousedb_fdw o'rtasidagi farqni ko'rsatadi. Ko'pgina so'rovlarda FDW qo'shimcha xarajatlari unchalik yuqori emas va 12-chorakdan tashqari deyarli ahamiyatli emas. Bu soʻrov birlashmalarni va ORDER BY bandini oʻz ichiga oladi. ORDER BY GROUP/BY bandi tufayli ORDER BY tugmasi ClickHouse-ga tushmaydi.

2-jadvalda biz Q12 va Q13 so'rovlarida vaqt sakrashini ko'ramiz. Shunga qaramay, bu ORDER BY bandidan kelib chiqadi. Buni tasdiqlash uchun men Q-14 va Q-15 so'rovlarini ORDER BY bandi bilan va holda bajardim. ORDER BY bandisiz bajarish vaqti 259ms, ORDER BY bandida esa 1364212 ga teng.

15-savol: ORDER BY bandisiz

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

15-savol: ORDER BY bandisiz so'rov

QUERY PLAN                                                      
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)  
Output: fontime."Year", (((count(*) * 1000)) / b.c2)  
Inner Unique: true   Hash Cond: (fontime."Year" = b."Year")  
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)        
Output: fontime."Year", ((count(*) * 1000))        
Relations: Aggregate on (fontime)        
Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"  
->  Hash  (cost=999.00..999.00 rows=100000 width=12)        
Output: b.c2, b."Year"        
->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)              
Output: b.c2, b."Year"              
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)                    
Output: fontime_1."Year", (count(*))                    
Relations: Aggregate on (fontime)                    
Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"(16 rows)

14-savol: ORDER BY bandi bilan so‘rov

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

14-savol: ORDER BY bandi bilan so'rovlar rejasi

QUERY PLAN 
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)   
Output: fontime."Year", (((count(*) * 1000)) / (count(*)))   
Inner Unique: true   Merge Cond: (fontime."Year" = fontime_1."Year")   
->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)        
Output: fontime."Year", (count(*) * 1000)         
Group Key: fontime."Year"         
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)               
Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) 
            ORDER BY "Year" ASC   
->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)         
Output: fontime_1."Year", count(*)         Group Key: fontime_1."Year"         
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4) 
              
Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC(16 rows)

xulosa

Ushbu tajribalar natijalari shuni ko'rsatadiki, ClickHouse haqiqatan ham yaxshi ishlashni taklif qiladi va clickhousedb_fdw PostgreSQL-dan ClickHouse-ning ishlash afzalliklarini taklif qiladi. Clickhousedb_fdw-dan foydalanishda biroz qo'shimcha xarajatlar mavjud bo'lsa-da, bu ahamiyatsiz va ClickHouse ma'lumotlar bazasida mahalliy ishlash orqali erishilgan samaradorlik bilan solishtirish mumkin. Bu, shuningdek, PostgreSQL-dagi fdw ajoyib natijalarni taqdim etishini tasdiqlaydi.

Clickhouse orqali Telegram suhbati https://t.me/clickhouse_ru
PostgreSQL yordamida Telegram suhbati https://t.me/pgsql

Manba: www.habr.com

a Izoh qo'shish