Testing the performance of analytical queries in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements could be gained by using a ClickHouse data source instead of PostgreSQL. I already know the performance benefits I get from using ClickHouse directly. Do those benefits persist if I access ClickHouse from PostgreSQL through a Foreign Data Wrapper (FDW)?

The database environments under test are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. From PostgreSQL v11 we run various SQL queries that are routed through clickhousedb_fdw to the ClickHouse database. We then compare the FDW's performance with the same queries running in native PostgreSQL and in native ClickHouse.

ClickHouse Database

ClickHouse is an open source, column-oriented database management system that can achieve performance 100 to 1000 times faster than traditional database approaches and is capable of processing more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw, the foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona. Here is a link to the project's GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These improve performance significantly by using the remote server's resources for these resource-intensive operations.

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Rather than using a machine-generated dataset for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can access the data using our script, available here.

The size of the database is 85 GB, providing a single table of 109 columns.

Benchmark queries

Here are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Query contains aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year = 2007 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;

Q10
SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15 = 1 GROUP BY Year, Month) a;

Q11
SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Query contains joins

Q14
SELECT a.Year, c1/c2 FROM (SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay > 10 GROUP BY Year) a INNER JOIN (SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year) b ON a.Year = b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year" = b."Year";

Table-1: Queries used in the benchmark

Query execution

Here are the results of each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are shown in milliseconds.

Q#    PostgreSQL   PostgreSQL (Indexed)   ClickHouse   clickhousedb_fdw
Q1         27920                  19634           23                 57
Q2         35124                  17301           50                 80
Q3         34046                  15618           67                115
Q4         31632                   7667           25                 37
Q5         47220                   8976           27                 60
Q6         58233                  24368           55                153
Q7         30566                  13256           52                 91
Q8         38309                  60511          112                179
Q9         20674                  37979           31                 81
Q10        34990                  20102           56                148
Q11        30489                  51658           37                155
Q12        39357                  33742          186               1333
Q13        29912                  30709          101                384
Q14        54126                  39913          124            1364212
Q15        97258                  30211          245                259

Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark
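
To make the gap between PostgreSQL and ClickHouse concrete, the ratios can be worked out directly from the timings in the results table. The following is a minimal Python sketch and not part of the original benchmark; the names `TIMINGS_MS` and `speedup` are invented for illustration, and the numbers are copied from the table.

```python
# Timings in milliseconds, copied from the results table:
# query -> (PostgreSQL, PostgreSQL indexed, native ClickHouse)
TIMINGS_MS = {
    "Q1": (27920, 19634, 23),
    "Q2": (35124, 17301, 50),
    "Q3": (34046, 15618, 67),
    "Q4": (31632, 7667, 25),
    "Q5": (47220, 8976, 27),
    "Q6": (58233, 24368, 55),
    "Q7": (30566, 13256, 52),
    "Q8": (38309, 60511, 112),
    "Q9": (20674, 37979, 31),
    "Q10": (34990, 20102, 56),
    "Q11": (30489, 51658, 37),
    "Q12": (39357, 33742, 186),
    "Q13": (29912, 30709, 101),
    "Q14": (54126, 39913, 124),
    "Q15": (97258, 30211, 245),
}


def speedup(slow_ms, fast_ms):
    """Return how many times faster the second timing is than the first."""
    return slow_ms / fast_ms


for query, (pg, pg_indexed, ch) in TIMINGS_MS.items():
    print(f"{query:>3}: ClickHouse vs PostgreSQL {speedup(pg, ch):6.0f}x, "
          f"vs indexed PostgreSQL {speedup(pg_indexed, ch):6.0f}x")
```

For Q1, for instance, this works out to 27920 / 23, or roughly 1214 times faster than unindexed PostgreSQL. Every ratio computed from this table is above 100.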

Looking at the results

The graph shows query execution time in milliseconds: the X axis shows the query number from the tables above, and the Y axis shows the execution time in milliseconds. Results are shown for ClickHouse and for the data accessed from PostgreSQL using clickhousedb_fdw. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but only a small difference between ClickHouse and clickhousedb_fdw.


This graph shows the difference between ClickHouse and clickhousedb_fdw. For most of the queries the FDW overhead is small and barely significant, apart from Q12. This query involves joins and an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY and ORDER BY are not pushed down to ClickHouse.
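
The FDW overhead per query can be read off the results table by subtracting the native ClickHouse time from the clickhousedb_fdw time. The sketch below does that in Python; it is only an illustration, the names `FDW_TIMINGS_MS`, `fdw_overhead_ms` and `outliers` are hypothetical, and the 200 ms threshold is an arbitrary assumption chosen to separate the small overheads from the large ones.

```python
# Timings in milliseconds, copied from the results table:
# query -> (native ClickHouse, clickhousedb_fdw)
FDW_TIMINGS_MS = {
    "Q1": (23, 57),
    "Q2": (50, 80),
    "Q3": (67, 115),
    "Q4": (25, 37),
    "Q5": (27, 60),
    "Q6": (55, 153),
    "Q7": (52, 91),
    "Q8": (112, 179),
    "Q9": (31, 81),
    "Q10": (56, 148),
    "Q11": (37, 155),
    "Q12": (186, 1333),
    "Q13": (101, 384),
    "Q14": (124, 1364212),
    "Q15": (245, 259),
}


def fdw_overhead_ms(timings):
    """Extra milliseconds spent when going through the FDW, per query."""
    return {query: fdw - ch for query, (ch, fdw) in timings.items()}


def outliers(timings, threshold_ms=200):
    """Queries whose FDW overhead exceeds the (assumed) threshold."""
    overhead = fdw_overhead_ms(timings)
    return [query for query, extra in overhead.items() if extra > threshold_ms]


print(outliers(FDW_TIMINGS_MS))  # prints ['Q12', 'Q13', 'Q14']
```

With this threshold, every query except Q12, Q13 and Q14 adds less than 200 ms when run through the FDW; Q15, which is Q14 without the ORDER BY, adds only 14 ms.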

In Table-2 we see a jump in time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q14 and Q15, which are the same query with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, and with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.

Q15: Query without ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query plan without ORDER BY clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query with ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query plan with ORDER BY clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
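
The difference between the two plans is where the grouping happens. In the Q15 plan the Remote SQL contains the GROUP BY, so ClickHouse returns one row per Year; in the Q14 plan the Remote SQL only filters and sorts, so every matching row is shipped to PostgreSQL and grouped there by GroupAggregate. The toy Python model below illustrates that effect on the number of rows transferred. It is a simplified sketch under stated assumptions, not how the FDW is implemented: the `ROWS` data is made up, and `with_pushdown` and `without_pushdown` are hypothetical names.

```python
from collections import Counter

# Hypothetical miniature "ontime" table: (Year, DepDelay) pairs.
ROWS = [(2006, 5), (2006, 25), (2007, 40), (2007, 0), (2007, 12), (2008, 11)]


def with_pushdown(rows):
    """Remote side filters, groups and counts; one row per Year is transferred."""
    transferred = sorted(Counter(year for year, delay in rows if delay > 10).items())
    return dict(transferred), len(transferred)


def without_pushdown(rows):
    """Remote side only filters and sorts; every matching row is transferred
    and the grouping is done locally (like GroupAggregate in the Q14 plan)."""
    transferred = sorted(year for year, delay in rows if delay > 10)
    return dict(Counter(transferred)), len(transferred)


pushed_result, pushed_rows = with_pushdown(ROWS)
local_result, local_rows = without_pushdown(ROWS)

# Both strategies compute the same counts; only the transfer size differs.
print(pushed_result == local_result, pushed_rows, local_rows)
```

On this six-row sample the two strategies transfer 3 and 4 rows respectively. On the real 85 GB table the second strategy has to move every matching row across the wire, which is consistent with the large time recorded for Q14.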

Conclusion

The results of these experiments show that ClickHouse offers very good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. While there is some overhead when using clickhousedb_fdw, for most queries it is negligible and the performance is comparable to running natively on the ClickHouse database. This also confirms that the FDW in PostgreSQL delivers excellent results.

Telegram chat about ClickHouse https://t.me/clickhouse_ru
Telegram chat about PostgreSQL https://t.me/pgsql

source: www.habr.com
