Performance testing of analytical queries in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements could be achieved by using a ClickHouse data source rather than PostgreSQL. I already know the performance benefits I get from using ClickHouse directly. Do those benefits hold if I access ClickHouse from PostgreSQL through a Foreign Data Wrapper (FDW)?

The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. From PostgreSQL v11 we will run a range of SQL queries routed through our clickhousedb_fdw to the ClickHouse database. We will then see how the FDW's performance compares with the same queries run in native PostgreSQL and in native ClickHouse.

ClickHouse Database

ClickHouse is an open-source column-oriented database management system that can achieve performance 100 to 1,000 times faster than traditional database approaches, and is capable of processing more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw, a foreign data wrapper (FDW) for the ClickHouse database, is an open-source project from Percona. The project is hosted in a GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These improve performance by using the remote server's resources for these resource-intensive operations.
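To make the idea of aggregate pushdown concrete, here is a minimal sketch in Python. It does not use clickhousedb_fdw or any database at all: `REMOTE_ROWS`, `fetch_all_rows` and `fetch_aggregated` are hypothetical stand-ins that only illustrate how many rows have to travel from the remote side in each case.

```python
from collections import Counter

# Hypothetical stand-in for a remote table: (year, dep_delay) tuples.
REMOTE_ROWS = [(2006, 5), (2006, 20), (2007, 15), (2007, 0), (2007, 30), (2008, 12)]


def fetch_all_rows():
    """No pushdown: every row is transferred and aggregated locally."""
    return list(REMOTE_ROWS)


def fetch_aggregated():
    """Aggregate pushdown: the remote side groups and counts,
    so only one row per group is transferred."""
    return sorted(Counter(year for year, _ in REMOTE_ROWS).items())


# Without pushdown the local side receives all rows, then groups them.
local_rows = fetch_all_rows()
local_result = sorted(Counter(year for year, _ in local_rows).items())

# With pushdown the local side receives the finished groups.
pushed_result = fetch_aggregated()

print("rows transferred without pushdown:", len(local_rows))
print("rows transferred with pushdown:   ", len(pushed_result))
print("same answer:", local_result == pushed_result)
```

Both paths produce the same per-year counts; the difference is only in how much data crosses the connection and where the grouping work is done, which is what matters when the remote table is tens of gigabytes.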

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Instead of using a machine-generated data set for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can load the data using the script we have made available.

The database size is 85 GB, providing one table of 109 columns.

Benchmark Queries

Here are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Queries containing aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count() FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count() DESC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 as c3 FROM ( SELECT Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year = 2007 GROUP BY Carrier ) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 as c3 FROM ( SELECT Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
select Year, count(*) as c1 from ontime group by Year;

Q10
SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year, Month) a;

Q11
select avg(c1) from (select Year, Month, count(*) as c1 from ontime group by Year, Month) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Queries containing joins

Q14
SELECT a.Year, c1/c2 FROM ( select Year, count(*)*1000 as c1 from ontime WHERE DepDelay>10 GROUP BY Year ) a INNER JOIN ( select Year, count(*) as c2 from ontime GROUP BY Year ) b ON a.Year=b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM ( select "Year", count(*)*1000 as c1 FROM fontime WHERE "DepDelay">10 GROUP BY "Year" ) a INNER JOIN ( select "Year", count(*) as c2 FROM fontime GROUP BY "Year" ) b ON a."Year"=b."Year";

Table-1: Queries used in the benchmark

Query executions

Here are the results of each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are shown in milliseconds.

Q#     PostgreSQL   PostgreSQL (Indexed)   ClickHouse   clickhousedb_fdw
Q1     27920        19634                  23           57
Q2     35124        17301                  50           80
Q3     34046        15618                  67           115
Q4     31632        7667                   25           37
Q5     47220        8976                   27           60
Q6     58233        24368                  55           153
Q7     30566        13256                  52           91
Q8     38309        60511                  112          179
Q9     20674        37979                  31           81
Q10    34990        20102                  56           148
Q11    30489        51658                  37           155
Q12    39357        33742                  186          1333
Q13    29912        30709                  101          384
Q14    54126        39913                  124          1364212
Q15    97258        30211                  245          259

Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark
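The raw timings are easier to compare as ratios. The following Python sketch copies the numbers from the results table above and derives two figures per query: how much slower clickhousedb_fdw is than native ClickHouse, and how much faster it is than the better of the two PostgreSQL runs. The helper names `fdw_overhead` and `speedup_over_postgres` are illustrative only and are not part of any tool used in the benchmark.

```python
# Timings in milliseconds, copied from the results table:
# query -> (PostgreSQL, PostgreSQL indexed, ClickHouse, clickhousedb_fdw)
TIMINGS_MS = {
    "Q1": (27920, 19634, 23, 57),
    "Q2": (35124, 17301, 50, 80),
    "Q3": (34046, 15618, 67, 115),
    "Q4": (31632, 7667, 25, 37),
    "Q5": (47220, 8976, 27, 60),
    "Q6": (58233, 24368, 55, 153),
    "Q7": (30566, 13256, 52, 91),
    "Q8": (38309, 60511, 112, 179),
    "Q9": (20674, 37979, 31, 81),
    "Q10": (34990, 20102, 56, 148),
    "Q11": (30489, 51658, 37, 155),
    "Q12": (39357, 33742, 186, 1333),
    "Q13": (29912, 30709, 101, 384),
    "Q14": (54126, 39913, 124, 1364212),
    "Q15": (97258, 30211, 245, 259),
}


def fdw_overhead(query):
    """clickhousedb_fdw time divided by native ClickHouse time."""
    _, _, clickhouse, fdw = TIMINGS_MS[query]
    return fdw / clickhouse


def speedup_over_postgres(query):
    """Faster of the two PostgreSQL timings divided by the FDW time."""
    pg, pg_indexed, _, fdw = TIMINGS_MS[query]
    return min(pg, pg_indexed) / fdw


for q in TIMINGS_MS:
    print(f"{q:>3}: FDW/ClickHouse = {fdw_overhead(q):9.1f}x, "
          f"PostgreSQL/FDW = {speedup_over_postgres(q):7.1f}x")
```

Read this way, Q15 runs through the FDW at close to native ClickHouse speed (259 ms against 245 ms), while Q14 is the one case where the FDW is slower than PostgreSQL itself.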

Reviewing the results

The graph shows query execution time in milliseconds: the X axis shows the query number from the tables above, and the Y axis shows the execution time in milliseconds. It plots the results for ClickHouse and for data retrieved from PostgreSQL through clickhousedb_fdw. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but only a minimal difference between ClickHouse and clickhousedb_fdw.


This graph shows the difference between ClickHouse and clickhousedb_fdw. In most of the queries the FDW overhead is not that great, and it is barely significant apart from Q12. This query involves joins and an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY and ORDER BY are not pushed down to ClickHouse.

In Table 2 we see a jump in execution time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q14 and Q15, which are the same query with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, and with the ORDER BY clause it is 1364212 ms. To debug this I ran EXPLAIN on both queries, and here are the results.

Q15: Query Without ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query Plan Without ORDER BY Clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query With ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query Plan With ORDER BY Clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
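The key difference between the two plans is visible in the Remote SQL lines: without ORDER BY the GROUP BY is sent to ClickHouse, while with ORDER BY only a bare column scan is sent and PostgreSQL does the grouping itself in the GroupAggregate nodes. As a rough illustration, the Python sketch below checks a plan for this by plain text matching. It is a heuristic written for this article's two plans, not a general EXPLAIN parser, and the function names are hypothetical.

```python
import re

# Abbreviated excerpts of the two EXPLAIN VERBOSE outputs shown above.
PLAN_WITHOUT_ORDER_BY = '''
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Relations: Aggregate on (fontime)
      Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Relations: Aggregate on (fontime)
      Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
'''

PLAN_WITH_ORDER_BY = '''
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
'''


def remote_statements(plan_text):
    """Return the SQL text that follows each 'Remote SQL:' marker."""
    return [m.group(1).strip()
            for m in re.finditer(r"Remote SQL:\s*(.+)", plan_text)]


def aggregates_pushed_down(plan_text):
    """Heuristic: True only if the plan has remote statements and
    every one of them contains a GROUP BY."""
    statements = remote_statements(plan_text)
    return bool(statements) and all("GROUP BY" in s.upper() for s in statements)


print("without ORDER BY:", aggregates_pushed_down(PLAN_WITHOUT_ORDER_BY))
print("with ORDER BY:   ", aggregates_pushed_down(PLAN_WITH_ORDER_BY))
```

When the check fails, as it does for the ORDER BY plan, every matching row of the remote table is shipped to PostgreSQL before aggregation, which is consistent with the large execution time recorded for Q14.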

Conclusion

The results of these experiments show that ClickHouse offers really good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. While there is some overhead when using clickhousedb_fdw, it is negligible, and the results are comparable to the performance achieved by running natively in the ClickHouse database. This also confirms that the FDW in PostgreSQL delivers excellent results.

Telegram chat for ClickHouse: https://t.me/clickhouse_ru
Telegram chat for PostgreSQL: https://t.me/pgsql

Source: www.hab.com
