Performance Testing of Analytical Queries in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements can be gained by using a ClickHouse data source instead of PostgreSQL. I already know the performance benefits I get from using ClickHouse. Will those benefits persist if I access ClickHouse from PostgreSQL through a foreign data wrapper (FDW)?

The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. Ultimately, from PostgreSQL v11 we will run various SQL queries routed through our clickhousedb_fdw to the ClickHouse database. We will then see how the FDW's performance compares with the same queries run in native PostgreSQL and native ClickHouse.

ClickHouse Database

ClickHouse is an open source column-oriented database management system that can achieve performance 100 to 1000 times faster than traditional database approaches and is capable of processing more than a billion rows in under a second.

Clickhousedb_fdw

clickhousedb_fdw, the foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona. The source is available in the project's GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that lets you SELECT from, and INSERT INTO, a ClickHouse database from a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for these resource-intensive operations.
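For readers who have not set up a foreign data wrapper before, the following is a minimal sketch of how a foreign table such as fontime (the table queried in Q15) could be mapped onto the ClickHouse table ontime. The statements are standard PostgreSQL DDL, but the server name clickhouse_svr, the driver path, the option names (dbname, host, driver, table_name) and the shortened column list are assumptions made for illustration. Check the clickhousedb_fdw README for the options your version actually accepts.

```sql
-- Load the wrapper into the current database.
CREATE EXTENSION clickhousedb_fdw;

-- Describe the remote ClickHouse server.
-- Option names and values below are assumptions, not taken from the benchmark.
CREATE SERVER clickhouse_svr
    FOREIGN DATA WRAPPER clickhousedb_fdw
    OPTIONS (dbname 'default', host '127.0.0.1', driver '/path/to/libclickhouseodbc.so');

-- Map the current PostgreSQL role to the remote server.
CREATE USER MAPPING FOR CURRENT_USER SERVER clickhouse_svr;

-- Expose the remote table under a local name.
-- Only a handful of the 109 columns are listed in this sketch.
CREATE FOREIGN TABLE fontime (
    "Year"      int,
    "Month"     int,
    "DayOfWeek" int,
    "DepDelay"  int,
    "Carrier"   text
) SERVER clickhouse_svr
  OPTIONS (table_name 'ontime');
```

Once such a table exists, ordinary SELECT statements against fontime are translated by the wrapper into remote SQL for ClickHouse.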

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256 GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Rather than using a machine-generated dataset for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. The data can be downloaded with our script.

The database is 85 GB in size and consists of a single table with 109 columns.

Benchmark queries

Below are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Queries containing aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count(*) FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM ( SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier ) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM ( SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;

Q10
SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year, Month) a;

Q11
SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Queries containing joins

Q14
SELECT a.Year, c1/c2 FROM ( SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay>10 GROUP BY Year ) a INNER JOIN ( SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year ) b ON a.Year=b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM ( SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay">10 GROUP BY "Year" ) a INNER JOIN ( SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year" ) b ON a."Year"=b."Year";

Table-1: Queries used in the benchmark

Query execution

Below are the results for each query when run in different database setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are in milliseconds.

Q#     PostgreSQL   PostgreSQL (indexed)   ClickHouse   clickhousedb_fdw
Q1          27920                  19634           23                 57
Q2          35124                  17301           50                 80
Q3          34046                  15618           67                115
Q4          31632                   7667           25                 37
Q5          47220                   8976           27                 60
Q6          58233                  24368           55                153
Q7          30566                  13256           52                 91
Q8          38309                  60511          112                179
Q9          20674                  37979           31                 81
Q10         34990                  20102           56                148
Q11         30489                  51658           37                155
Q12         39357                  33742          186               1333
Q13         29912                  30709          101                384
Q14         54126                  39913          124            1364212
Q15         97258                  30211          245                259

Table-2: Time taken to execute the queries used in the benchmark

Reviewing the results

The graph shows query execution time in milliseconds: the X axis is the query number from the tables above and the Y axis is the execution time in milliseconds. It plots the ClickHouse results alongside the data retrieved from postgres through clickhousedb_fdw. The table shows a huge difference between PostgreSQL and ClickHouse, but only a minimal difference between ClickHouse and clickhousedb_fdw.

This graph shows the difference between ClickHouse and clickhousedb_fdw. For most queries the FDW overhead is small and barely significant, with the exception of Q12. That query combines GROUP BY with an ORDER BY clause, and because of the ORDER BY clause neither the GROUP BY nor the ORDER BY is pushed down to ClickHouse.
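The overhead can also be read straight from the numbers. The following is a small sketch that copies the timings from the results table into a VALUES list and computes the ratios in PostgreSQL. The aliases pg_ms, pg_idx_ms, ch_ms and fdw_ms are names chosen here for illustration and are not part of the benchmark.

```sql
-- Timings in milliseconds, copied from the results table.
-- Column aliases are illustrative names, not part of the benchmark schema.
SELECT q,
       round(pg_ms::numeric     / ch_ms, 1) AS pg_vs_clickhouse,      -- plain PostgreSQL relative to ClickHouse
       round(pg_idx_ms::numeric / ch_ms, 1) AS pg_idx_vs_clickhouse,  -- indexed PostgreSQL relative to ClickHouse
       round(fdw_ms::numeric    / ch_ms, 1) AS fdw_vs_clickhouse,     -- FDW relative to native ClickHouse
       fdw_ms - ch_ms                       AS fdw_extra_ms           -- absolute FDW overhead
FROM (VALUES
        ('Q1',  27920, 19634,  23,      57),
        ('Q2',  35124, 17301,  50,      80),
        ('Q3',  34046, 15618,  67,     115),
        ('Q4',  31632,  7667,  25,      37),
        ('Q5',  47220,  8976,  27,      60),
        ('Q6',  58233, 24368,  55,     153),
        ('Q7',  30566, 13256,  52,      91),
        ('Q8',  38309, 60511, 112,     179),
        ('Q9',  20674, 37979,  31,      81),
        ('Q10', 34990, 20102,  56,     148),
        ('Q11', 30489, 51658,  37,     155),
        ('Q12', 39357, 33742, 186,    1333),
        ('Q13', 29912, 30709, 101,     384),
        ('Q14', 54126, 39913, 124, 1364212),
        ('Q15', 97258, 30211, 245,     259)
     ) AS t(q, pg_ms, pg_idx_ms, ch_ms, fdw_ms)
ORDER BY fdw_vs_clickhouse DESC;
```

With these figures Q14 sorts first by a wide margin (1364212 ms through the FDW against 124 ms natively), followed by Q12 (1333 ms against 186 ms). Q15 comes last, at 259 ms against 245 ms.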

Table-2 shows the jump in time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm it, I ran the same join with the ORDER BY clause (Q14) and without it (Q15). Without the ORDER BY clause the completion time is 259 ms; with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries, and here are the results.

Q15: Query without ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query plan without ORDER BY clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query with ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query plan with ORDER BY clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
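Comparing the two plans, the Remote SQL in the ORDER BY plan carries no GROUP BY: every matching row is fetched in sorted order and aggregated locally by GroupAggregate, whereas the plan without ORDER BY ships the aggregation to ClickHouse. One possible workaround, offered here as an untested sketch rather than something measured in this benchmark, is to keep the join inside a common table expression and sort outside it. In PostgreSQL 11 a CTE is planned on its own, so the inner query should be planned like Q15 and the final sort should only touch one row per year. The CTE name yearly and the alias ratio are illustrative.

```sql
-- Sketch: isolate the pushed-down join in a CTE, then sort the small result locally.
-- From PostgreSQL 12 onward, write "WITH yearly AS MATERIALIZED (...)"
-- to keep the CTE from being inlined into the outer query.
EXPLAIN VERBOSE
WITH yearly AS (
    SELECT a."Year", c1/c2 AS ratio
    FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
    INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year" = b."Year"
)
SELECT "Year", ratio
FROM yearly
ORDER BY "Year";
```

If the workaround takes effect, both Remote SQL lines in the resulting plan should again contain GROUP BY "Year".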

Conclusion

The results of these experiments show that ClickHouse delivers really good performance, and that clickhousedb_fdw brings ClickHouse's performance benefits to PostgreSQL. There is some overhead when using clickhousedb_fdw, but for most queries it is negligible and performance is comparable to running natively on the ClickHouse database. This also confirms that the FDW in PostgreSQL delivers excellent results.

ClickHouse Telegram chat: https://t.me/clickhouse_ru
PostgreSQL Telegram chat: https://t.me/pgsql

Source: www.habr.com
