Testing the Performance of Analytical Queries in PostgreSQL, ClickHouse, and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements could be achieved by using a ClickHouse data source rather than PostgreSQL. I know the performance benefits I get from using ClickHouse. Will those benefits persist if I access ClickHouse from PostgreSQL using a Foreign Data Wrapper (FDW)?

The database environments studied are PostgreSQL v11, clickhousedb_fdw, and a ClickHouse database. Ultimately, from PostgreSQL v11 we will run various SQL queries routed through our clickhousedb_fdw to the ClickHouse database. We will then see how the FDW's performance compares with the same queries run in native PostgreSQL and native ClickHouse.

ClickHouse Database

ClickHouse is an open source column-oriented database management system that can deliver performance 100 to 1000 times faster than traditional database approaches, and is capable of processing more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw, a foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona. The project's repository is on GitHub.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for such resource-intensive operations.
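To make the idea of pushdown concrete, here is a minimal Python sketch. It does not use clickhousedb_fdw or ClickHouse at all: an in-memory SQLite table stands in for the remote server, and the synthetic rows and the function names with_pushdown and without_pushdown are assumptions made up for illustration. It only shows why letting the remote side run the GROUP BY means far fewer rows travel back to the local side.

```python
import sqlite3
from collections import Counter

# Stand-in "remote" server: an in-memory SQLite table, NOT ClickHouse.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE ontime (Year INTEGER, DepDelay INTEGER)")

# Synthetic, hypothetical rows; they are not taken from the real data set.
rows = [(2000 + i % 3, i % 25) for i in range(3000)]
remote.executemany("INSERT INTO ontime VALUES (?, ?)", rows)


def with_pushdown(conn):
    """Aggregate on the remote side; one row per group comes back."""
    fetched = conn.execute(
        "SELECT Year, count(*) FROM ontime WHERE DepDelay > 10 GROUP BY Year"
    ).fetchall()
    return dict(fetched), len(fetched)


def without_pushdown(conn):
    """Fetch every matching row, then aggregate on the local side."""
    fetched = conn.execute(
        "SELECT Year FROM ontime WHERE DepDelay > 10"
    ).fetchall()
    return dict(Counter(year for (year,) in fetched)), len(fetched)


pushed, pushed_rows = with_pushdown(remote)
local, local_rows = without_pushdown(remote)

print("same result:", pushed == local)
print("rows transferred with pushdown:", pushed_rows)
print("rows transferred without pushdown:", local_rows)
```

Both functions return the same counts per year; the only difference is how many rows have to be moved before the answer is known. With a table of tens of gigabytes instead of 3000 rows, that difference is what decides the query time.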

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Rather than using a machine-generated data set for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can obtain the data using our script.

The database size is 85 GB, providing a single table of 100 columns.

Benchmark Queries

Here are the queries I used to benchmark ClickHouse, clickhousedb_fdw, and PostgreSQL.

Q#    Query

Query Contains Aggregates and Group By

Q1    SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q2    SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q3    SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;
Q4    SELECT Carrier, count(*) FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;
Q5    SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;
Q6    SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;
Q7    SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;
Q8    SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;
Q9    SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;
Q10   SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year, Month) a;
Q11   SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;
Q12   SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;
Q13   SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Query Contains Joins

Q14   SELECT a.Year, c1/c2 FROM (SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay>10 GROUP BY Year) a INNER JOIN (SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year) b ON a.Year=b.Year ORDER BY a.Year;
Q15   SELECT a."Year", c1/c2 FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay">10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Table-1: Queries used in the benchmark

Query executions

Here are the results for each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are shown in milliseconds.

Q#    PostgreSQL   PostgreSQL (Indexed)   ClickHouse   clickhousedb_fdw
Q1    27920        19634                  23           57
Q2    35124        17301                  50           80
Q3    34046        15618                  67           115
Q4    31632        7667                   25           37
Q5    47220        8976                   27           60
Q6    58233        24368                  55           153
Q7    30566        13256                  52           91
Q8    38309        60511                  112          179
Q9    20674        37979                  31           81
Q10   34990        20102                  56           148
Q11   30489        51658                  37           155
Q12   39357        33742                  186          1333
Q13   29912        30709                  101          384
Q14   54126        39913                  124          1364212
Q15   97258        30211                  245          259

Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark

Reviewing the results

The graph shows query execution time in milliseconds: the X axis shows the query number from the tables above, and the Y axis shows the execution time in milliseconds. It plots the ClickHouse results alongside the data retrieved from PostgreSQL using clickhousedb_fdw. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but only a minimal difference between ClickHouse and clickhousedb_fdw.

This graph shows the difference between ClickHouse and clickhousedb_fdw. In most queries, the FDW overhead is not that high and is barely significant, apart from Q12. This query combines a GROUP BY with an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY aggregation is not pushed down to ClickHouse.
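The overhead can be put into numbers with a few lines of Python. The sketch below copies the timings from the results table above and computes two ratios per query: clickhousedb_fdw time divided by native ClickHouse time, and the better of the two PostgreSQL times divided by clickhousedb_fdw time. The helper names fdw_overhead and speedup_over_postgres are my own and are not part of any tool.

```python
# Timings in milliseconds, copied from the results table in this article:
# (PostgreSQL, PostgreSQL indexed, ClickHouse, clickhousedb_fdw)
TIMINGS_MS = {
    "Q1": (27920, 19634, 23, 57),
    "Q2": (35124, 17301, 50, 80),
    "Q3": (34046, 15618, 67, 115),
    "Q4": (31632, 7667, 25, 37),
    "Q5": (47220, 8976, 27, 60),
    "Q6": (58233, 24368, 55, 153),
    "Q7": (30566, 13256, 52, 91),
    "Q8": (38309, 60511, 112, 179),
    "Q9": (20674, 37979, 31, 81),
    "Q10": (34990, 20102, 56, 148),
    "Q11": (30489, 51658, 37, 155),
    "Q12": (39357, 33742, 186, 1333),
    "Q13": (29912, 30709, 101, 384),
    "Q14": (54126, 39913, 124, 1364212),
    "Q15": (97258, 30211, 245, 259),
}


def fdw_overhead(timings):
    """Return {query: clickhousedb_fdw time / native ClickHouse time}."""
    return {q: t[3] / t[2] for q, t in timings.items()}


def speedup_over_postgres(timings):
    """Return {query: best PostgreSQL time / clickhousedb_fdw time}."""
    return {q: min(t[0], t[1]) / t[3] for q, t in timings.items()}


overhead = fdw_overhead(TIMINGS_MS)
speedup = speedup_over_postgres(TIMINGS_MS)

for q in TIMINGS_MS:
    print(f"{q:>3}: fdw/ClickHouse = {overhead[q]:9.1f}x, "
          f"PostgreSQL/fdw = {speedup[q]:8.2f}x")

print("largest FDW overhead:", max(overhead, key=overhead.get))
```

A ratio near 1 in the first column means the FDW adds little on top of ClickHouse. A value below 1 in the second column means the query ran slower through the FDW than in PostgreSQL itself, which in this table happens only for Q14.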

In Table 2 we see the time jump in queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q-14 and Q-15 with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, and with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.

Q15: Without ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query Plan Without ORDER BY Clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query With ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query Plan With ORDER BY Clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
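The key difference between the two plans is in the Remote SQL lines: in the plan without ORDER BY the statement sent to ClickHouse contains the GROUP BY, while in the plan with ORDER BY it does not, so PostgreSQL runs a local GroupAggregate over every row it fetches. As a rough aid, the Python sketch below pulls the Remote SQL lines out of EXPLAIN VERBOSE text and reports whether each one contains a GROUP BY. It is a text heuristic under my own assumptions, not a plan parser; the function names are hypothetical, and the two plan strings are shortened excerpts of the plans above.

```python
import re


def remote_sql_statements(plan_text):
    """Extract the statements that follow 'Remote SQL:' on each plan line."""
    return [m.group(1).strip()
            for m in re.finditer(r"Remote SQL:\s*(.+)", plan_text)]


def aggregation_pushed_down(plan_text):
    """True if every remote statement contains GROUP BY (heuristic only)."""
    statements = remote_sql_statements(plan_text)
    return bool(statements) and all(
        re.search(r"\bGROUP\s+BY\b", s, re.IGNORECASE) for s in statements
    )


# Shortened excerpts of the two plans shown in this article.
Q15_PLAN = '''
Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
  Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
  Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
'''

Q14_PLAN = '''
Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
  Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
  Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
'''

print("Q15 aggregation pushed down:", aggregation_pushed_down(Q15_PLAN))
print("Q14 aggregation pushed down:", aggregation_pushed_down(Q14_PLAN))
```

In the Q14 case each foreign scan returns one row per flight rather than one row per year, which is consistent with the large gap between the two timings.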

Conclusion

The results of these experiments show that ClickHouse offers really good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. While there is some overhead when using clickhousedb_fdw, for most queries it is negligible and comparable to the performance achieved by running natively on the ClickHouse database. This also confirms that FDW in PostgreSQL delivers good results.

Telegram chat about ClickHouse https://t.me/clickhouse_ru
Telegram chat about PostgreSQL https://t.me/pgsql

Source: www.habr.com
