Performance testing of analytical queries in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements can be achieved by using a ClickHouse data source instead of PostgreSQL. I know the performance benefits I get from using ClickHouse. Will those benefits persist if I access ClickHouse from PostgreSQL through a Foreign Data Wrapper (FDW)?

The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. From PostgreSQL v11 we will issue various SQL queries that are routed through our clickhousedb_fdw to the ClickHouse database. We will then see how FDW performance compares with the same queries executed in native PostgreSQL and native ClickHouse.

ClickHouse Database

ClickHouse is an open source column-oriented database management system that can achieve performance 100-1000 times faster than traditional database approaches, and is capable of processing more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw, a foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona. Here is the link to the project's GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for these resource-intensive operations.

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Instead of using a machine-generated data set for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can access the data using our script available here.

The database size is 85 GB, consisting of a single table with 109 columns.

Benchmark queries

Here are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Queries containing aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count(*) FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM ( SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier ) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM ( SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;

Q10
SELECT avg(cnt) FROM ( SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year, Month ) a;

Q11
SELECT avg(c1) FROM ( SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month ) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Queries containing joins

Q14
SELECT a.Year, c1/c2 FROM ( SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay>10 GROUP BY Year ) a INNER JOIN ( SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year ) b ON a.Year=b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM ( SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay">10 GROUP BY "Year" ) a INNER JOIN ( SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year" ) b ON a."Year"=b."Year";

Table-1: Queries used in the benchmark

Query performance

Here are the results for each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse and clickhousedb_fdw. Times are shown in milliseconds.

Q#    PostgreSQL   PostgreSQL (Indexed)   ClickHouse   clickhousedb_fdw
Q1         27920                  19634           23                 57
Q2         35124                  17301           50                 80
Q3         34046                  15618           67                115
Q4         31632                   7667           25                 37
Q5         47220                   8976           27                 60
Q6         58233                  24368           55                153
Q7         30566                  13256           52                 91
Q8         38309                  60511          112                179
Q9         20674                  37979           31                 81
Q10        34990                  20102           56                148
Q11        30489                  51658           37                155
Q12        39357                  33742          186               1333
Q13        29912                  30709          101                384
Q14        54126                  39913          124            1364212
Q15        97258                  30211          245                259

Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark

Looking at the results

The graph shows query execution time in milliseconds: the X axis shows the query number from the tables above, and the Y axis shows the execution time in milliseconds. Results are shown for ClickHouse and for the same data accessed from PostgreSQL through clickhousedb_fdw. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but only a small difference between ClickHouse and clickhousedb_fdw.
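To make the comparison concrete, the timings from the results table can be turned into ratios. The following is a minimal Python sketch, not part of the original benchmark: the names TIMINGS_MS and ratios are made up for illustration, and the numbers are copied from the table as published.

```python
# Timings in milliseconds, copied from the results table:
# (PostgreSQL, PostgreSQL indexed, ClickHouse, clickhousedb_fdw)
TIMINGS_MS = {
    "Q1": (27920, 19634, 23, 57),
    "Q2": (35124, 17301, 50, 80),
    "Q3": (34046, 15618, 67, 115),
    "Q4": (31632, 7667, 25, 37),
    "Q5": (47220, 8976, 27, 60),
    "Q6": (58233, 24368, 55, 153),
    "Q7": (30566, 13256, 52, 91),
    "Q8": (38309, 60511, 112, 179),
    "Q9": (20674, 37979, 31, 81),
    "Q10": (34990, 20102, 56, 148),
    "Q11": (30489, 51658, 37, 155),
    "Q12": (39357, 33742, 186, 1333),
    "Q13": (29912, 30709, 101, 384),
    "Q14": (54126, 39913, 124, 1364212),
    "Q15": (97258, 30211, 245, 259),
}


def ratios(timings):
    """Map each query to (best PostgreSQL time / FDW time, FDW time / ClickHouse time)."""
    result = {}
    for query, (pg, pg_indexed, clickhouse, fdw) in timings.items():
        best_pg = min(pg, pg_indexed)  # the faster of the two PostgreSQL setups
        result[query] = (best_pg / fdw, fdw / clickhouse)
    return result


if __name__ == "__main__":
    for query, (pg_over_fdw, fdw_over_ch) in ratios(TIMINGS_MS).items():
        print(f"{query:>3}: best PostgreSQL / FDW = {pg_over_fdw:8.2f}, "
              f"FDW / ClickHouse = {fdw_over_ch:10.2f}")
```

A FDW / ClickHouse ratio close to 1 means the wrapper adds little on top of native ClickHouse. With the table's numbers, Q15 comes out at about 1.06, while Q14 is the clear outlier at roughly 11,000.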


This graph shows the difference between ClickHouse and clickhousedb_fdw. For most queries the FDW overhead is not that high and is barely significant, apart from Q12. This query involves joins and an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY and ORDER BY are not pushed down to ClickHouse.

In Table-2 we can see the jump in time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q-14 and Q-15 with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, and with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.

Q15: Query Without ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query Plan Without ORDER BY Clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query With ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query Plan With ORDER BY Clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
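
The key difference between the two plans is what gets sent to ClickHouse. Without ORDER BY, each Remote SQL statement already contains the count(*) and GROUP BY, so ClickHouse returns one row per year. With ORDER BY, the remote statements only select "Year", and the GroupAggregate nodes run locally in PostgreSQL over every returned row. The following Python sketch shows one way to check this automatically in EXPLAIN VERBOSE output. It is a heuristic written for these two plans, not a general plan parser, and the function names and the abbreviated plan strings are assumptions made for illustration.

```python
import re


def remote_sql_statements(plan_text):
    """Return the SQL text of every 'Remote SQL:' line in an EXPLAIN VERBOSE plan."""
    return [match.group(1).strip()
            for match in re.finditer(r"Remote SQL:\s*(.+)", plan_text)]


def aggregates_pushed_down(plan_text):
    """Heuristic: True if every remote statement contains GROUP BY.

    Aggregates without a GROUP BY clause are not detected by this check.
    """
    statements = remote_sql_statements(plan_text)
    return bool(statements) and all(
        re.search(r"\bGROUP BY\b", sql, re.IGNORECASE) for sql in statements
    )


# Abbreviated plans: only the foreign scan lines from the output above.
PLAN_WITHOUT_ORDER_BY = '''
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
'''

PLAN_WITH_ORDER_BY = '''
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
'''

if __name__ == "__main__":
    print("without ORDER BY:", aggregates_pushed_down(PLAN_WITHOUT_ORDER_BY))
    print("with ORDER BY:   ", aggregates_pushed_down(PLAN_WITH_ORDER_BY))
```

For the two plans above, the check reports pushdown for the query without ORDER BY and no pushdown for the query with it, which matches the gap between 259 ms and 1364212 ms.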

Conclusion

The results of these experiments show that ClickHouse delivers very good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. Although there is some overhead when using clickhousedb_fdw, it is negligible and the performance is comparable to that achieved by running natively in the ClickHouse database. This also confirms that FDW pushdown in PostgreSQL gives excellent results.

Telegram chat about ClickHouse: https://t.me/clickhouse_ru
Telegram chat about PostgreSQL: https://t.me/pgsql

Source: www.habr.com
