Benchmarking analytical query performance in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study, I wanted to see what performance improvements could be gained by using a ClickHouse data source instead of PostgreSQL. I know ClickHouse itself brings performance benefits. Would those benefits be retained if I accessed ClickHouse from PostgreSQL through a foreign data wrapper (FDW)?

The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. From PostgreSQL v11 we run various SQL queries routed through clickhousedb_fdw to the ClickHouse database, and then compare the FDW performance with the same queries running natively in PostgreSQL and natively in ClickHouse.

ClickHouse Database

ClickHouse is an open-source, column-oriented database management system that can be 100 to 1000 times faster than traditional database approaches and can process more than a billion rows in less than a second.
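To make "column-oriented" concrete, here is a toy Python model (a sketch, not how ClickHouse actually stores data on disk) of the same tiny table kept row-wise and column-wise. It counts how many stored values each layout touches to compute avg(DepDelay). The sample rows and the helper names avg_row_store and avg_column_store are hypothetical.

```python
from statistics import mean

# Hypothetical three-row, three-column sample (not benchmark data).
rows = [
    {"Year": 2007, "Carrier": "AA", "DepDelay": 12},
    {"Year": 2007, "Carrier": "UA", "DepDelay": -3},
    {"Year": 2008, "Carrier": "AA", "DepDelay": 30},
]

# Column layout: one list per column, built from the same rows.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_row_store(table, col):
    """Row layout: each row is stored as a unit, so every field is read."""
    touched = sum(len(row) for row in table)
    return mean(row[col] for row in table), touched


def avg_column_store(table, col):
    """Column layout: only the requested column is read."""
    values = table[col]
    return mean(values), len(values)


print(avg_row_store(rows, "DepDelay"))        # (13, 9)
print(avg_column_store(columns, "DepDelay"))  # (13, 3)
```

Scaled to the benchmark's 109-column table, the row layout in this model reads 109 values per row to aggregate a single column, while the column layout reads one. Real row stores read whole pages rather than individual fields, so the model only captures the direction of the effect.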

clickhousedb_fdw

clickhousedb_fdw, a ClickHouse database foreign data wrapper (FDW), is an open-source project from Percona. Here is a link to the project's GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, this provides an FDW for ClickHouse that lets you SELECT from, and INSERT INTO, a ClickHouse database from a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for these resource-intensive operations.
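The effect of aggregate pushdown can be sketched with Python's built-in sqlite3 module standing in for the remote ClickHouse server. This is a conceptual model under that assumption, not clickhousedb_fdw code: the two hypothetical functions answer the same "delayed flights per year" question either by fetching every qualifying row and grouping locally, or by sending the GROUP BY to the remote side, which mirrors the Remote SQL that clickhousedb_fdw generates for Q15 later in this post.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database playing the role of the remote server (assumption).
remote = sqlite3.connect(":memory:")
remote.execute('CREATE TABLE ontime ("Year" INTEGER, "DepDelay" INTEGER)')
remote.executemany(
    "INSERT INTO ontime VALUES (?, ?)",
    [(2007, 15), (2007, 5), (2007, 30), (2008, 11), (2008, 2)],  # hypothetical rows
)


def count_delayed_without_pushdown(conn):
    """Fetch every qualifying row, then aggregate on the local side."""
    rows = conn.execute('SELECT "Year" FROM ontime WHERE "DepDelay" > 10').fetchall()
    counts = Counter(year for (year,) in rows)
    return dict(counts), len(rows)  # (result, rows transferred)


def count_delayed_with_pushdown(conn):
    """Send the GROUP BY to the remote side; one row per group comes back."""
    rows = conn.execute(
        'SELECT "Year", count(*) FROM ontime WHERE "DepDelay" > 10 GROUP BY "Year"'
    ).fetchall()
    return dict(rows), len(rows)  # (result, rows transferred)


print(count_delayed_without_pushdown(remote))  # ({2007: 2, 2008: 1}, 3)
print(count_delayed_with_pushdown(remote))     # ({2007: 2, 2008: 1}, 2)
```

Both variants return the same counts; the difference is the number of rows crossing the connection. On the benchmark dataset the pushed-down form returns one row per year (1987 to 2018) instead of every delayed flight.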

Benchmark Environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256 GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark Tests

Rather than using machine-generated data for this benchmark, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can get the data using our script, available here.

The database is 85 GB in size and consists of a single table with 109 columns.

Benchmark Queries

Here are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Queries containing aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year = 2007 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;

Q10
SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15 = 1 GROUP BY Year, Month) a;

Q11
SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Queries containing joins

Q14
SELECT a.Year, c1/c2 FROM (SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay > 10 GROUP BY Year) a INNER JOIN (SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year) b ON a.Year = b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year" = b."Year";

Table-1: Queries used in the benchmark
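Q5, Q6, Q14 and Q15 compute a per-mille ratio of delayed flights (c*1000/c2, or c1/c2 with c1 = count(*)*1000). The two engines evaluate that division differently: in PostgreSQL, count(*) returns bigint and dividing two integers truncates, while ClickHouse's / operator always returns a floating-point number. A small Python sketch with hypothetical counts shows the difference; the function names are illustrative only.

```python
def per_mille_postgres(delayed: int, total: int) -> int:
    """PostgreSQL: bigint / bigint is integer division (truncates toward zero)."""
    # Counts are non-negative, so floor division matches truncation here.
    return (delayed * 1000) // total


def per_mille_clickhouse(delayed: int, total: int) -> float:
    """ClickHouse: the / operator always yields a floating-point result."""
    return delayed * 1000 / total


# Hypothetical counts: 2 delayed flights out of 3.
print(per_mille_postgres(2, 3))    # 666
print(per_mille_clickhouse(2, 3))  # roughly 666.67
```

In the Q15 plan shown later, the division appears in the Output of the local Hash Join node, so results fetched through clickhousedb_fdw follow PostgreSQL's integer rule. When comparing result sets across systems, casting one operand explicitly (for example count(*)::numeric in PostgreSQL) avoids confusing a rounding difference with a wrong answer.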

Query Execution

Here are the results for each query when run on the different data source setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are shown in milliseconds.

Q#    PostgreSQL   PostgreSQL (indexed)   ClickHouse   clickhousedb_fdw
Q1    27920        19634                  23           57
Q2    35124        17301                  50           80
Q3    34046        15618                  67           115
Q4    31632        7667                   25           37
Q5    47220        8976                   27           60
Q6    58233        24368                  55           153
Q7    30566        13256                  52           91
Q8    38309        60511                  112          179
Q9    20674        37979                  31           81
Q10   34990        20102                  56           148
Q11   30489        51658                  37           155
Q12   39357        33742                  186          1333
Q13   29912        30709                  101          384
Q14   54126        39913                  124          1364212
Q15   97258        30211                  245          259

Table-2: Time taken to execute the queries used in the benchmark (milliseconds)
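The FDW overhead and the gap to PostgreSQL can be quantified directly from the results table above. The sketch below copies the timings into a Python dictionary (the names RESULTS, fdw_overhead, speedup_over_postgres and high_overhead are illustrative) and computes, per query, how many times slower clickhousedb_fdw is than native ClickHouse, and how much faster it is than the better of the two PostgreSQL runs.

```python
# Execution times in milliseconds, copied from the results table:
# (PostgreSQL, PostgreSQL indexed, ClickHouse, clickhousedb_fdw)
RESULTS = {
    "Q1": (27920, 19634, 23, 57),
    "Q2": (35124, 17301, 50, 80),
    "Q3": (34046, 15618, 67, 115),
    "Q4": (31632, 7667, 25, 37),
    "Q5": (47220, 8976, 27, 60),
    "Q6": (58233, 24368, 55, 153),
    "Q7": (30566, 13256, 52, 91),
    "Q8": (38309, 60511, 112, 179),
    "Q9": (20674, 37979, 31, 81),
    "Q10": (34990, 20102, 56, 148),
    "Q11": (30489, 51658, 37, 155),
    "Q12": (39357, 33742, 186, 1333),
    "Q13": (29912, 30709, 101, 384),
    "Q14": (54126, 39913, 124, 1364212),
    "Q15": (97258, 30211, 245, 259),
}


def fdw_overhead(query: str) -> float:
    """How many times slower clickhousedb_fdw is than native ClickHouse."""
    _, _, clickhouse, fdw = RESULTS[query]
    return fdw / clickhouse


def speedup_over_postgres(query: str) -> float:
    """Best PostgreSQL time divided by the FDW time (> 1 means the FDW is faster)."""
    plain, indexed, _, fdw = RESULTS[query]
    return min(plain, indexed) / fdw


def high_overhead(threshold: float = 3.0) -> list:
    """Queries whose FDW overhead exceeds the given factor."""
    return [q for q in RESULTS if fdw_overhead(q) > threshold]


print(high_overhead())                     # ['Q11', 'Q12', 'Q13', 'Q14']
print(round(fdw_overhead("Q14")))          # 11002
print(round(speedup_over_postgres("Q1")))  # 344
print(speedup_over_postgres("Q14") < 1)    # True
```

With these numbers, every query except Q14 runs between about 25 times (Q12) and about 344 times (Q1) faster through clickhousedb_fdw than on the faster PostgreSQL setup, and Q15's overhead over native ClickHouse is only about 1.06 times. Q14 is the outlier: roughly 11,000 times slower than native ClickHouse and slower than PostgreSQL itself.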

Viewing the results

The graph shows query execution times: the X-axis shows the query number from the tables above, and the Y-axis shows the execution time in milliseconds. Results are shown for native ClickHouse and for data accessed from PostgreSQL through clickhousedb_fdw. The tables show a huge difference between PostgreSQL and ClickHouse, but for most queries only a small difference between ClickHouse and clickhousedb_fdw.


This graph shows the difference between ClickHouse and clickhousedb_fdw. For most queries the FDW overhead is small and barely significant, with the exception of Q12, which uses an ORDER BY clause. Because of the ORDER BY clause, the aggregation is not pushed down to ClickHouse.

In Table-2 we can see a jump in execution time for Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran Q14 and Q15, which are the same join query with and without an ORDER BY clause. Without the ORDER BY clause the query completes in 259 ms; with it, it takes 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.

Q15: Without ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query Plan Without ORDER BY Clause

QUERY PLAN
 Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
   Output: fontime."Year", (((count(*) * 1000)) / b.c2)
   Inner Unique: true
   Hash Cond: (fontime."Year" = b."Year")
   ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
         Output: fontime."Year", ((count(*) * 1000))
         Relations: Aggregate on (fontime)
         Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
   ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
         Output: b.c2, b."Year"
         ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
               Output: b.c2, b."Year"
               ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                     Output: fontime_1."Year", (count(*))
                     Relations: Aggregate on (fontime)
                     Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query With ORDER BY Clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query Plan With ORDER BY Clause

QUERY PLAN
 Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
   Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
   Inner Unique: true
   Merge Cond: (fontime."Year" = fontime_1."Year")
   ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
         Output: fontime."Year", (count(*) * 1000)
         Group Key: fontime."Year"
         ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
               Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
   ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
         Output: fontime_1."Year", count(*)
         Group Key: fontime_1."Year"
         ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
               Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
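Comparing two plans by eye works for a single pair of queries; for more, a small script can do the check. This hypothetical Python helper pulls the Remote SQL lines out of EXPLAIN VERBOSE text and reports whether every statement sent to ClickHouse still contains a GROUP BY, which is exactly what distinguishes the Q15 plan from the Q14 plan. It assumes each Remote SQL statement sits on a single line, as in the plans shown here.

```python
import re
from typing import List

# Captures the text after "Remote SQL:" up to the end of that line.
REMOTE_SQL = re.compile(r"Remote SQL:\s*(.+)")


def remote_queries(plan_text: str) -> List[str]:
    """Return every Remote SQL statement found in EXPLAIN VERBOSE output."""
    return [m.group(1).strip() for m in REMOTE_SQL.finditer(plan_text)]


def aggregation_pushed_down(plan_text: str) -> bool:
    """True if the plan has Remote SQL and every statement includes GROUP BY."""
    queries = remote_queries(plan_text)
    return bool(queries) and all("GROUP BY" in q.upper() for q in queries)


q14_plan = """
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
"""
print(aggregation_pushed_down(q14_plan))  # False
```

Run against the Q15 plan text, the same check returns True, because both of its Remote SQL statements end in GROUP BY "Year".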

Conclusion

The results of these experiments show that ClickHouse offers really good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. Although there is some overhead when using clickhousedb_fdw, for most queries it is negligible, and performance is comparable to running natively on the ClickHouse database. This also confirms that the FDW in PostgreSQL delivers excellent results.

Telegram chat about ClickHouse: https://t.me/clickhouse_ru
Telegram chat about PostgreSQL: https://t.me/pgsql

Source: www.habr.com
