In this study, I wanted to see what performance improvements could be achieved by using a ClickHouse data source rather than PostgreSQL. I know the performance benefits I get from using ClickHouse. Would those benefits persist if I accessed ClickHouse from within PostgreSQL using a Foreign Data Wrapper (FDW)?
The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. From within PostgreSQL v11 we will run various SQL queries routed through clickhousedb_fdw to the ClickHouse database. We will then see how FDW performance compares with the same queries executed natively in PostgreSQL and natively in ClickHouse.
ClickHouse Database
ClickHouse is an open source, column-oriented database management system that can achieve performance 100 to 1000 times faster than traditional database approaches, and is capable of processing more than a billion rows in less than a second.
clickhousedb_fdw
clickhousedb_fdw, a foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona.
As you will see, it provides an FDW for ClickHouse that lets you SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.
The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for these resource-intensive operations.
Benchmark environment
- Supermicro server:
- Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
- 2 sockets / 28 cores / 56 threads
- Memory: 256GB of RAM
- Storage: Samsung SM863 1.9TB Enterprise SSD
- Filesystem: ext4/xfs
- OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
- PostgreSQL: version 11
Benchmark tests
Instead of using a machine-generated dataset for this test, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. The data is available for download.
The size of the database is 85 GB, providing a single table of 109 columns.
Benchmark queries
Here are the queries I used to compare ClickHouse, clickhousedb_fdw and PostgreSQL.
Q#: Query
Queries containing aggregates and GROUP BY
Q1: SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q2: SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;
Q3: SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;
Q4: SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) DESC;
Q5: SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year = 2007 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;
Q6: SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay > 10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier) b ON a.Carrier = b.Carrier ORDER BY c3 DESC;
Q7: SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;
Q8: SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;
Q9: SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;
Q10: SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15 = 1 GROUP BY Year, Month) a;
Q11: SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;
Q12: SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;
Q13: SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;
Queries containing joins
Q14: SELECT a.Year, c1/c2 FROM (SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay > 10 GROUP BY Year) a INNER JOIN (SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year) b ON a.Year = b.Year ORDER BY a.Year;
Q15: SELECT a."Year", c1/c2 FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year" = b."Year";
Table-1: Queries used in the benchmark
Query execution
Here are the results for each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse, and clickhousedb_fdw. Times are shown in milliseconds.
Q# | PostgreSQL | PostgreSQL (Indexed) | ClickHouse | clickhousedb_fdw
Q1 | 27920 | 19634 | 23 | 57
Q2 | 35124 | 17301 | 50 | 80
Q3 | 34046 | 15618 | 67 | 115
Q4 | 31632 | 7667 | 25 | 37
Q5 | 47220 | 8976 | 27 | 60
Q6 | 58233 | 24368 | 55 | 153
Q7 | 30566 | 13256 | 52 | 91
Q8 | 38309 | 60511 | 112 | 179
Q9 | 20674 | 37979 | 31 | 81
Q10 | 34990 | 20102 | 56 | 148
Q11 | 30489 | 51658 | 37 | 155
Q12 | 39357 | 33742 | 186 | 1333
Q13 | 29912 | 30709 | 101 | 384
Q14 | 54126 | 39913 | 124 | 1364212
Q15 | 97258 | 30211 | 245 | 259
Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark
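To make the FDW overhead easier to compare across queries, the timings above can be expressed as a ratio of clickhousedb_fdw time to native ClickHouse time. The following is a minimal Python sketch of that arithmetic; the names `TIMINGS_MS` and `fdw_overhead` are made up for this illustration and are not part of the original benchmark.

```python
# Timings in milliseconds copied from the results table above,
# stored as (native ClickHouse, clickhousedb_fdw).
TIMINGS_MS = {
    "Q1": (23, 57),
    "Q2": (50, 80),
    "Q3": (67, 115),
    "Q4": (25, 37),
    "Q5": (27, 60),
    "Q6": (55, 153),
    "Q7": (52, 91),
    "Q8": (112, 179),
    "Q9": (31, 81),
    "Q10": (56, 148),
    "Q11": (37, 155),
    "Q12": (186, 1333),
    "Q13": (101, 384),
    "Q14": (124, 1364212),
    "Q15": (245, 259),
}


def fdw_overhead(timings):
    """Return {query: fdw_ms / clickhouse_ms}, rounded to one decimal place."""
    return {q: round(fdw / ch, 1) for q, (ch, fdw) in timings.items()}


if __name__ == "__main__":
    ratios = fdw_overhead(TIMINGS_MS)
    # Print the queries from the largest to the smallest overhead ratio.
    for query, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{query}: {ratio}x")
```

With these numbers, Q14 has by far the largest ratio and Q15 the smallest, which matches the discussion of the ORDER BY clause that follows.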
Looking at the results
The graph shows query execution time in milliseconds: the X-axis shows the query number from the tables above, and the Y-axis shows the execution time in milliseconds. Results are shown for ClickHouse and for the data accessed from PostgreSQL through clickhousedb_fdw. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but only a minimal difference between ClickHouse and clickhousedb_fdw.
This graph shows the difference between ClickHouse and clickhousedb_fdw. For most queries the FDW overhead is not that high and is barely significant, apart from Q12. That query contains a GROUP BY and an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY and ORDER BY are not pushed down to ClickHouse.
In Table-2 we can see the jump in time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q14 and Q15, which are the same query with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, while with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.
Q15: Without ORDER BY clause
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2
FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";
Q15: Query plan without ORDER BY clause
QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)
Q14: Query with ORDER BY clause
bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b ON a."Year"= b."Year"
ORDER BY a."Year";
Q14: Query plan with ORDER BY clause
QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
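The difference between the two plans shows up in the Remote SQL lines. In the Q15 plan each remote statement contains GROUP BY, so ClickHouse does the aggregation. In the Q14 plan the remote statements only select and sort rows, and the aggregation runs in PostgreSQL (the GroupAggregate nodes). As a rough illustration, this check can be automated over plan text. The sketch below is Python, the helper names are hypothetical, and the plan strings are shortened to the Remote SQL lines shown above.

```python
import re


def remote_sql_statements(plan_text):
    """Return the SQL text of every 'Remote SQL:' line in an EXPLAIN VERBOSE plan."""
    return [m.group(1).strip() for m in re.finditer(r"Remote SQL:\s*(.+)", plan_text)]


def aggregation_pushed_down(plan_text):
    """True if the plan has remote statements and each one contains GROUP BY."""
    statements = remote_sql_statements(plan_text)
    return bool(statements) and all(
        re.search(r"\bGROUP BY\b", sql, re.IGNORECASE) for sql in statements
    )


# Remote SQL lines taken from the Q15 plan (no ORDER BY in the query).
Q15_PLAN = '''
Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
'''

# Remote SQL lines taken from the Q14 plan (query with ORDER BY).
Q14_PLAN = '''
Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
'''

if __name__ == "__main__":
    print("Q15 aggregation pushed down:", aggregation_pushed_down(Q15_PLAN))
    print("Q14 aggregation pushed down:", aggregation_pushed_down(Q14_PLAN))
```

This is only a text-level heuristic: it does not inspect the planner itself, and a plan with differently formatted output would need a different pattern.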
Conclusion
The results of these experiments show that ClickHouse offers very good performance, and that clickhousedb_fdw brings the performance benefits of ClickHouse to PostgreSQL. Although there is some overhead when using clickhousedb_fdw, for queries that are pushed down it is negligible and comparable to the performance achieved by running natively in the ClickHouse database; the exception is a case like Q14, where the ORDER BY clause prevents the aggregation from being pushed down. This also confirms that the FDW push-down feature in PostgreSQL delivers excellent results.
Source: www.habr.com