Nguji kinerja pitakon analitis ing PostgreSQL, ClickHouse lan clickhousedb_fdw (PostgreSQL)

Ing panliten iki, aku kepengin weruh apa perbaikan kinerja sing bisa ditindakake kanthi nggunakake sumber data ClickHouse tinimbang PostgreSQL. Aku ngerti keuntungan produktivitas sing aku entuk saka nggunakake ClickHouse. Apa keuntungan kasebut bakal terus yen aku ngakses ClickHouse saka PostgreSQL nggunakake Wrapper Data Asing (FDW)?

Lingkungan database sing diteliti yaiku PostgreSQL v11, clickhousedb_fdw lan database ClickHouse. Pungkasane, saka PostgreSQL v11 kita bakal mbukak macem-macem pitakon SQL sing dituju liwat clickhousedb_fdw menyang database ClickHouse. Banjur kita bakal weruh kepiye kinerja FDW dibandhingake karo pitakon sing padha ing PostgreSQL asli lan ClickHouse asli.

Database Clickhouse

ClickHouse minangka sistem manajemen basis data kolom sumber terbuka sing bisa entuk kinerja 100-1000 kaping luwih cepet tinimbang pendekatan basis data tradisional, bisa ngolah luwih saka milyar baris kurang saka detik.

Clickhousedb_fdw

clickhousedb_fdw - Pambungkus data eksternal kanggo database ClickHouse, utawa FDW, minangka proyek sumber terbuka saka Percona. Iki minangka tautan menyang repositori GitHub proyek kasebut.

Ing wulan Maret aku nulis blog sing nyritakake babagan FDW kita.

Nalika sampeyan bakal weruh, iki nyedhiyakake FDW kanggo ClickHouse sing ngidini SELECT saka, lan INSERT INTO, database ClickHouse saka server PostgreSQL v11.

FDW ndhukung fitur canggih kayata agregat lan gabung. Iki nambah kinerja kanthi nyata kanthi nggunakake sumber daya saka server remot kanggo operasi intensif sumber daya kasebut.

Lingkungan pathokan

  • Server Supermicro:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 soket / 28 inti / 56 benang
    • Memori: 256GB RAM
    • Panyimpenan: Samsung SM863 1.9TB Enterprise SSD
    • Sistem file: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generik #45~16.04.1-Ubuntu
  • PostgreSQL: versi 11

Tes pathokan

Tinimbang nggunakake sawetara set data sing digawe mesin kanggo tes iki, kita nggunakake data "Produktivitas miturut Wektu Dilaporake Operator Wektu" saka 1987 nganti 2018. Sampeyan bisa ngakses data nggunakake script kita kasedhiya kene.

Ukuran database 85 GB, nyedhiyakake siji tabel 109 kolom.

Pitakonan Pathokan

Iki pitakon sing digunakake kanggo mbandhingake ClickHouse, clickhousedb_fdw lan PostgreSQL.

Q#
Query Ngandhut Agregat lan Grup Miturut

Q1
PILIH DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
PILIH DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
PILIH Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
PILIH Carrier, count() FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count() DESC;

Q5
PILIH a. Pembawa, c, c2, c1000/c2 minangka c3 FROM ( PILIH Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY Carrier ) a INNER JOIN ( PILIH Carrier, count(*) AS c2 FROM ontime WHERE Year=2007 GROUP BY Carrier)b on a.Carrier=b.Carrier ORDER BY c3 DESC;

Q6
PILIH a. Pembawa, c, c2, c1000/c2 minangka c3 FROM ( PILIH Carrier, count() AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN ( SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier ) b on a.Carrier=b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
PILIH Taun, rata-rata (DepDelay) Saka Klompok sing tepat miturut Taun;

Q9
pilih Taun, count(*) minangka c1 saka klompok ontime miturut Taun;

Q10
SELECT avg(cnt) FROM (PILIH Taun,Wulan,count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Taun,Wulan) a;

Q11
pilih rata-rata (c1) saka (pilih Taun, Wulan, count (*) minangka c1 saka klompok ontime miturut Taun, Wulan) a;

Q12
PILIH OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
PILIH OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Pitakonan Ngandhut Gabungan

Q14
PILIH a.Year, c1/c2 FROM ( pilih Year, count()1000 minangka c1 saka ontime WHERE DepDelay>10 GROUP BY Year) a INNER JOIN (pilih Taun, count(*) minangka c2 saka ontime GROUP BY Year ) b ing a.Year=b.Year ORDER BY a.Year;

Q15
PILIH a."Tahun", c1/c2 FROM ( pilih "Taun", count()1000 minangka c1 FROM fontime Where “DepDelay”>10 GROUP BY “Year”) a INNER JOIN (pilih “Year”, count(*) as c2 FROM fontime GROUP BY “Year” ) b on a.”Year”=b. "Taun";

Tabel-1: Pitakonan sing digunakake ing pathokan

Eksekusi pitakon

Mangkene asil saben pitakon nalika mbukak ing setelan basis data sing beda: PostgreSQL kanthi lan tanpa indeks, ClickHouse asli lan clickhousedb_fdw. Wektu ditampilake ing milliseconds.

Q#
PostgreSQL
PostgreSQL (Indexed)
clickhouse
clickhousedb_fdw

Q1
27920
19634
23
57

Q2
35124
17301
50
80

Q3
34046
15618
67
115

Q4
31632
7667
25
37

Q5
47220
8976
27
60

Q6
58233
24368
55
153

Q7
30566
13256
52
91

Q8
38309
60511
112
179

Q9
20674
37979
31
81

Q10
34990
20102
56
148

Q11
30489
51658
37
155

Q12
39357
33742
186
1333

Q13
29912
30709
101
384

Q14
54126
39913
124
1364212

Q15
97258
30211
245
259

Tabel-1: Wektu sing ditindakake kanggo nglakokake pitakon sing digunakake ing pathokan

Ndeleng asil

Grafik kasebut nuduhake wektu eksekusi query ing milidetik, sumbu X nuduhake nomer pitakon saka tabel ing ndhuwur, lan sumbu Y nuduhake wektu eksekusi ing milidetik. Asil ClickHouse lan data sing dijupuk saka postgres nggunakake clickhousedb_fdw ditampilake. Saka tabel sampeyan bisa ndeleng manawa ana bedane gedhe antarane PostgreSQL lan ClickHouse, nanging bedane minimal antarane ClickHouse lan clickhousedb_fdw.

Nguji kinerja pitakon analitis ing PostgreSQL, ClickHouse lan clickhousedb_fdw (PostgreSQL)

Grafik iki nuduhake prabédan antarane ClickhouseDB lan clickhousedb_fdw. Ing pirang-pirang pitakon, overhead FDW ora dhuwur lan meh ora signifikan kajaba Q12. Pitakonan iki kalebu gabungan lan klausa ORDER BY. Amarga klausa ORDER BY GROUP / BY, ORDER BY ora mudhun menyang ClickHouse.

Ing Tabel 2, kita ndeleng lompat wektu ing pitakon Q12 lan Q13. Maneh, iki disebabake dening klausa ORDER BY. Kanggo konfirmasi iki, aku mbukak pitakon Q-14 lan Q-15 kanthi lan tanpa klausa ORDER BY. Tanpa klausa ORDER BY wektu rampung yaiku 259ms lan kanthi klausa ORDER BY yaiku 1364212. Kanggo debug pitakon iki, aku njlentrehake pitakon kasebut lan ana asil panjelasan kasebut.

Q15: Tanpa ORDER BY Klausa

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Pitakonan Tanpa ORDER BY Klausa

QUERY PLAN                                                      
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)  
Output: fontime."Year", (((count(*) * 1000)) / b.c2)  
Inner Unique: true   Hash Cond: (fontime."Year" = b."Year")  
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)        
Output: fontime."Year", ((count(*) * 1000))        
Relations: Aggregate on (fontime)        
Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"  
->  Hash  (cost=999.00..999.00 rows=100000 width=12)        
Output: b.c2, b."Year"        
->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)              
Output: b.c2, b."Year"              
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)                    
Output: fontime_1."Year", (count(*))                    
Relations: Aggregate on (fontime)                    
Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"(16 rows)

Q14: Pitakonan Kanthi ORDER BY Klausa

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Rencana Pitakonan kanthi ORDER BY Clause

QUERY PLAN 
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)   
Output: fontime."Year", (((count(*) * 1000)) / (count(*)))   
Inner Unique: true   Merge Cond: (fontime."Year" = fontime_1."Year")   
->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)        
Output: fontime."Year", (count(*) * 1000)         
Group Key: fontime."Year"         
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)               
Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) 
            ORDER BY "Year" ASC   
->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)         
Output: fontime_1."Year", count(*)         Group Key: fontime_1."Year"         
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4) 
              
Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC(16 rows)

kesimpulan

Asil eksperimen kasebut nuduhake yen ClickHouse nawakake kinerja sing apik banget, lan clickhousedb_fdw nawakake keuntungan kinerja ClickHouse saka PostgreSQL. Nalika ana sawetara overhead nalika nggunakake clickhousedb_fdw, iku diabaikan lan iso dibandhingke kanggo kinerja ngrambah dening mlaku native ing database ClickHouse. Iki uga negesake manawa fdw ing PostgreSQL menehi asil sing apik banget.

Telegram chatting liwat Clickhouse https://t.me/clickhouse_ru
Telegram chatting nggunakake PostgreSQL https://t.me/pgsql

Source: www.habr.com

Add a comment