Performance of Analytical Queries in PostgreSQL, ClickHouse and clickhousedb_fdw (PostgreSQL)

In this study I wanted to see what performance improvements could be gained by using a ClickHouse data source rather than PostgreSQL. I know what performance benefits I get from using ClickHouse. Will those benefits persist if I access ClickHouse from PostgreSQL through a Foreign Data Wrapper (FDW)?

The database environments under study are PostgreSQL v11, clickhousedb_fdw and a ClickHouse database. Ultimately, from within PostgreSQL v11, we will run various SQL queries routed through clickhousedb_fdw to the ClickHouse database. We will then see how the FDW's performance compares with the same queries run in native PostgreSQL and in native ClickHouse.

ClickHouse Database

ClickHouse is an open source column-oriented database management system that can achieve performance 100 to 1000 times faster than traditional database approaches, so that it can process millions of rows in under a second.

clickhousedb_fdw

clickhousedb_fdw, a foreign data wrapper (FDW) for the ClickHouse database, is an open source project from Percona. Here is a link to the project's GitHub repository.

In March I wrote a blog post that tells you more about our FDW.

As you will see, it provides an FDW for ClickHouse that lets you SELECT from, and INSERT INTO, a ClickHouse database from a PostgreSQL v11 server.

The FDW supports advanced features such as aggregate pushdown and join pushdown. These significantly improve performance by using the remote server's resources for these resource-intensive operations.
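To make the idea of aggregate pushdown concrete, here is a minimal Python sketch. It is only a simulation under stated assumptions: an in-memory SQLite database stands in for the remote server (it is not ClickHouse and does not use clickhousedb_fdw), and the toy `ontime` table with two columns and its data are hypothetical. It compares how many rows have to travel back to the local side when the remote side only filters versus when it also performs the GROUP BY.

```python
import sqlite3

# Stand-in for the remote server: an in-memory SQLite database (NOT ClickHouse).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE ontime (Year INTEGER, DepDelay INTEGER)")

# Hypothetical toy data: 3 years x 1000 flights, delays cycling through 0..29.
rows = [(year, i % 30) for year in (2006, 2007, 2008) for i in range(1000)]
remote.executemany("INSERT INTO ontime VALUES (?, ?)", rows)

# Without pushdown: the remote side only filters. Every matching row is
# shipped back, and the local side does the grouping itself.
shipped = remote.execute(
    "SELECT Year FROM ontime WHERE DepDelay > 10"
).fetchall()
local_counts = {}
for (year,) in shipped:
    local_counts[year] = local_counts.get(year, 0) + 1

# With pushdown: the remote side aggregates and ships one row per group.
pushed = remote.execute(
    "SELECT Year, count(*) FROM ontime WHERE DepDelay > 10 GROUP BY Year"
).fetchall()

rows_without_pushdown = len(shipped)
rows_with_pushdown = len(pushed)
print("rows shipped without pushdown:", rows_without_pushdown)
print("rows shipped with pushdown:   ", rows_with_pushdown)
print("same answer either way:", dict(pushed) == local_counts)
```

Both paths produce the same counts; the difference is only in how much data crosses the connection and where the aggregation work is done.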

Benchmark environment

  • Supermicro server:
    • Intel® Xeon® CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark tests

Instead of using a machine-generated dataset for this exercise, we used the "On Time Reporting Carrier On-Time Performance" data from 1987 to 2018. You can access the data using our script, available here.

The database is 85 GB in size and provides a single table of 109 columns.

Benchmark Queries

Here are the queries I used to benchmark ClickHouse, clickhousedb_fdw and PostgreSQL.

Q#
Query contains aggregates and GROUP BY

Q1
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year >= 2000 AND Year <= 2000 GROUP BY DayOfWeek ORDER BY c DESC;

Q2
SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2000 GROUP BY DayOfWeek ORDER BY c DESC;

Q3
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

Q4
SELECT Carrier, count(*) FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier ORDER BY count(*) ASC;

Q5
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year = 2007 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year = 2007 GROUP BY Carrier) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q6
SELECT a.Carrier, c, c2, c*1000/c2 AS c3 FROM (SELECT Carrier, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year >= 2000 AND Year <= 2008 GROUP BY Carrier) a INNER JOIN (SELECT Carrier, count(*) AS c2 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier) b ON a.Carrier=b.Carrier ORDER BY c3 DESC;

Q7
SELECT Carrier, avg(DepDelay) * 1000 AS c3 FROM ontime WHERE Year >= 2000 AND Year <= 2008 GROUP BY Carrier;

Q8
SELECT Year, avg(DepDelay) FROM ontime GROUP BY Year;

Q9
SELECT Year, count(*) AS c1 FROM ontime GROUP BY Year;

Q10
SELECT avg(cnt) FROM (SELECT Year, Month, count(*) AS cnt FROM ontime WHERE DepDel15=1 GROUP BY Year, Month) a;

Q11
SELECT avg(c1) FROM (SELECT Year, Month, count(*) AS c1 FROM ontime GROUP BY Year, Month) a;

Q12
SELECT OriginCityName, DestCityName, count(*) AS c FROM ontime GROUP BY OriginCityName, DestCityName ORDER BY c DESC LIMIT 10;

Q13
SELECT OriginCityName, count(*) AS c FROM ontime GROUP BY OriginCityName ORDER BY c DESC LIMIT 10;

Query contains joins

Q14
SELECT a.Year, c1/c2 FROM (SELECT Year, count(*)*1000 AS c1 FROM ontime WHERE DepDelay>10 GROUP BY Year) a INNER JOIN (SELECT Year, count(*) AS c2 FROM ontime GROUP BY Year) b ON a.Year=b.Year ORDER BY a.Year;

Q15
SELECT a."Year", c1/c2 FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Table-1: Queries used in the benchmark

Query execution

Here are the results of each query when run in the different database setups: PostgreSQL with and without indexes, native ClickHouse and clickhousedb_fdw. Times are shown in milliseconds.

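 Q#  | PostgreSQL | PostgreSQL (Indexed) | ClickHouse | clickhousedb_fdw
-----+------------+----------------------+------------+------------------
 Q1  |      27920 |                19634 |         23 |               57
 Q2  |      35124 |                17301 |         50 |               80
 Q3  |      34046 |                15618 |         67 |              115
 Q4  |      31632 |                 7667 |         25 |               37
 Q5  |      47220 |                 8976 |         27 |               60
 Q6  |      58233 |                24368 |         55 |              153
 Q7  |      30566 |                13256 |         52 |               91
 Q8  |      38309 |                60511 |        112 |              179
 Q9  |      20674 |                37979 |         31 |               81
 Q10 |      34990 |                20102 |         56 |              148
 Q11 |      30489 |                51658 |         37 |              155
 Q12 |      39357 |                33742 |        186 |             1333
 Q13 |      29912 |                30709 |        101 |              384
 Q14 |      54126 |                39913 |        124 |          1364212
 Q15 |      97258 |                30211 |        245 |              259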

Table-2: Time taken (in milliseconds) to execute the queries used in the benchmark
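The size of these gaps is easier to judge as ratios. The following Python sketch uses only the timings listed in the results above; the dictionary layout and the helper name `ratios` are my own choices for illustration, not part of the benchmark tooling. For each query it computes how many times slower unindexed PostgreSQL is than native ClickHouse, and the overhead factor of clickhousedb_fdw over native ClickHouse.

```python
# Timings in milliseconds, copied from the results above:
# (PostgreSQL, PostgreSQL indexed, ClickHouse, clickhousedb_fdw)
timings = {
    "Q1": (27920, 19634, 23, 57),
    "Q2": (35124, 17301, 50, 80),
    "Q3": (34046, 15618, 67, 115),
    "Q4": (31632, 7667, 25, 37),
    "Q5": (47220, 8976, 27, 60),
    "Q6": (58233, 24368, 55, 153),
    "Q7": (30566, 13256, 52, 91),
    "Q8": (38309, 60511, 112, 179),
    "Q9": (20674, 37979, 31, 81),
    "Q10": (34990, 20102, 56, 148),
    "Q11": (30489, 51658, 37, 155),
    "Q12": (39357, 33742, 186, 1333),
    "Q13": (29912, 30709, 101, 384),
    "Q14": (54126, 39913, 124, 1364212),
    "Q15": (97258, 30211, 245, 259),
}


def ratios(name):
    """Return slowdown factors relative to native ClickHouse for one query."""
    pg, _pg_indexed, ch, fdw = timings[name]
    return {
        "pg_vs_clickhouse": pg / ch,    # unindexed PostgreSQL vs ClickHouse
        "fdw_vs_clickhouse": fdw / ch,  # FDW overhead factor
    }


for name in timings:
    r = ratios(name)
    print(f"{name}: PostgreSQL/ClickHouse = {r['pg_vs_clickhouse']:.0f}x, "
          f"FDW/ClickHouse = {r['fdw_vs_clickhouse']:.1f}x")

# The query where the FDW adds the most overhead relative to ClickHouse.
worst = max(timings, key=lambda q: ratios(q)["fdw_vs_clickhouse"])
print("largest FDW overhead:", worst)
```

Sorting or filtering on `fdw_vs_clickhouse` is a quick way to spot the queries where pushdown is not happening.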

Reviewing the results

The graph shows query execution time in milliseconds: the X axis shows the query number from the tables above, and the Y axis shows the execution time in milliseconds. The results for ClickHouse and for data retrieved from postgres using clickhousedb_fdw are shown. From the table you can see that there is a huge difference between PostgreSQL and ClickHouse, but minimal difference between ClickHouse and clickhousedb_fdw.

This graph shows the difference between ClickHouse and clickhousedb_fdw. In most queries the FDW overhead is small and barely significant, except for Q12. This query involves joins and an ORDER BY clause. Because of the ORDER BY clause, the GROUP BY and ORDER BY are not pushed down to ClickHouse.

In Table 2 we can see the jump in time for queries Q12 and Q13. Again, this is caused by the ORDER BY clause. To confirm this, I ran queries Q-14 and Q-15, with and without the ORDER BY clause. Without the ORDER BY clause the completion time is 259 ms, and with the ORDER BY clause it is 1364212 ms. To debug this, I ran EXPLAIN on both queries; here are the results.

Q15: Query without ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 
     FROM (SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a
     INNER JOIN(SELECT "Year", count(*) AS c2 FROM fontime GROUP BY "Year") b ON a."Year"=b."Year";

Q15: Query plan without ORDER BY clause

QUERY PLAN
Hash Join  (cost=2250.00..128516.06 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / b.c2)
  Inner Unique: true
  Hash Cond: (fontime."Year" = b."Year")
  ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
        Output: fontime."Year", ((count(*) * 1000))
        Relations: Aggregate on (fontime)
        Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
  ->  Hash  (cost=999.00..999.00 rows=100000 width=12)
        Output: b.c2, b."Year"
        ->  Subquery Scan on b  (cost=1.00..999.00 rows=100000 width=12)
              Output: b.c2, b."Year"
              ->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
                    Output: fontime_1."Year", (count(*))
                    Relations: Aggregate on (fontime)
                    Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
(16 rows)

Q14: Query with ORDER BY clause

bm=# EXPLAIN VERBOSE SELECT a."Year", c1/c2 FROM(SELECT "Year", count(*)*1000 AS c1 FROM fontime WHERE "DepDelay" > 10 GROUP BY "Year") a 
     INNER JOIN(SELECT "Year", count(*) as c2 FROM fontime GROUP BY "Year") b  ON a."Year"= b."Year" 
     ORDER BY a."Year";

Q14: Query plan with ORDER BY clause

QUERY PLAN
Merge Join  (cost=2.00..628498.02 rows=50000000 width=12)
  Output: fontime."Year", (((count(*) * 1000)) / (count(*)))
  Inner Unique: true
  Merge Cond: (fontime."Year" = fontime_1."Year")
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime."Year", (count(*) * 1000)
        Group Key: fontime."Year"
        ->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
  ->  GroupAggregate  (cost=1.00..499.01 rows=1 width=12)
        Output: fontime_1."Year", count(*)
        Group Key: fontime_1."Year"
        ->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
              Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
(16 rows)
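The key difference between the two plans is in the "Remote SQL:" lines: without ORDER BY the GROUP BY is sent to ClickHouse, while with ORDER BY only a plain row scan is sent and the GroupAggregate runs in PostgreSQL. A small Python sketch can check this mechanically. The function names and the abbreviated plan strings below are hypothetical illustrations built from the plans above, and the check is a simple text heuristic (looking for GROUP BY in every remote statement), not an official tool.

```python
def remote_sql_statements(plan_text):
    """Collect the statement that follows each 'Remote SQL:' marker."""
    statements = []
    for line in plan_text.splitlines():
        _, marker, rest = line.partition("Remote SQL:")
        if marker:  # the marker was found on this line
            statements.append(rest.strip())
    return statements


def aggregate_pushed_down(plan_text):
    """Heuristic: True if every remote statement contains a GROUP BY."""
    statements = remote_sql_statements(plan_text)
    return bool(statements) and all("GROUP BY" in s.upper() for s in statements)


# Abbreviated excerpts of the two plans shown above.
plan_without_order_by = '''
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Remote SQL: SELECT "Year", (count(*) * 1000) FROM "default".ontime WHERE (("DepDelay" > 10)) GROUP BY "Year"
->  Foreign Scan  (cost=1.00..-1.00 rows=100000 width=12)
      Remote SQL: SELECT "Year", count(*) FROM "default".ontime GROUP BY "Year"
'''

plan_with_order_by = '''
->  Foreign Scan on public.fontime  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime WHERE (("DepDelay" > 10)) ORDER BY "Year" ASC
->  Foreign Scan on public.fontime fontime_1  (cost=1.00..-1.00 rows=100000 width=4)
      Remote SQL: SELECT "Year" FROM "default".ontime ORDER BY "Year" ASC
'''

print("Q15 aggregate pushed down:", aggregate_pushed_down(plan_without_order_by))
print("Q14 aggregate pushed down:", aggregate_pushed_down(plan_with_order_by))
```

In the Q14 case every row of the table has to be returned to PostgreSQL in sorted order before it is aggregated locally, which is consistent with the large time difference reported above.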

Conclusion

The results of these experiments show that ClickHouse offers really good performance, and that clickhousedb_fdw brings the benefits of ClickHouse performance to PostgreSQL. There is some overhead when using clickhousedb_fdw, but for queries whose aggregation is pushed down it is negligible and comparable to the performance achieved when running natively in the ClickHouse database. This also confirms that FDW pushdown in PostgreSQL delivers excellent results.

Telegram chat on ClickHouse https://t.me/clickhouse_ru
Telegram chat on PostgreSQL https://t.me/pgsql

Source: www.habr.com
