Modern CPUs have many cores. For years, applications have been sending queries to databases in parallel. A reporting query over many table rows runs faster when it can use several CPUs, and PostgreSQL has been able to do this since version 9.6.
It took 3 years to implement the parallel query feature: the code had to be rewritten at several stages of query execution. PostgreSQL 9.6 introduced the infrastructure for further code improvements. In later versions, other query types are executed in parallel as well.
Limitations
Do not enable parallel execution if all cores are already busy; otherwise other queries will slow down.
Most importantly, parallel execution with high WORK_MEM values uses a lot of memory: every hash join or sort can take up to work_mem of memory.
Low-latency OLTP queries cannot be sped up by parallel execution. And if a query returns a single row, parallel processing will only slow it down.
Developers like to use the TPC-H benchmark. You may have similar queries that are well suited to parallel execution.
Only SELECT queries without lock predicates are executed in parallel.
Sometimes proper indexing is better than a parallel sequential table scan.
Suspended queries and cursors are not supported.
Window functions and ordered-set aggregate functions are not parallel.
There is no benefit for an I/O-bound workload.
There are no parallel sort algorithms. But queries with sorts can still be executed in parallel in some parts.
Replace a CTE (WITH ...) with a nested SELECT to enable parallel execution.
Foreign data wrappers do not support parallel execution yet (but they could!)
FULL OUTER JOIN is not supported.
A client that sets max_rows disables parallel execution.
If a query calls a function that is not marked PARALLEL SAFE, it runs in a single process.
The SERIALIZABLE transaction isolation level disables parallel execution.
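Two of the limitations above (functions not marked PARALLEL SAFE, and CTEs) can be illustrated with a short sketch. It is only an example under assumptions: the function name price_with_tax and the query against the TPC-H orders table are hypothetical and are not part of the benchmark.

```sql
-- Hypothetical helper function. PARALLEL SAFE keeps queries that call it
-- eligible for parallel plans.
CREATE FUNCTION price_with_tax(numeric) RETURNS numeric
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS $$ SELECT $1 * 1.2 $$;

-- Check how a function is labelled: 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT proname, proparallel
FROM pg_proc
WHERE proname = 'price_with_tax';

-- A CTE such as
--   WITH urgent AS (SELECT * FROM orders WHERE o_orderpriority = '1-URGENT')
--   SELECT count(*) FROM urgent;
-- written with a nested SELECT instead:
SELECT count(*)
FROM (SELECT * FROM orders WHERE o_orderpriority = '1-URGENT') AS urgent;
```

Whether the planner actually picks a parallel plan still depends on table size and cost settings, so it is worth checking with EXPLAIN.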
Test environment
The PostgreSQL developers have tried to reduce the response time of the TPC-H benchmark queries. Download the benchmark and adapt it to PostgreSQL. This is an unofficial use of the TPC-H benchmark, so it is not suitable for comparing databases or hardware.
Rename makefile.suite to Makefile and modify it as described here: https://github.com/tvondra/pg_tpch . Compile the code with the make command.
Generate the data: ./dbgen -s 10 creates a 23 GB database. This is enough to see the difference in performance between parallel and non-parallel queries.
Convert the tbl files to csv with for and sed.
Clone the pg_tpch repository and copy the csv files to pg_tpch/dss/data.
Generate the queries with the qgen command.
Load the data into the database with the ./tpch.sh command.
Parallel sequential scan
It may be faster not because of parallel reads, but because the data is spread across many CPU cores. Modern operating systems cache PostgreSQL data files well. With read-ahead, a larger block can be fetched from storage than the PG daemon requests. As a result, query performance is not limited by disk I/O. The query spends CPU cycles to:
read rows one by one from the table pages;
compare row values with the WHERE conditions.
Let's run a simple select query:
tpch=# explain analyze select l_quantity as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..1964772.00 rows=58856235 width=5) (actual time=0.014..16951.669 rows=58839715 loops=1)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 1146337
Planning Time: 0.203 ms
Execution Time: 19035.100 ms
A sequential scan produces too many rows without aggregation, so the query is executed by a single CPU core.
If you add SUM(), you can see that two worker processes help speed up the query:
explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms
Parallel aggregation
The "Parallel Seq Scan" node produces rows for partial aggregation. The "Partial Aggregate" node reduces these rows with SUM(). At the end, the SUM counter from each worker process is collected by the "Gather" node.
The final result is computed by the "Finalize Aggregate" node. If you have your own aggregate functions, do not forget to mark them as "parallel safe".
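As a minimal sketch of what marking an aggregate "parallel safe" can look like, consider the definition below. The aggregate name my_sum is hypothetical; numeric_add is the built-in function behind numeric addition. A combine function is included because partial aggregation needs a way to merge the states produced by the workers.

```sql
-- Hypothetical aggregate that behaves like SUM over numeric.
-- combinefunc merges the partial states produced by each worker;
-- parallel = safe marks the aggregate as usable in parallel plans.
CREATE AGGREGATE my_sum(numeric) (
    sfunc       = numeric_add,
    stype       = numeric,
    combinefunc = numeric_add,
    parallel    = safe
);

-- Check whether the planner now considers a Partial/Finalize Aggregate plan.
EXPLAIN (costs off)
SELECT my_sum(l_quantity) FROM lineitem;
```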
Number of worker processes
The number of worker processes can be increased without restarting the server:
explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms
What happened here? There were 2 times more worker processes, but the query became only 1.6599 times faster. The numbers are interesting. We had 2 worker processes and 1 leader. After the change it became 4 + 1.
Our maximum speedup from parallel execution: 5/3 = 1.66(6) times.
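One way to make such a change without a restart is sketched below. It assumes the worker count is raised through max_parallel_workers_per_gather; the value 4 matches the 4 + 1 processes discussed above.

```sql
-- Raise the per-Gather worker limit and reload the configuration;
-- this setting does not require a server restart.
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
SELECT pg_reload_conf();

-- Alternatively, change it for the current session only.
SET max_parallel_workers_per_gather = 4;

-- Upper bound on the speedup: (leader + 4 workers) / (leader + 2 workers).
SELECT round((1 + 4)::numeric / (1 + 2), 4) AS max_speedup;
```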
How does it work?
Processes
Query execution always starts in the leader process. The leader does all the non-parallel work and its own share of the parallel processing. The other processes that execute the same query are called worker processes. Parallel processing uses the dynamic background worker infrastructure (available since version 9.4). Because other parts of PostgreSQL use processes rather than threads, a query with 3 worker processes can be up to 4 times faster than traditional execution.
Communication
Worker processes communicate with the leader through a message queue (based on shared memory). Each process has 2 queues: one for errors and one for tuples.
Each time the table is 3 times larger than min_parallel_(index|table)_scan_size, Postgres adds a worker process. The number of worker processes is not cost-based. A circular dependency would make a complex implementation difficult. Instead, the planner uses simple rules.
In practice, these rules do not always suit production, so you can change the number of worker processes for a specific table: ALTER TABLE ... SET (parallel_workers = N).
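The tripling rule can be sketched as a small query. This is an illustration only, assuming min_parallel_table_scan_size is 8 MB and ignoring the cap imposed by max_parallel_workers_per_gather; the table sizes are made-up inputs.

```sql
-- Assumption: min_parallel_table_scan_size = 8 MB.
-- One worker at the threshold, plus one more for every tripling of table size.
SELECT size_mb,
       1 + (SELECT count(*)
            FROM generate_series(1, 10) AS k
            WHERE size_mb >= 8 * power(3, k)) AS planned_workers
FROM (VALUES (8), (24), (72), (216)) AS t(size_mb);

-- Override the rule for one table, then return to the default behaviour.
ALTER TABLE lineitem SET (parallel_workers = 4);
ALTER TABLE lineitem RESET (parallel_workers);
```

The per-table override is useful when one large reporting table deserves more workers than the size rule would give it.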
Why is parallel processing not used?
Besides the long list of limitations, there are also cost checks:
parallel_setup_cost: avoids parallel execution for short queries. This parameter estimates the time needed to set up memory, start the processes, and exchange the initial data.
parallel_tuple_cost: communication between the leader and the workers can be delayed in proportion to the number of tuples coming from the worker processes. This parameter estimates the cost of data exchange.
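To see whether these two costs are what keeps a query serial, one option is to lower them for the current session and compare plans. The sketch below reuses the lineitem query from earlier; setting both costs to 0 is an experiment, not a recommended production value.

```sql
-- Inspect the current values.
SHOW parallel_setup_cost;
SHOW parallel_tuple_cost;

-- For the current session only: make parallel plans look cheap
-- and see whether the planner switches to one.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN (costs off)
SELECT l_quantity
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - interval '105' day;

-- Restore the configured values.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```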
Nested Loop Joins
PostgreSQL 9.6+ can execute nested loops in parallel, because it is a simple operation.
explain (costs off) select c_custkey, count(o_orderkey)
from customer left outer join orders on
c_custkey = o_custkey and o_comment not like '%special%deposits%'
group by c_custkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize GroupAggregate
Group Key: customer.c_custkey
-> Gather Merge
Workers Planned: 4
-> Partial GroupAggregate
Group Key: customer.c_custkey
-> Nested Loop Left Join
-> Parallel Index Only Scan using customer_pkey on customer
-> Index Scan using idx_orders_custkey on orders
Index Cond: (customer.c_custkey = o_custkey)
Filter: ((o_comment)::text !~~ '%special%deposits%'::text)
The gather happens at the last stage, so Nested Loop Left Join is a parallel operation. Parallel Index Only Scan was introduced only in version 10. It works similarly to a parallel sequential scan. The condition c_custkey = o_custkey reads one order per customer row, so it is not parallel.
Hash Join
Until PostgreSQL 11, each worker process built its own hash table. With more than four of these processes, performance did not improve. In the new version the hash table is shared. Each worker process can use WORK_MEM to build the hash table.
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'AIR')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1996-01-01'
and l_receiptdate < date '1996-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1964755.66..1964961.44 rows=1 width=27) (actual time=7579.592..7922.997 rows=1 loops=1)
-> Finalize GroupAggregate (cost=1964755.66..1966196.11 rows=7 width=27) (actual time=7579.590..7579.591 rows=1 loops=1)
Group Key: lineitem.l_shipmode
-> Gather Merge (cost=1964755.66..1966195.83 rows=28 width=27) (actual time=7559.593..7922.319 rows=6 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=1963755.61..1965192.44 rows=7 width=27) (actual time=7548.103..7564.592 rows=2 loops=5)
Group Key: lineitem.l_shipmode
-> Sort (cost=1963755.61..1963935.20 rows=71838 width=27) (actual time=7530.280..7539.688 rows=62519 loops=5)
Sort Key: lineitem.l_shipmode
Sort Method: external merge Disk: 2304kB
Worker 0: Sort Method: external merge Disk: 2064kB
Worker 1: Sort Method: external merge Disk: 2384kB
Worker 2: Sort Method: external merge Disk: 2264kB
Worker 3: Sort Method: external merge Disk: 2336kB
-> Parallel Hash Join (cost=382571.01..1957960.99 rows=71838 width=27) (actual time=7036.917..7499.692 rows=62519 loops=5)
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1552386.40 rows=71838 width=19) (actual time=0.583..4901.063 rows=62519 loops=5)
Filter: ((l_shipmode = ANY ('{MAIL,AIR}'::bpchar[])) AND (l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1996-01-01'::date) AND (l_receiptdate < '1997-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 11934691
-> Parallel Hash (cost=313722.45..313722.45 rows=3750045 width=20) (actual time=2011.518..2011.518 rows=3000000 loops=5)
Buckets: 65536 Batches: 256 Memory Usage: 3840kB
-> Parallel Seq Scan on orders (cost=0.00..313722.45 rows=3750045 width=20) (actual time=0.029..995.948 rows=3000000 loops=5)
Planning Time: 0.977 ms
Execution Time: 7923.770 ms
Query 12 from TPC-H clearly demonstrates a parallel hash join. Each worker process contributes to building the shared hash table.
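To compare the shared hash table with the older per-worker behaviour, the planner setting enable_parallel_hash can be toggled for a session. The sketch assumes PostgreSQL 11 or later and uses a simplified join of the two TPC-H tables rather than the full Query 12.

```sql
-- Assumption: PostgreSQL 11 or later, where enable_parallel_hash exists.
-- With the setting off, a parallel plan builds a separate hash table per process.
SET enable_parallel_hash = off;
EXPLAIN (costs off)
SELECT count(*) FROM orders, lineitem WHERE o_orderkey = l_orderkey;

-- With the setting on, the planner may choose Parallel Hash Join
-- with a single shared hash table.
SET enable_parallel_hash = on;
EXPLAIN (costs off)
SELECT count(*) FROM orders, lineitem WHERE o_orderkey = l_orderkey;

RESET enable_parallel_hash;
```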
Merge Join
A merge join is non-parallel by nature. Don't worry if it is the last stage of the query: the query can still be executed in parallel.
-- Query 2 from TPC-H
explain (costs off) select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 36
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from partsupp, supplier, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by s_acctbal desc, n_name, s_name, p_partkey
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit
-> Sort
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
-> Merge Join
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
-> Gather Merge
Workers Planned: 4
-> Parallel Index Scan using part_pkey on part
Filter: (((p_type)::text ~~ '%BRASS'::text) AND (p_size = 36))
-> Materialize
-> Sort
Sort Key: partsupp.ps_partkey
-> Nested Loop
-> Nested Loop
Join Filter: (nation.n_regionkey = region.r_regionkey)
-> Seq Scan on region
Filter: (r_name = 'AMERICA'::bpchar)
-> Hash Join
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Seq Scan on supplier
-> Hash
-> Seq Scan on nation
-> Index Scan using idx_partsupp_suppkey on partsupp
Index Cond: (ps_suppkey = supplier.s_suppkey)
SubPlan 1
-> Aggregate
-> Nested Loop
Join Filter: (nation_1.n_regionkey = region_1.r_regionkey)
-> Seq Scan on region region_1
Filter: (r_name = 'AMERICA'::bpchar)
-> Nested Loop
-> Nested Loop
-> Index Scan using idx_partsupp_partkey on partsupp partsupp_1
Index Cond: (part.p_partkey = ps_partkey)
-> Index Scan using supplier_pkey on supplier supplier_1
Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation nation_1
Index Cond: (n_nationkey = supplier_1.s_nationkey)
The "Merge Join" node sits above "Gather Merge", so the merge itself does not use parallel execution. But the "Parallel Index Scan" node still helps with the part_pkey segment.
Partition-wise join
In PostgreSQL 11, partition-wise join is disabled by default because its planning is very expensive. Tables with matching partitioning can be joined partition by partition. This way Postgres uses smaller hash tables. Each partition join can run in parallel.
tpch=# set enable_partitionwise_join=t;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
---------------------------------------------------
Append
-> Hash Join
Hash Cond: (t2.b = t1.a)
-> Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p1 t1
Filter: (b = 0)
-> Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
tpch=# set parallel_setup_cost = 1;
tpch=# set parallel_tuple_cost = 0.01;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
-----------------------------------------------------------
Gather
Workers Planned: 4
-> Parallel Append
-> Parallel Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Parallel Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
-> Parallel Hash Join
Hash Cond: (t2.b = t1.a)
-> Parallel Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p1 t1
Filter: (b = 0)
The main point is that a partition-wise join runs in parallel only if the partitions are large enough.
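The plans above rely on two tables partitioned on the join columns with matching bounds. The source does not show their definitions, so the schema below is a hypothetical reconstruction: only the table and partition names are taken from the plans, while the columns and range bounds are assumptions.

```sql
-- Hypothetical schema; names follow the plans above, bounds are assumed.
CREATE TABLE prt1 (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE prt1_p1 PARTITION OF prt1 FOR VALUES FROM (0) TO (5000);
CREATE TABLE prt1_p2 PARTITION OF prt1 FOR VALUES FROM (5000) TO (10001);

-- prt2 is partitioned on b, the column it is joined on (t1.a = t2.b),
-- with the same bounds as prt1 so partitions can be matched pairwise.
CREATE TABLE prt2 (a int, b int) PARTITION BY RANGE (b);
CREATE TABLE prt2_p1 PARTITION OF prt2 FOR VALUES FROM (0) TO (5000);
CREATE TABLE prt2_p2 PARTITION OF prt2 FOR VALUES FROM (5000) TO (10001);

-- Partition-wise join is off by default and has to be enabled explicitly.
SET enable_partitionwise_join = on;
```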
Parallel Append
Parallel Append can be used instead of having different workers process different blocks. This usually happens with UNION ALL queries. The drawback is less parallelism, because each worker process handles only 1 query.
There are 2 worker processes running here, although 4 are allowed.
tpch=# explain (costs off) select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day union all select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '2000-12-01' - interval '105' day;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather
Workers Planned: 2
-> Parallel Append
-> Aggregate
-> Seq Scan on lineitem
Filter: (l_shipdate <= '2000-08-18 00:00:00'::timestamp without time zone)
-> Aggregate
-> Seq Scan on lineitem lineitem_1
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
The most important variables
WORK_MEM limits memory per process, not just per query: work_mem × processes × joins = a lot of memory.
max_parallel_workers_per_gather: how many worker processes the executor will use for parallel execution of a plan node.
max_worker_processes: adjusts the total number of worker processes to the number of CPU cores on the server.
max_parallel_workers: the same, but for parallel worker processes.
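A possible way to set these variables together is sketched below. The values are illustrative only and assume a server with 8 CPU cores; the memory estimate simply applies the work_mem × processes × joins rule to made-up numbers.

```sql
-- Illustrative values for a server with 8 CPU cores (an assumption).
ALTER SYSTEM SET max_worker_processes = 8;            -- takes effect after a restart
ALTER SYSTEM SET max_parallel_workers = 8;            -- takes effect after a reload
ALTER SYSTEM SET max_parallel_workers_per_gather = 4; -- takes effect after a reload
SELECT pg_reload_conf();

-- Rough worst case for one query: work_mem * processes * joins.
-- Here: 64 MB work_mem, 1 leader + 4 workers, 2 hash joins.
SELECT pg_size_pretty(64::bigint * 1024 * 1024 * (1 + 4) * 2) AS worst_case_memory;
```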
Summary
Since version 9.6, parallel processing can significantly improve the performance of complex queries that scan many rows or indexes. In PostgreSQL 10, parallel execution is enabled by default. Remember to disable it on servers with a heavy OLTP workload. Sequential scans or index scans consume a lot of resources. If you are not running reports over the entire dataset, you can improve query performance simply by adding missing indexes or partitioning properly.
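Disabling parallel execution on an OLTP server while keeping it for reporting could look like the sketch below. The role name reporting is hypothetical and must already exist.

```sql
-- Disable parallel plans server-wide: no workers per Gather node.
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();

-- Re-enable them only for sessions of a hypothetical reporting role.
ALTER ROLE reporting SET max_parallel_workers_per_gather = 4;
```

The per-role setting applies to new sessions of that role, so OLTP connections keep running single-process plans.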