Modern CPUs have many cores. For years, applications have been sending queries to the database in parallel. A reporting query over many table rows runs faster when it can use several CPUs, and PostgreSQL has been able to do this since version 9.6.
It took 3 years to implement the parallel query feature, because code had to be rewritten at several stages of query execution. PostgreSQL 9.6 introduced the infrastructure for further improvements. In later versions, more query types run in parallel.
Limitations
- Do not enable parallel execution if all cores are already busy, otherwise other queries will slow down.
- Most importantly, parallel processing with high WORK_MEM values uses a lot of memory: each hash join or sort takes up to work_mem of memory.
- Low-latency OLTP queries cannot be sped up by parallel execution. If a query returns a single row, parallel processing will only slow it down.
- Developers like to use the TPC-H benchmark. You may have similar queries that are good candidates for parallel execution.
- Only SELECT queries without row-locking clauses run in parallel.
- Sometimes proper indexing is better than a parallel sequential table scan.
- Suspending queries and cursors are not supported.
- Window functions and ordered-set aggregate functions are not parallel.
- There is no gain for an I/O-bound workload.
- There are no parallel sort algorithms. However, queries with sorts can still run in parallel in some respects.
- Replace a CTE (WITH ...) with a nested SELECT to enable parallel processing.
- Foreign data wrappers do not yet support parallel processing (but they could!)
- FULL OUTER JOIN is not supported.
- Setting max_rows disables parallel processing.
- If a query uses a function that is not marked PARALLEL SAFE, it will be single-threaded.
- The SERIALIZABLE transaction isolation level disables parallel processing.
Test environment
The PostgreSQL developers have been trying to reduce the response time of the TPC-H benchmark queries. Download the benchmark and set it up for PostgreSQL:
- Download TPC-H_Tools_v2.17.3.zip (or a newer version) from the TPC website.
- Rename makefile.suite to Makefile and modify it as described here: https://github.com/tvondra/pg_tpch . Compile the code with the make command.
- Generate the data: ./dbgen -s 10 creates a 23 GB database. This is enough to see the difference in performance between parallel and non-parallel queries.
- Convert the tbl files to csv with for and sed.
- Clone the pg_tpch repository and copy the csv files to pg_tpch/dss/data.
- Generate the queries with the qgen command.
- Load the data into the database with the ./tpch.sh command.
Parallel sequential scan
It may be faster not because of parallel reads, but because the data is spread across many CPU cores. Modern operating systems cache PostgreSQL data files well. With read-ahead, the OS can fetch a larger block from storage than the PG daemon requested. As a result, query performance is not limited by disk I/O. The query spends CPU cycles to:
- read rows one at a time from the table pages;
- compare row values against the WHERE conditions.
Let's run a simple select query:
tpch=# explain analyze select l_quantity as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..1964772.00 rows=58856235 width=5) (actual time=0.014..16951.669 rows=58839715 loops=1)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 1146337
Planning Time: 0.203 ms
Execution Time: 19035.100 ms
The sequential scan produces too many rows without aggregation, so the query is executed by a single CPU core.
After adding SUM(), you can see that two worker processes help speed up the query:
explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms
Parallel aggregation
The "Parallel Seq Scan" node produces rows for partial aggregation. The "Partial Aggregate" node reduces these rows with SUM(). At the end, the SUM counter from each worker process is collected by the "Gather" node.
The final result is computed by the "Finalize Aggregate" node. If you have your own aggregate functions, do not forget to mark them as "parallel safe".
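The partial/final split described above can be illustrated with a toy model. This is only a sketch in plain Python: the function names are made up for illustration and are not PostgreSQL internals, and the rows are dealt out round-robin rather than by table blocks.

```python
# Toy model of Parallel Seq Scan -> Partial Aggregate -> Gather -> Finalize Aggregate.
# All names here are illustrative assumptions, not PostgreSQL APIs.

def split_rows(rows, processes):
    """Deal rows out round-robin, standing in for a Parallel Seq Scan."""
    return [rows[i::processes] for i in range(processes)]

def partial_aggregate(chunk):
    """Each process reduces its own rows to a single partial SUM."""
    return sum(chunk)

def gather(partials):
    """The leader collects one partial result per process."""
    return list(partials)

def finalize_aggregate(partials):
    """Combine the partial sums into the final result."""
    return sum(partials)

quantities = list(range(1, 101))      # made-up l_quantity values
chunks = split_rows(quantities, 3)    # 2 worker processes plus the leader
partials = gather(partial_aggregate(c) for c in chunks)
total = finalize_aggregate(partials)
print(partials, total)
```

The point of the split is that only one row per process crosses the Gather node, which is why the aggregate query parallelizes while the plain select does not.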
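Number of worker processes
The number of worker processes can be increased without restarting the server, for example by raising max_parallel_workers_per_gather from 2 to 4 and reloading the configuration.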
What happened here? The number of worker processes doubled, yet the query became only 1.6599 times faster. The arithmetic is interesting: we had 2 worker processes and 1 leader, and after the change it became 4+1.
The maximum speedup we can get from parallel processing is therefore 5/3 = 1.66(6) times.
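The 5/3 figure follows from counting the leader as one more process that scans rows. A minimal sketch of that arithmetic, assuming the leader takes part in the scan and that work divides evenly (the function name is hypothetical):

```python
def ideal_speedup(workers_before, workers_after, leaders=1):
    """Upper bound on the speedup when only the number of processes changes."""
    return (workers_after + leaders) / (workers_before + leaders)

speedup = ideal_speedup(2, 4)   # (4 + 1) / (2 + 1)
print(f"ideal speedup: {speedup:.4f}")
```

The measured 1.6599 is very close to this bound, so the scan scaled almost linearly with the number of processes.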
How does it work?
Processes
Query execution always starts in the leader process. The leader does all the non-parallel work plus its own share of the parallel processing. The other processes executing the same query are called worker processes. Parallel processing is built on the dynamic background worker infrastructure.
Communication
Worker processes communicate with the leader through a message queue (based on shared memory). Each process has 2 queues: one for errors and one for tuples.
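How many worker processes are needed?
The first limit is set by the max_parallel_workers_per_gather parameter. The executor then takes worker processes from a pool limited by max_parallel_workers. The upper limit is max_worker_processes, the total number of background processes.
If a worker process cannot be allocated, the query is executed by a single process.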
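The way these three limits interact can be sketched as a simple minimum. This is a simplified model, not the server's actual bookkeeping; `busy_parallel` and `busy_background` are hypothetical counters for slots already in use (with `busy_background` counting every background worker, parallel ones included).

```python
def workers_launched(planned, max_parallel_workers, max_worker_processes,
                     busy_parallel=0, busy_background=0):
    """Workers a query actually gets, given what is free in both pools."""
    free_parallel = max_parallel_workers - busy_parallel
    free_background = max_worker_processes - busy_background
    # Never negative: zero workers means the leader runs the plan alone.
    return max(0, min(planned, free_parallel, free_background))

print(workers_launched(4, 8, 8))                                      # idle server
print(workers_launched(4, 8, 8, busy_parallel=7, busy_background=7))  # nearly full pool
print(workers_launched(4, 8, 8, busy_parallel=8, busy_background=8))  # pool exhausted
```

In EXPLAIN ANALYZE output this shows up as "Workers Launched" being lower than "Workers Planned".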
The query planner may reduce the number of worker processes depending on the size of the table or index. The parameters min_parallel_table_scan_size and min_parallel_index_scan_size control this.
set min_parallel_table_scan_size='8MB'
8MB table => 1 worker
24MB table => 2 workers
72MB table => 3 workers
x => log(x / min_parallel_table_scan_size) / log(3) + 1 worker
Each time the table is 3 times larger than min_parallel_(index|table)_scan_size, Postgres adds one worker process. The number of worker processes is not cost-based: a circular dependency would make a more sophisticated implementation difficult, so the planner uses simple rules instead.
In practice, these rules are not always suitable for production, so you can override the number of worker processes for a specific table: ALTER TABLE ... SET (parallel_workers = N).
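The size-based rule above (one more worker process for every tripling of the table size) can be written as a small loop. This is a sketch of the rule as the article states it, using bytes instead of pages; the function name and the optional `per_gather_cap` argument (standing in for max_parallel_workers_per_gather) are illustrative assumptions.

```python
MB = 1024 * 1024

def planned_workers(table_bytes, min_scan_bytes=8 * MB, per_gather_cap=None):
    """Worker count from table size: +1 worker each time the size triples."""
    if table_bytes < min_scan_bytes:
        return 0                      # too small, no parallel scan
    workers = 1
    threshold = min_scan_bytes * 3
    while table_bytes >= threshold:   # integer loop avoids log() rounding issues
        workers += 1
        threshold *= 3
    if per_gather_cap is not None:
        workers = min(workers, per_gather_cap)
    return workers

for size_mb in (8, 24, 72):
    print(f"{size_mb} MB table => {planned_workers(size_mb * MB)} worker(s)")
```

For a multi-gigabyte table the rule alone would ask for many worker processes, so in practice the per-gather cap is usually what decides the final number.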
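Why is parallel processing not used?
Besides the long list of limitations, there are also cost checks:
parallel_setup_cost: the planner's estimate of the cost of launching worker processes, which keeps short queries from going parallel.
parallel_tuple_cost: the planner's estimate of the cost of passing one tuple from a worker process to the leader.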
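These two parameters are visible in the cost of the Gather node. The sketch below adds them to the cost of the child node, taking the Partial Aggregate figures from the SUM() plan earlier in the article. It assumes the test server ran with the default values (1000 and 0.1), which the article does not state; the function name is hypothetical.

```python
def gather_cost(child_startup, child_total, rows,
                parallel_setup_cost=1000.0, parallel_tuple_cost=0.1):
    """Estimated (startup, total) cost of a Gather node above a parallel child."""
    # Launching workers is charged once, before the first row comes back.
    startup = child_startup + parallel_setup_cost
    # Every row sent from a worker to the leader adds a per-tuple charge.
    total = child_total + parallel_setup_cost + parallel_tuple_cost * rows
    return startup, total

# Partial Aggregate: cost=1588701.91..1588701.92, Gather: rows=2
startup, total = gather_cost(1588701.91, 1588701.92, rows=2)
print(f"Gather cost={startup:.2f}..{total:.2f}")
```

With aggregation only 2 rows pass through Gather, so the per-tuple charge is negligible. Without SUM() tens of millions of rows would have to be sent to the leader, which is why the planner kept the plain select serial.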
Nested Loop Joins
PostgreSQL 9.6+ can execute nested loops in parallel, since it is a simple operation.
explain (costs off) select c_custkey, count(o_orderkey)
from customer left outer join orders on
c_custkey = o_custkey and o_comment not like '%special%deposits%'
group by c_custkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize GroupAggregate
Group Key: customer.c_custkey
-> Gather Merge
Workers Planned: 4
-> Partial GroupAggregate
Group Key: customer.c_custkey
-> Nested Loop Left Join
-> Parallel Index Only Scan using customer_pkey on customer
-> Index Scan using idx_orders_custkey on orders
Index Cond: (customer.c_custkey = o_custkey)
Filter: ((o_comment)::text !~~ '%special%deposits%'::text)
The Gather happens at the last stage, so Nested Loop Left Join is a parallel operation. Parallel Index Only Scan was introduced in version 10. It works similarly to a parallel sequential scan. The condition c_custkey = o_custkey reads one order per customer row, so that part is not parallel.
Hash Join
Until PostgreSQL 11, each worker process built its own hash table, so with more than four such processes performance did not improve. In the new version the hash table is shared. Each worker process can use WORK_MEM to build the hash table.
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'AIR')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1996-01-01'
and l_receiptdate < date '1996-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1964755.66..1964961.44 rows=1 width=27) (actual time=7579.592..7922.997 rows=1 loops=1)
-> Finalize GroupAggregate (cost=1964755.66..1966196.11 rows=7 width=27) (actual time=7579.590..7579.591 rows=1 loops=1)
Group Key: lineitem.l_shipmode
-> Gather Merge (cost=1964755.66..1966195.83 rows=28 width=27) (actual time=7559.593..7922.319 rows=6 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=1963755.61..1965192.44 rows=7 width=27) (actual time=7548.103..7564.592 rows=2 loops=5)
Group Key: lineitem.l_shipmode
-> Sort (cost=1963755.61..1963935.20 rows=71838 width=27) (actual time=7530.280..7539.688 rows=62519 loops=5)
Sort Key: lineitem.l_shipmode
Sort Method: external merge Disk: 2304kB
Worker 0: Sort Method: external merge Disk: 2064kB
Worker 1: Sort Method: external merge Disk: 2384kB
Worker 2: Sort Method: external merge Disk: 2264kB
Worker 3: Sort Method: external merge Disk: 2336kB
-> Parallel Hash Join (cost=382571.01..1957960.99 rows=71838 width=27) (actual time=7036.917..7499.692 rows=62519 loops=5)
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1552386.40 rows=71838 width=19) (actual time=0.583..4901.063 rows=62519 loops=5)
Filter: ((l_shipmode = ANY ('{MAIL,AIR}'::bpchar[])) AND (l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1996-01-01'::date) AND (l_receiptdate < '1997-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 11934691
-> Parallel Hash (cost=313722.45..313722.45 rows=3750045 width=20) (actual time=2011.518..2011.518 rows=3000000 loops=5)
Buckets: 65536 Batches: 256 Memory Usage: 3840kB
-> Parallel Seq Scan on orders (cost=0.00..313722.45 rows=3750045 width=20) (actual time=0.029..995.948 rows=3000000 loops=5)
Planning Time: 0.977 ms
Execution Time: 7923.770 ms
Query 12 from TPC-H clearly illustrates a parallel hash join. Each worker process contributes to building the shared hash table.
Merge Join
A merge join is inherently not parallel. Don't worry if it is the last step of the query: the query can still run in parallel.
-- Query 2 from TPC-H
explain (costs off) select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 36
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from partsupp, supplier, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by s_acctbal desc, n_name, s_name, p_partkey
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit
-> Sort
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
-> Merge Join
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
-> Gather Merge
Workers Planned: 4
-> Parallel Index Scan using part_pkey on part
Filter: (((p_type)::text ~~ '%BRASS'::text) AND (p_size = 36))
-> Materialize
-> Sort
Sort Key: partsupp.ps_partkey
-> Nested Loop
-> Nested Loop
Join Filter: (nation.n_regionkey = region.r_regionkey)
-> Seq Scan on region
Filter: (r_name = 'AMERICA'::bpchar)
-> Hash Join
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Seq Scan on supplier
-> Hash
-> Seq Scan on nation
-> Index Scan using idx_partsupp_suppkey on partsupp
Index Cond: (ps_suppkey = supplier.s_suppkey)
SubPlan 1
-> Aggregate
-> Nested Loop
Join Filter: (nation_1.n_regionkey = region_1.r_regionkey)
-> Seq Scan on region region_1
Filter: (r_name = 'AMERICA'::bpchar)
-> Nested Loop
-> Nested Loop
-> Index Scan using idx_partsupp_partkey on partsupp partsupp_1
Index Cond: (part.p_partkey = ps_partkey)
-> Index Scan using supplier_pkey on supplier supplier_1
Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation nation_1
Index Cond: (n_nationkey = supplier_1.s_nationkey)
The "Merge Join" node sits above "Gather Merge", so the merge itself does not use parallel processing. The "Parallel Index Scan" node still helps with the part_pkey segment.
Partition-wise join
In PostgreSQL 11, partition-wise join is disabled by default and has to be enabled explicitly:
tpch=# set enable_partitionwise_join=t;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
---------------------------------------------------
Append
-> Hash Join
Hash Cond: (t2.b = t1.a)
-> Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p1 t1
Filter: (b = 0)
-> Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
tpch=# set parallel_setup_cost = 1;
tpch=# set parallel_tuple_cost = 0.01;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
-----------------------------------------------------------
Gather
Workers Planned: 4
-> Parallel Append
-> Parallel Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Parallel Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
-> Parallel Hash Join
Hash Cond: (t2.b = t1.a)
-> Parallel Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p1 t1
Filter: (b = 0)
The key point is that a partition-wise join runs in parallel only if the partitions are large enough.
Parallel Append
Only 2 worker processes are used here, even though 4 are enabled.
tpch=# explain (costs off) select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day union all select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '2000-12-01' - interval '105' day;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather
Workers Planned: 2
-> Parallel Append
-> Aggregate
-> Seq Scan on lineitem
Filter: (l_shipdate <= '2000-08-18 00:00:00'::timestamp without time zone)
-> Aggregate
-> Seq Scan on lineitem lineitem_1
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Most important variables
- WORK_MEM limits memory per process, not just per query: work_mem × processes × joins = a lot of memory.
- max_parallel_workers_per_gather: how many worker processes the executor will use for parallel processing of a plan node.
- max_worker_processes: adjust the total number of worker processes to the number of CPU cores on the server.
- max_parallel_workers: the same, but for parallel worker processes.
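The work_mem multiplication in the first item is easy to underestimate, so here is a rough worst-case estimate. The 64 MB setting and the node count are made-up example values, and the function name is hypothetical. Real usage depends on the plan and is often lower.

```python
MB = 1024 * 1024

def worst_case_work_mem(work_mem_bytes, processes, memory_nodes):
    """Upper bound: every process fills work_mem for every sort or hash node."""
    return work_mem_bytes * processes * memory_nodes

# Example: work_mem = 64 MB, leader + 4 workers, one hash join and one sort.
estimate = worst_case_work_mem(64 * MB, processes=5, memory_nodes=2)
print(estimate // MB, "MB for a single query")
```

Multiply that again by the number of concurrent reporting queries to get a budget for the whole server.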
Summary
Since version 9.6, parallel processing can significantly improve the performance of complex queries that scan many rows or index entries. In PostgreSQL 10, parallel processing is enabled by default. Remember to disable it on servers with a heavy OLTP workload. Sequential scans and index scans still consume a lot of resources. If you are not running a report over the entire dataset, you can improve query performance simply by adding missing indexes or by using proper partitioning.
Source: www.habr.com