Modern CPUs have many cores. For years, applications have been sending queries to databases in parallel. A reporting query over many rows of a table runs faster when it can use several CPUs, and PostgreSQL has been able to do this since version 9.6.
It took 3 years to implement the parallel query feature: the code had to be rewritten at several stages of query execution. PostgreSQL 9.6 introduced the infrastructure for further code improvements. In later versions, other types of queries became parallel as well.
Limitations
Do not enable parallel execution if all cores are already busy, otherwise other queries will slow down.
Most importantly, parallel processing with high WORK_MEM values uses a lot of memory: each hash join or sort takes a work_mem amount of memory.
Low-latency OLTP queries cannot be accelerated by parallel execution. And if a query returns a single row, parallel processing will only slow it down.
Developers like to use the TPC-H benchmark. Perhaps you have similar queries that are ideal for parallel execution.
Only SELECT queries without lock predicates are executed in parallel.
Sometimes proper indexing is better than a parallel sequential table scan.
Suspended queries and cursors are not supported.
Window functions and ordered-set aggregate functions are not parallel.
You gain nothing for an I/O-bound workload.
There are no parallel sort algorithms. But queries with sorts can still run in parallel in some respects.
Replace a CTE (WITH ...) with a nested SELECT to enable parallel processing.
Foreign data wrappers do not yet support parallel processing (but they could!)
FULL OUTER JOIN is not supported.
Setting max_rows disables parallel processing.
If a query uses a function that is not marked PARALLEL SAFE, it will be single-threaded.
The SERIALIZABLE transaction isolation level disables parallel processing.
Test environment
The PostgreSQL developers tried to reduce the response time of the TPC-H benchmark queries. Download the benchmark and adapt it to PostgreSQL. This is an unofficial use of the TPC-H benchmark; it is not meant for comparing databases or hardware.
Download TPC-H_Tools_v2.17.3.zip (or a newer version) from the TPC website.
Rename makefile.suite to Makefile and modify it as described here: https://github.com/tvondra/pg_tpch . Compile the code with the make command.
Generate the data: ./dbgen -s 10 creates a 23 GB database. That is enough to see the difference in performance between parallel and non-parallel queries.
Convert the tbl files to csv with for and sed.
Clone the pg_tpch repository and copy the csv files to pg_tpch/dss/data.
Generate the queries with the qgen command.
Load the data into the database with the ./tpch.sh command.
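The article performs the tbl-to-csv conversion with a shell for loop and sed. As a rough Python equivalent (a swapped-in technique, not the article's command), the sketch below assumes that dbgen writes pipe-delimited .tbl files whose rows end with a trailing | and that removing this last delimiter is the whole conversion. The function names and the directory argument are hypothetical.

```python
from pathlib import Path

def tbl_line_to_csv(line: str) -> str:
    """Drop the trailing '|' that (by assumption) ends every dbgen row."""
    body = line.rstrip("\n")
    if body.endswith("|"):
        body = body[:-1]
    return body + "\n"

def convert_tbl_files(directory: str) -> list:
    """Write a .csv file next to every .tbl file and return the new file names."""
    written = []
    for tbl_path in sorted(Path(directory).glob("*.tbl")):
        csv_path = tbl_path.with_suffix(".csv")
        with tbl_path.open() as src, csv_path.open("w") as dst:
            for line in src:
                dst.write(tbl_line_to_csv(line))
        written.append(csv_path.name)
    return written
```

The output files stay pipe-delimited; only the dangling delimiter at the end of each row is removed.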
Parallel sequential scan
It can be faster not because of parallel reads, but because the data is spread across many CPU cores. On modern operating systems, PostgreSQL data files are cached well. With read-ahead, it is possible to fetch a larger block from storage than the PG daemon requests. As a result, query performance is not limited by disk I/O. The query spends CPU cycles to:
read rows one at a time from the table pages;
compare row values against the WHERE conditions.
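As a rough illustration of those two CPU costs, the Python sketch below walks over rows one at a time and applies the same date condition as the lineitem query in this section. The sample rows are hypothetical; only the cutoff expression comes from the query.

```python
from datetime import date, timedelta

# Same cutoff as the WHERE clause: date '1998-12-01' - interval '105' day
CUTOFF = date(1998, 12, 1) - timedelta(days=105)

def seq_scan(rows, cutoff=CUTOFF):
    """Visit rows one at a time and keep those that pass the filter."""
    kept, removed = [], 0
    for l_quantity, l_shipdate in rows:
        if l_shipdate <= cutoff:
            kept.append(l_quantity)
        else:
            removed += 1
    return kept, removed

# Hypothetical sample rows: (l_quantity, l_shipdate)
sample = [(17, date(1996, 3, 13)), (36, date(1998, 8, 18)), (8, date(1998, 11, 30))]
kept, removed = seq_scan(sample)
print(CUTOFF, kept, removed)
```

The computed cutoff is 1998-08-18, the constant that appears in the Filter line of the query plans.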
Let's run a simple select query:
tpch=# explain analyze select l_quantity as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on lineitem (cost=0.00..1964772.00 rows=58856235 width=5) (actual time=0.014..16951.669 rows=58839715 loops=1)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 1146337
Planning Time: 0.203 ms
Execution Time: 19035.100 ms
The sequential scan produces too many rows without aggregation, so the query is executed by a single CPU core.
If you add SUM(), you can see that two worker processes help speed up the query:
explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms
Parallel aggregation
The "Parallel Seq Scan" node produces rows for partial aggregation. The "Partial Aggregate" node reduces these rows using SUM(). Finally, the SUM counter from each worker process is collected by the "Gather" node.
The final result is computed by the "Finalize Aggregate" node. If you have your own aggregate functions, do not forget to mark them as "parallel safe".
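The partial/finalize split can be sketched in a few lines of Python. This is only a sequential simulation of the idea: the function names mirror the plan nodes but are hypothetical, and the round-robin split is an assumption made for brevity.

```python
def partial_aggregate(rows):
    """What each process does: reduce its own share of rows to one partial sum."""
    return sum(rows)

def finalize_aggregate(partial_sums):
    """What the leader does after Gather: combine the partial sums."""
    return sum(partial_sums)

def parallel_sum(values, processes):
    # Hypothetical round-robin split; PostgreSQL hands out table blocks instead.
    shares = [values[i::processes] for i in range(processes)]
    gathered = [partial_aggregate(share) for share in shares]  # one row per process
    return finalize_aggregate(gathered), len(gathered)

total, gathered_rows = parallel_sum(list(range(1, 101)), processes=3)
print(total, gathered_rows)
```

With 2 workers plus the leader, three partial sums reach the gather step, which matches rows=3 on the Gather node in the plan above.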
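Number of worker processes
The number of worker processes can be increased without restarting the server, for example by raising max_parallel_workers_per_gather and reloading the configuration: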
explain analyze select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1589702.14..1589702.15 rows=1 width=32) (actual time=8553.365..8553.365 rows=1 loops=1)
-> Gather (cost=1589701.91..1589702.12 rows=2 width=32) (actual time=8553.241..8555.067 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=1588701.91..1588701.92 rows=1 width=32) (actual time=8547.546..8547.546 rows=1 loops=3)
-> Parallel Seq Scan on lineitem (cost=0.00..1527393.33 rows=24523431 width=5) (actual time=0.038..5998.417 rows=19613238 loops=3)
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 382112
Planning Time: 0.241 ms
Execution Time: 8555.131 ms
What is happening here? There are 2 more worker processes, yet the query became only 1.6599 times faster. The numbers are interesting. We had 2 worker processes and 1 leader. After the change it became 4+1.
The maximum speedup from parallel processing is 5/3 = 1.66(6) times.
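The arithmetic behind that ceiling can be checked with a short Python sketch. It assumes, as the text does, that the leader takes part in the scan, so the number of participating processes goes from 2+1 to 4+1; the function name is hypothetical.

```python
def ideal_speedup(workers_before, workers_after, leaders=1):
    """Ratio of participating processes, assuming the leader also scans."""
    return (workers_after + leaders) / (workers_before + leaders)

ideal = ideal_speedup(2, 4)          # (4 + 1) / (2 + 1)
observed = 1.6599                    # figure quoted in the text
efficiency = observed / ideal
print(round(ideal, 4), round(efficiency, 3))
```

Under these assumptions the observed speedup is within about half a percent of the ideal one, so the extra workers scale almost perfectly.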
How does it work?
Processes
Query execution always starts with the leader process. The leader does everything that is non-parallel, plus part of the parallel processing. The other processes that execute the same query are called worker processes. Parallel processing uses the dynamic background workers infrastructure (available since version 9.4). Since other parts of PostgreSQL use processes rather than threads, a query with 3 worker processes can be up to 4 times faster than traditional processing.
Communication
Worker processes communicate with the leader through a message queue (based on shared memory). Each process has 2 queues: one for errors and one for tuples.
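To make the two-queue arrangement concrete, here is a toy Python model. It is not PostgreSQL's shared-memory implementation: WorkerChannel, run_worker and leader_collect are hypothetical names, the queues are ordinary in-process queues, and the workers run one after another rather than concurrently.

```python
from queue import Queue

class WorkerChannel:
    """Hypothetical stand-in for one worker's pair of queues."""
    def __init__(self):
        self.tuples = Queue()   # result rows for the leader
        self.errors = Queue()   # error reports for the leader

def run_worker(channel, rows, predicate):
    """Send matching rows to the tuple queue; report a failure and stop."""
    for row in rows:
        try:
            if predicate(row):
                channel.tuples.put(row)
        except Exception as exc:
            channel.errors.put(repr(exc))
            return

def leader_collect(channels):
    """Drain every worker's error queue and tuple queue."""
    results, errors = [], []
    for channel in channels:
        while not channel.errors.empty():
            errors.append(channel.errors.get())
        while not channel.tuples.empty():
            results.append(channel.tuples.get())
    return results, errors

first, second = WorkerChannel(), WorkerChannel()
run_worker(first, [1, 2, 3], lambda x: x % 2 == 0)
run_worker(second, [4, 5, 6], lambda x: x % 2 == 0)
print(leader_collect([first, second]))
```

Keeping errors on a separate queue lets the leader notice a failed worker without scanning through the stream of result rows.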
Each time the table is 3 times larger than min_parallel_(index|table)_scan_size, Postgres adds a worker process. The number of workers is not cost-based. A circular dependency would make the implementation complex. Instead, the planner uses simple rules.
In practice, these rules are not always suitable for production, so you can change the number of worker processes for a specific table: ALTER TABLE ... SET (parallel_workers = N).
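The size-based rule can be sketched in Python as follows. This is a simplified model under stated assumptions, not the planner's code: it works in bytes rather than pages, uses the default values of 8MB for min_parallel_table_scan_size and 2 for max_parallel_workers_per_gather, and the function and argument names are hypothetical.

```python
MB = 1024 * 1024

def planned_workers(table_bytes, min_scan_bytes=8 * MB,
                    max_per_gather=2, parallel_workers=None):
    """Sketch of the size-based rule: one more worker per tripling in size."""
    if parallel_workers is not None:
        # ALTER TABLE ... SET (parallel_workers = N) replaces the size rule
        workers = parallel_workers
    elif table_bytes < min_scan_bytes:
        return 0
    else:
        workers, threshold = 1, min_scan_bytes
        while table_bytes >= threshold * 3:
            workers += 1
            threshold *= 3
    # The per-Gather limit caps both the size rule and the table setting
    return min(workers, max_per_gather)

for size_mb in (8, 24, 72):
    print(size_mb, planned_workers(size_mb * MB, max_per_gather=8))
```

Because the count grows with the logarithm of the table size, even a very large table asks for only a handful of workers, which is why a per-table override can be useful.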
Why isn't parallel processing used?
Besides the long list of restrictions, there are also cost checks:
parallel_setup_cost: avoids parallel processing of short queries. This parameter estimates the time needed to set up memory, start the processes, and exchange initial data.
parallel_tuple_cost: communication between the leader and the workers can be delayed in proportion to the number of tuples sent by the worker processes. This parameter estimates the cost of data exchange.
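A minimal Python sketch of how these two parameters enter the cost of a Gather node, assuming the default values parallel_setup_cost = 1000 and parallel_tuple_cost = 0.1. The first call reuses the Partial Aggregate costs from the SUM() plan earlier in the article. The second call is only an estimate: it borrows the Parallel Seq Scan cost from that same plan as a stand-in for a parallel version of the non-aggregated query. The function name is hypothetical.

```python
def gather_cost(sub_startup, sub_total, rows,
                parallel_setup_cost=1000.0, parallel_tuple_cost=0.1):
    """Add the two parallel penalties to the cost of the plan below Gather."""
    startup = sub_startup + parallel_setup_cost
    total = sub_total + parallel_setup_cost + parallel_tuple_cost * rows
    return round(startup, 2), round(total, 2)

# Partial Aggregate from the SUM() plan: cost=1588701.91..1588701.92, Gather rows=2
print(gather_cost(1588701.91, 1588701.92, rows=2))

# Same scan without aggregation: every one of the ~58.9M rows crosses the queue
_, parallel_total = gather_cost(0.00, 1527393.33, rows=58856235)
print(parallel_total > 1964772.00)   # compare with the serial Seq Scan cost
```

With aggregation only a couple of tuples reach the leader, so the penalty is negligible. Without it, the per-tuple charge alone outweighs the cost of the serial scan, which is consistent with the planner choosing a single process for the plain SELECT.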
Nested Loop Join
PostgreSQL 9.6+ can execute nested loops in parallel; it is a simple operation.
explain (costs off) select c_custkey, count(o_orderkey)
from customer left outer join orders on
c_custkey = o_custkey and o_comment not like '%special%deposits%'
group by c_custkey;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize GroupAggregate
Group Key: customer.c_custkey
-> Gather Merge
Workers Planned: 4
-> Partial GroupAggregate
Group Key: customer.c_custkey
-> Nested Loop Left Join
-> Parallel Index Only Scan using customer_pkey on customer
-> Index Scan using idx_orders_custkey on orders
Index Cond: (customer.c_custkey = o_custkey)
Filter: ((o_comment)::text !~~ '%special%deposits%'::text)
The gather happens at the last stage, so the Nested Loop Left Join is a parallel operation. Parallel Index Only Scan was introduced only in version 10. It works similarly to a parallel sequential scan. The condition c_custkey = o_custkey reads one order per customer row, so it is not parallel.
Hash Join
Until PostgreSQL 11, each worker process built its own hash table. With more than four such processes, performance did not improve. In the new version, the hash table is shared. Each worker process can use WORK_MEM to build the hash table.
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'AIR')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1996-01-01'
and l_receiptdate < date '1996-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1964755.66..1964961.44 rows=1 width=27) (actual time=7579.592..7922.997 rows=1 loops=1)
-> Finalize GroupAggregate (cost=1964755.66..1966196.11 rows=7 width=27) (actual time=7579.590..7579.591 rows=1 loops=1)
Group Key: lineitem.l_shipmode
-> Gather Merge (cost=1964755.66..1966195.83 rows=28 width=27) (actual time=7559.593..7922.319 rows=6 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial GroupAggregate (cost=1963755.61..1965192.44 rows=7 width=27) (actual time=7548.103..7564.592 rows=2 loops=5)
Group Key: lineitem.l_shipmode
-> Sort (cost=1963755.61..1963935.20 rows=71838 width=27) (actual time=7530.280..7539.688 rows=62519 loops=5)
Sort Key: lineitem.l_shipmode
Sort Method: external merge Disk: 2304kB
Worker 0: Sort Method: external merge Disk: 2064kB
Worker 1: Sort Method: external merge Disk: 2384kB
Worker 2: Sort Method: external merge Disk: 2264kB
Worker 3: Sort Method: external merge Disk: 2336kB
-> Parallel Hash Join (cost=382571.01..1957960.99 rows=71838 width=27) (actual time=7036.917..7499.692 rows=62519 loops=5)
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Parallel Seq Scan on lineitem (cost=0.00..1552386.40 rows=71838 width=19) (actual time=0.583..4901.063 rows=62519 loops=5)
Filter: ((l_shipmode = ANY ('{MAIL,AIR}'::bpchar[])) AND (l_commitdate < l_receiptdate) AND (l_shipdate < l_commitdate) AND (l_receiptdate >= '1996-01-01'::date) AND (l_receiptdate < '1997-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 11934691
-> Parallel Hash (cost=313722.45..313722.45 rows=3750045 width=20) (actual time=2011.518..2011.518 rows=3000000 loops=5)
Buckets: 65536 Batches: 256 Memory Usage: 3840kB
-> Parallel Seq Scan on orders (cost=0.00..313722.45 rows=3750045 width=20) (actual time=0.029..995.948 rows=3000000 loops=5)
Planning Time: 0.977 ms
Execution Time: 7923.770 ms
Query 12 from TPC-H illustrates a parallel hash join. Each worker process contributes to building the shared hash table.
Merge Join
A merge join is non-parallel by nature. Don't worry if it is the last step of the query: the query can still run in parallel.
-- Query 2 from TPC-H
explain (costs off) select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 36
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
and ps_supplycost = (
select
min(ps_supplycost)
from partsupp, supplier, nation, region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AMERICA'
)
order by s_acctbal desc, n_name, s_name, p_partkey
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Limit
-> Sort
Sort Key: supplier.s_acctbal DESC, nation.n_name, supplier.s_name, part.p_partkey
-> Merge Join
Merge Cond: (part.p_partkey = partsupp.ps_partkey)
Join Filter: (partsupp.ps_supplycost = (SubPlan 1))
-> Gather Merge
Workers Planned: 4
-> Parallel Index Scan using part_pkey on part
Filter: (((p_type)::text ~~ '%BRASS'::text) AND (p_size = 36))
-> Materialize
-> Sort
Sort Key: partsupp.ps_partkey
-> Nested Loop
-> Nested Loop
Join Filter: (nation.n_regionkey = region.r_regionkey)
-> Seq Scan on region
Filter: (r_name = 'AMERICA'::bpchar)
-> Hash Join
Hash Cond: (supplier.s_nationkey = nation.n_nationkey)
-> Seq Scan on supplier
-> Hash
-> Seq Scan on nation
-> Index Scan using idx_partsupp_suppkey on partsupp
Index Cond: (ps_suppkey = supplier.s_suppkey)
SubPlan 1
-> Aggregate
-> Nested Loop
Join Filter: (nation_1.n_regionkey = region_1.r_regionkey)
-> Seq Scan on region region_1
Filter: (r_name = 'AMERICA'::bpchar)
-> Nested Loop
-> Nested Loop
-> Index Scan using idx_partsupp_partkey on partsupp partsupp_1
Index Cond: (part.p_partkey = ps_partkey)
-> Index Scan using supplier_pkey on supplier supplier_1
Index Cond: (s_suppkey = partsupp_1.ps_suppkey)
-> Index Scan using nation_pkey on nation nation_1
Index Cond: (n_nationkey = supplier_1.s_nationkey)
The "Merge Join" node sits above "Gather Merge", so the merge itself does not use parallel processing. But the "Parallel Index Scan" node still helps with the part_pkey segment.
Partition-wise join
In PostgreSQL 11, partition-wise join is disabled by default because its planning is very expensive. Tables partitioned in the same way can be joined partition by partition. This lets Postgres use smaller hash tables. Each per-partition join can run in parallel.
tpch=# set enable_partitionwise_join=t;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
---------------------------------------------------
Append
-> Hash Join
Hash Cond: (t2.b = t1.a)
-> Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p1 t1
Filter: (b = 0)
-> Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Hash
-> Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
tpch=# set parallel_setup_cost = 1;
tpch=# set parallel_tuple_cost = 0.01;
tpch=# explain (costs off) select * from prt1 t1, prt2 t2
where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
QUERY PLAN
-----------------------------------------------------------
Gather
Workers Planned: 4
-> Parallel Append
-> Parallel Hash Join
Hash Cond: (t2_1.b = t1_1.a)
-> Parallel Seq Scan on prt2_p2 t2_1
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p2 t1_1
Filter: (b = 0)
-> Parallel Hash Join
Hash Cond: (t2.b = t1.a)
-> Parallel Seq Scan on prt2_p1 t2
Filter: ((b >= 0) AND (b <= 10000))
-> Parallel Hash
-> Parallel Seq Scan on prt1_p1 t1
Filter: (b = 0)
The main point is that a partition-wise join runs in parallel only if the partitions are large enough.
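The idea of joining partition by partition with smaller hash tables can be sketched in Python. This is a toy model under stated assumptions: the sample rows are hypothetical, both tables are assumed to be split at the same key boundary, and the pairs are joined one after another rather than by separate workers.

```python
def hash_join(outer, inner, outer_key, inner_key):
    """Build a hash table on `inner`, then probe it with every `outer` row."""
    table = {}
    for row in inner:
        table.setdefault(row[inner_key], []).append(row)
    return [(o, i) for o in outer for i in table.get(o[outer_key], [])]

def partitionwise_join(outer_parts, inner_parts, outer_key, inner_key):
    """Join matching partitions pairwise; each pair needs only a small hash table."""
    joined = []
    for outer_part, inner_part in zip(outer_parts, inner_parts):
        joined.extend(hash_join(outer_part, inner_part, outer_key, inner_key))
    return joined

# Hypothetical data: both tables split at the same boundary (keys < 10, keys >= 10)
prt1 = [[{"a": 1}, {"a": 4}], [{"a": 12}, {"a": 15}]]
prt2 = [[{"b": 4}, {"b": 7}], [{"b": 12}, {"b": 12}]]
pairs = partitionwise_join(prt2, prt1, "b", "a")
print([(o["b"], i["a"]) for o, i in pairs])
```

Because matching keys can only live in matching partitions, the pairwise joins return the same rows as one join over the whole tables, while each hash table holds a single partition.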
Parallel Append
Parallel Append can be used instead of handing different blocks to different worker processes. This usually happens with UNION ALL queries. The drawback is less parallelism, because each worker process handles only 1 query.
Only 2 worker processes run here, even though 4 are enabled.
tpch=# explain (costs off) select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '1998-12-01' - interval '105' day union all select sum(l_quantity) as sum_qty from lineitem where l_shipdate <= date '2000-12-01' - interval '105' day;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather
Workers Planned: 2
-> Parallel Append
-> Aggregate
-> Seq Scan on lineitem
Filter: (l_shipdate <= '2000-08-18 00:00:00'::timestamp without time zone)
-> Aggregate
-> Seq Scan on lineitem lineitem_1
Filter: (l_shipdate <= '1998-08-18 00:00:00'::timestamp without time zone)
The most important variables
WORK_MEM limits memory per process, not just per query: work_mem * processes * joins = a lot of memory.
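A quick back-of-the-envelope version of that product in Python. The settings are hypothetical, and the result is a worst-case bound that assumes every sort or hash node in every process fills its whole work_mem allowance.

```python
MB = 1024 * 1024

def worst_case_memory(work_mem_bytes, processes, memory_nodes):
    """work_mem * processes * joins: every sort or hash node in every process
    may claim up to work_mem."""
    return work_mem_bytes * processes * memory_nodes

# Hypothetical settings: work_mem = 64MB, 4 workers + 1 leader, 2 hash joins or sorts
estimate = worst_case_memory(64 * MB, processes=4 + 1, memory_nodes=2)
print(estimate // MB, "MB")
```

A single query with these settings could therefore claim up to 640 MB, and several such queries running at once multiply that figure again.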
Since version 9.6, parallel processing can improve the performance of complex queries that scan many rows or index entries. In PostgreSQL 10, parallel processing is enabled by default. Remember to disable it on servers with a heavy OLTP workload. Sequential scans and index scans consume a lot of resources. If you are not running a report over the whole dataset, you can improve query performance by adding missing indexes or by using proper partitioning.