Eessõna ehk kuidas tekkis lõikumise idee
Lugu algab siit:
Lüüriline kõrvalekalle:
Just 'sel hetkel', sest
Niisiis, kuidas muidu saate klienti õnnelikuks teha ja samal ajal oma oskusi parandada?
Et kõike võimalikult palju lihtsustada, siis on andmebaasi jõudluse radikaalseks parandamiseks ainult kaks võimalust:
1) Ulatuslik tee - suurendame ressursse, muudame konfiguratsiooni;
2) Intensiivne tee – päringu optimeerimine
Kuna, kordan, polnud sel ajal enam selge, mida kiirendamise taotluses veel muuta, siis sai valitud tee - laua kujunduse muudatused.
Seega tekib põhiküsimus: mida ja kuidas me muudame?
Esialgsed tingimused
Esiteks on see ERD (näidatud tinglikult lihtsustatud kujul):
Põhijooned:
- paljud-mitmele suhted
- tabelis on juba potentsiaalne partitsioonivõti
Algne taotlus:
SELECT
p."PARAMETER_ID" as parameter_id,
pc."PC_NAME" AS pc_name,
pc."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LASERMARK" AS lasermark,
w."LOTID" AS lotid,
w."REPORTED_VALUE" AS reported_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SHIPMENT_NAME" AS shipment_name,
s."SHIPMENT_DATE" AS shipment_date,
extract(year from s."SHIPMENT_DATE") AS year,
extract(month from s."SHIPMENT_DATE") as month,
s."REPORT_NAME" AS report_name,
p."SPARAM_NAME" AS SPARAM_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM data w INNER JOIN shipment s ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
INNER JOIN parameters p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN shipment_pc sp ON s."SHIPMENT_ID" = sp."SHIPMENT_ID"
INNER JOIN pc pc ON pc."PC_ID" = sp."PC_ID"
INNER JOIN ( SELECT w2."LASERMARK" , MAX(s2."SHIPMENT_DATE") AS "SHIPMENT_DATE"
FROM shipment s2 INNER JOIN data w2 ON s2."SHIPMENT_ID" = w2."SHIPMENT_ID"
GROUP BY w2."LASERMARK"
) md ON md."SHIPMENT_DATE" = s."SHIPMENT_DATE" AND md."LASERMARK" = w."LASERMARK"
WHERE
s."SHIPMENT_DATE" >= '2018-07-01' AND s."SHIPMENT_DATE" <= '2018-09-30' ;
Testide andmebaasis täitmise tulemused:
Maksma : 502 997.55
Täitmise aeg: 505 sekundit.
Mida me näeme? Tavaline taotlus, mis põhineb ajalõikel.
Teeme lihtsaima loogilise oletuse: kui on olemas ajalõike näidis, kas see aitab meid? Täpselt nii – jaotamine.
Mida jagada?
Esmapilgul on valik ilmne - tabeli “saadetis” deklaratiivne jaotamine klahvi “SHIPMENT_DATE” abil (liiga ette hüppamine - lõpuks tuli tootmises veidi valesti välja).
Kuidas osadeks jagada?
See küsimus pole ka liiga raske. Õnneks on PostgreSQL 10-s nüüd olemas inimese partitsioonimehhanism.
Nii et:
- Salvestage lähtetabeli tõmmis - pg_dump lähtetabel
- Kustuta algne tabel - kukuta tabel allika_tabel
- Looge ülemtabel vahemiku jaotusega - loo tabel allika_tabel
- Loo jaotised - loo tabel allika_tabel, loo indeks
- Importige 1. sammus loodud tõmmis - pg_restore
Skriptid partitsioonideks
Lihtsuse ja mugavuse huvides on sammud 2,3,4 ühendatud üheks skriptiks.
Nii et:
Salvestage lähtetabeli tõmmis
pg_dump postgres --file=/dump/shipment.dmp --format=c --table=shipment --verbose > /dump/shipment.log 2>&1
Lähtetabeli kustutamine + vahemiku partitsioonidega ülemtabeli loomine + partitsioonide loomine
--create_partition_shipment.sql
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
current_year varchar ;
current_month varchar ;
begin_year varchar ;
begin_month varchar ;
next_year varchar ;
next_month varchar ;
first_flag boolean ;
i integer ;
begin
RAISE NOTICE 'CREATE TEMPORARY TABLE FOR SHIPMENT_DATE';
CREATE TEMP TABLE tmp_shipment_date as select distinct "SHIPMENT_DATE" from shipment order by "SHIPMENT_DATE" ;
RAISE NOTICE 'DROP TABLE shipment';
drop table shipment cascade ;
CREATE TABLE public.shipment
(
"SHIPMENT_ID" integer NOT NULL DEFAULT nextval('shipment_shipment_id_seq'::regclass),
"SHIPMENT_NAME" character varying(30) COLLATE pg_catalog."default",
"SHIPMENT_DATE" timestamp without time zone,
"REPORT_NAME" character varying(40) COLLATE pg_catalog."default"
)
PARTITION BY RANGE ("SHIPMENT_DATE")
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
RAISE NOTICE 'CREATE PARTITIONS FOR TABLE shipment';
current_year:='0';
current_month:='0';
begin_year := '0' ;
begin_month := '0' ;
next_year := '0' ;
next_month := '0' ;
FOR rec_shipment_date IN SELECT * FROM tmp_shipment_date LOOP
RAISE NOTICE 'SHIPMENT_DATE=%',rec_shipment_date."SHIPMENT_DATE";
current_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE");
current_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE") ;
IF to_number(current_month,'99') < 10 THEN
current_month := '0'||current_month ;
END IF ;
--Init borders
IF begin_year = '0' THEN
first_flag := true ; --first time flag
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
-- Check current date into borders NOT for First time
IF to_date( current_year||'.'||current_month, 'YYYY.MM') >= to_date( begin_year||'.'||begin_month, 'YYYY.MM') AND
to_date( current_year||'.'||current_month, 'YYYY.MM') < to_date( next_year||'.'||next_month, 'YYYY.MM') AND
NOT first_flag
THEN
CONTINUE ;
ELSE
--NEW borders only for second and after time
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
partition_name := 'shipment_shipment_date_'||begin_year||'-'||begin_month||'-01-'|| next_year||'-'||next_month||'-01' ;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF shipment FOR VALUES FROM ( %L ) TO ( %L ) ' , current_year||'-'||current_month||'-01' , next_year||'-'||next_month||'-01' ) ;
index_name := partition_name||'_shipment_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
--Drop first time flag
first_flag := false ;
END LOOP;
end
$$;
Prügila importimine
pg_restore -d postgres --data-only --format=c --table=shipment --verbose shipment.dmp > /tmp/data_dump/shipment_restore.log 2>&1
Jaotamise tulemuste kontrollimine
Mis meil selle tulemusena on? Täitmisplaani täistekst on suur ja igav, seega on täiesti võimalik piirduda lõplike numbritega.
Oli
Hind: 502 997.55
Täitmise aeg: 505 sekundit.
On saanud
Hind: 77 872.36
Täitmise aeg: 79 sekundit.
Päris hea tulemus. Vähendatud kulud ja teostusaeg. Seega annab partitsiooni kasutamine oodatud efekti ja üldiselt ei üllata.
Tee klient õnnelikuks
Testi tulemused esitati kliendile ülevaatamiseks. Ja pärast selle ülevaatamist tehti neile mõnevõrra ootamatu otsus: "Tore, jagage "andmete" tabel.
Jah, aga uurisime täiesti teistsugust “saadetise” tabelit; tabelis “andmed” ei ole välja “SHIPMENT_DATE”.
Pole probleemi, lisage, muutke. Peaasi, et klient on tulemusega rahul, teostuse üksikasjad pole eriti olulised.
Põhitabeli "andmed" jagamine
Üldiselt erilisi raskusi ei tekkinud. Kuigi jaotamisalgoritm on muidugi mõnevõrra muutunud.
Veeru „SHIPMENT_DATA” lisamine tabelisse „andmed”.
psql -h хост -U база -d юзер
=> ALTER TABLE data ADD COLUMN "SHIPMENT_DATE" timestamp without time zone ;
Täitke tabelis "andmed" veeru "SHIPMENT_DATA" väärtused tabelis "saadetis" oleva samanimelise veeru väärtustega.
-----------------------------
--update_data.sql
--updating for altered table "data" to values of "shipment_data" from the table "shipment"
--version 1.0
do language plpgsql $$
declare
rec_shipment_data RECORD ;
shipment_date timestamp without time zone ;
row_count integer ;
total_rows integer ;
begin
select count(*) into total_rows from shipment ;
RAISE NOTICE 'Total %',total_rows;
row_count:= 0 ;
FOR rec_shipment_data IN SELECT * FROM shipment LOOP
update data set "SHIPMENT_DATE" = rec_shipment_data."SHIPMENT_DATE" where "SHIPMENT_ID" = rec_shipment_data."SHIPMENT_ID";
row_count:= row_count +1 ;
RAISE NOTICE 'row count = % , from %',row_count,total_rows;
END LOOP;
end
$$;
Salvestage tabeli "andmed" väljavõte
pg_dump postgres --file=/dump/data.dmp --format=c --table=data --verbose > /dump/data.log 2>&1</source
Looge uuesti partitsioonitud tabel "andmed"
--create_partition_data.sql
--create partitions for the table "wafer data" by range column "shipment_data" with one month duration
--version 1.0
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
current_year varchar ;
current_month varchar ;
begin_year varchar ;
begin_month varchar ;
next_year varchar ;
next_month varchar ;
first_flag boolean ;
i integer ;
begin
RAISE NOTICE 'CREATE TEMPORARY TABLE FOR SHIPMENT_DATE';
CREATE TEMP TABLE tmp_shipment_date as select distinct "SHIPMENT_DATE" from shipment order by "SHIPMENT_DATE" ;
RAISE NOTICE 'DROP TABLE data';
drop table data cascade ;
RAISE NOTICE 'CREATE PARTITIONED TABLE data';
CREATE TABLE public.data
(
"RUN_ID" integer,
"LASERMARK" character varying(20) COLLATE pg_catalog."default" NOT NULL,
"LOTID" character varying(80) COLLATE pg_catalog."default",
"SHIPMENT_ID" integer NOT NULL,
"PARAMETER_ID" integer NOT NULL,
"INTERNAL_VALUE" character varying(75) COLLATE pg_catalog."default",
"REPORTED_VALUE" character varying(75) COLLATE pg_catalog."default",
"LOWER_SPEC_LIMIT" numeric,
"UPPER_SPEC_LIMIT" numeric ,
"SHIPMENT_DATE" timestamp without time zone
)
PARTITION BY RANGE ("SHIPMENT_DATE")
WITH (
OIDS = FALSE
)
TABLESPACE pg_default ;
RAISE NOTICE 'CREATE PARTITIONS FOR TABLE data';
current_year:='0';
current_month:='0';
begin_year := '0' ;
begin_month := '0' ;
next_year := '0' ;
next_month := '0' ;
i := 1;
FOR rec_shipment_date IN SELECT * FROM tmp_shipment_date LOOP
RAISE NOTICE 'SHIPMENT_DATE=%',rec_shipment_date."SHIPMENT_DATE";
current_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE");
current_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE") ;
--Init borders
IF begin_year = '0' THEN
RAISE NOTICE '***Init borders';
first_flag := true ; --first time flag
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
-- RAISE NOTICE 'current_year=% , current_month=% ',current_year,current_month;
-- RAISE NOTICE 'begin_year=% , begin_month=% ',begin_year,begin_month;
-- RAISE NOTICE 'next_year=% , next_month=% ',next_year,next_month;
-- Check current date into borders NOT for First time
RAISE NOTICE 'Current data = %',to_char( to_date( current_year||'.'||current_month, 'YYYY.MM'), 'YYYY.MM');
RAISE NOTICE 'Begin data = %',to_char( to_date( begin_year||'.'||begin_month, 'YYYY.MM'), 'YYYY.MM');
RAISE NOTICE 'Next data = %',to_char( to_date( next_year||'.'||next_month, 'YYYY.MM'), 'YYYY.MM');
IF to_date( current_year||'.'||current_month, 'YYYY.MM') >= to_date( begin_year||'.'||begin_month, 'YYYY.MM') AND
to_date( current_year||'.'||current_month, 'YYYY.MM') < to_date( next_year||'.'||next_month, 'YYYY.MM') AND
NOT first_flag
THEN
RAISE NOTICE '***CONTINUE';
CONTINUE ;
ELSE
--NEW borders only for second and after time
RAISE NOTICE '***NEW BORDERS';
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
IF to_number(current_month,'99') < 10 THEN
current_month := '0'||current_month ;
END IF ;
IF to_number(begin_month,'99') < 10 THEN
begin_month := '0'||begin_month ;
END IF ;
IF to_number(next_month,'99') < 10 THEN
next_month := '0'||next_month ;
END IF ;
RAISE NOTICE 'current_year=% , current_month=% ',current_year,current_month;
RAISE NOTICE 'begin_year=% , begin_month=% ',begin_year,begin_month;
RAISE NOTICE 'next_year=% , next_month=% ',next_year,next_month;
partition_name := 'data_'||begin_year||begin_month||'01_'||next_year||next_month||'01' ;
RAISE NOTICE 'PARTITION NUMBER % , TABLE NAME =%',i , partition_name;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF data FOR VALUES FROM ( %L ) TO ( %L ) ' , begin_year||'-'||begin_month||'-01' , next_year||'-'||next_month||'-01' ) ;
index_name := partition_name||'_shipment_id_parameter_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID", "PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_lasermark_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("LASERMARK" COLLATE pg_catalog."default") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_parameter_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_date_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_DATE") TABLESPACE pg_default ' ) ;
--Drop first time flag
first_flag := false ;
END LOOP;
end
$$;
Laadige 3. sammus loodud prügila.
pg_restore -h хост -юзер -d база --data-only --format=c --table=data --verbose data.dmp > data_restore.log 2>&1
Looge vanade andmete jaoks eraldi jaotis
---------------------------------------------------
--create_partition_for_old_dates.sql
--create partitions for keeping old dates
--version 1.0
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
begin
SELECT min("SHIPMENT_DATE") AS min_date INTO rec_shipment_date from data ;
RAISE NOTICE 'Old date is %',rec_shipment_date.min_date ;
partition_name := 'data_old_dates' ;
RAISE NOTICE 'PARTITION NAME IS %',partition_name;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF data FOR VALUES FROM ( %L ) TO ( %L ) ' , '1900-01-01' ,
to_char( rec_shipment_date.min_date,'YYYY')||'-'||to_char(rec_shipment_date.min_date,'MM')||'-01' ) ;
index_name := partition_name||'_shipment_id_parameter_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID", "PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_lasermark_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("LASERMARK" COLLATE pg_catalog."default") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_parameter_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_date_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_DATE") TABLESPACE pg_default ' ) ;
end
$$;
Lõplikud tulemused:
Oli
Hind: 502 997.55
Täitmise aeg: 505 sekundit.
On saanud
Hind: 68 533.70
Täitmise aeg: 69 sekundit
Väärt, täitsa väärt. Ja kui arvestada, et meil õnnestus PostgreSQL 10 partitsioonimehhanismi enam-vähem omandada - suurepärane tulemus.
Lüüriline kõrvalepõige
Kas on võimalik veel paremini teha – JAH, SAAD!Selleks peate kasutama MATERIALIZED VIEW.
LOO MATERIALISEERITUD VAADE LASERMARK_VIEW
CREATE MATERIALIZED VIEW LASERMARK_VIEW
AS
SELECT w."LASERMARK" , MAX(s."SHIPMENT_DATE") AS "SHIPMENT_DATE"
FROM shipment s INNER JOIN data w ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
GROUP BY w."LASERMARK" ;
CREATE INDEX lasermark_vw_shipment_date_ind on lasermark_view USING btree ("SHIPMENT_DATE") TABLESPACE pg_default;
analyze lasermark_view ;
Kirjutame taotluse veel kord ümber:
Päring materialiseeritud vaate abil
SELECT
p."PARAMETER_ID" as parameter_id,
pc."PC_NAME" AS pc_name,
pc."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LASERMARK" AS lasermark,
w."LOTID" AS lotid,
w."REPORTED_VALUE" AS reported_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SHIPMENT_NAME" AS shipment_name,
s."SHIPMENT_DATE" AS shipment_date,
extract(year from s."SHIPMENT_DATE") AS year,
extract(month from s."SHIPMENT_DATE") as month,
s."REPORT_NAME" AS report_name,
p."STC_NAME" AS STC_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM data w INNER JOIN shipment s ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
INNER JOIN parameters p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN shipment_pc sp ON s."SHIPMENT_ID" = sp."SHIPMENT_ID"
INNER JOIN pc pc ON pc."PC_ID" = sp."PC_ID"
INNER JOIN LASERMARK_VIEW md ON md."SHIPMENT_DATE" = s."SHIPMENT_DATE" AND md."LASERMARK" = w."LASERMARK"
WHERE
s."SHIPMENT_DATE" >= '2018-07-01' AND s."SHIPMENT_DATE" <= '2018-09-30';
Ja saame teise tulemuse:
Oli
Hind: 502 997.55
Täitmise aeg: 505 sekundit
On saanud
Hind: 42 481.16
Täitmise aeg: 43 sekundit.
Kuigi nii paljulubav tulemus on muidugi petlik, ideid tuleb värskendada. Seega pole andmete kogumise ajast palju abi. Aga eksperimendina on see päris huvitav.
Tegelikult, nagu selgus, tänan veel kord
järelsõna
Seega, klient on rahul. JA kuni olukorda ära kasutama.
Uus ülesanne: Mida saate süvendamiseks ja laiendamiseks välja mõelda?
Ja siis ma mäletan – poisid, meil pole oma PostgreSQL-i andmebaaside jälgimist.
Käsi südamel, AWS-is on endiselt jälgitav Cloud Watch. Kuid mis kasu on sellest jälgimisest DBA jaoks? Üldiselt praktiliselt mitte ühtegi.
Kui teil on võimalus teha enda jaoks midagi kasulikku ja huvitavat, ei saa te seda võimalust kasutamata jätta...
FOR
Nii jõuame kõige huvitavama osani:
3. detsember 2018.
Otsuse tegemine alustada uurimist PostgreSQL-i päringute toimivuse jälgimiseks saadaolevate võimaluste kohta.
Aga see on hoopis teine lugu.
Jätkub…
Allikas: www.habr.com