Forord eller hvordan ideen om sektionering opstod
Historien starter her:
Lyrisk digression:
Netop 'i det øjeblik', fordi
Så hvordan kan du ellers gøre kunden glad, og samtidig forbedre dine egne kompetencer?
For at forenkle alting så meget som muligt, så er der kun to måder at radikalt forbedre noget i databasens ydeevne:
1) Omfattende sti - vi øger ressourcer, ændrer konfigurationen;
2) Intensiv sti - forespørgselsoptimering
Da det, jeg gentager, på det tidspunkt ikke længere var klart, hvad der ellers skulle ændres i anmodningen om at fremskynde, blev vejen valgt - ændringer i borddesign.
Så det vigtigste spørgsmål opstår: hvad og hvordan vil vi ændre?
Oprindelige forhold
For det første er der denne ERD (vist på en betinget forenklet måde):
Hovedtræk:
- mange-til-mange forhold
- tabellen har allerede en potentiel partitionsnøgle
Oprindelig anmodning:
SELECT
p."PARAMETER_ID" as parameter_id,
pc."PC_NAME" AS pc_name,
pc."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LASERMARK" AS lasermark,
w."LOTID" AS lotid,
w."REPORTED_VALUE" AS reported_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SHIPMENT_NAME" AS shipment_name,
s."SHIPMENT_DATE" AS shipment_date,
extract(year from s."SHIPMENT_DATE") AS year,
extract(month from s."SHIPMENT_DATE") as month,
s."REPORT_NAME" AS report_name,
p."SPARAM_NAME" AS SPARAM_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM data w INNER JOIN shipment s ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
INNER JOIN parameters p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN shipment_pc sp ON s."SHIPMENT_ID" = sp."SHIPMENT_ID"
INNER JOIN pc pc ON pc."PC_ID" = sp."PC_ID"
INNER JOIN ( SELECT w2."LASERMARK" , MAX(s2."SHIPMENT_DATE") AS "SHIPMENT_DATE"
FROM shipment s2 INNER JOIN data w2 ON s2."SHIPMENT_ID" = w2."SHIPMENT_ID"
GROUP BY w2."LASERMARK"
) md ON md."SHIPMENT_DATE" = s."SHIPMENT_DATE" AND md."LASERMARK" = w."LASERMARK"
WHERE
s."SHIPMENT_DATE" >= '2018-07-01' AND s."SHIPMENT_DATE" <= '2018-09-30' ;
Resultater af udførelse på testdatabasen:
Koste : 502 997.55
Udførelsestid: 505 sekunder.
Hvad ser vi? En regelmæssig anmodning, baseret på et tidsudsnit.
Lad os gøre den enkleste logiske antagelse: Hvis der er en prøve af et tidsudsnit, vil det så hjælpe os? Det er rigtigt - partitionering.
Hvad skal sektioneres?
Ved første øjekast er valget indlysende - deklarativ opdeling af "forsendelse"-tabellen ved hjælp af "SHIPMENT_DATE"-tasten (springer for langt frem - til sidst gik det lidt galt i produktionen).
Hvordan opdeles?
Dette spørgsmål er heller ikke for svært. Heldigvis er der i PostgreSQL 10 nu en menneskelig opdelingsmekanisme.
So:
- Gem et dump af kildetabellen - pg_dump source_table
- Slet den originale tabel - drop tabel source_table
- Opret en overordnet tabel med områdepartitionering - opret tabel source_table
- Opret sektioner - opret tabel kildetabel, opret indeks
- Importer dumpet oprettet i trin 1 - pg_gendan
Scripts til partitionering
For nemheds skyld og nemheds skyld er trin 2,3,4 blevet kombineret til ét script.
So:
Gem et dump af kildetabellen
pg_dump postgres --file=/dump/shipment.dmp --format=c --table=shipment --verbose > /dump/shipment.log 2>&1
Slet kildetabellen + Opret en overordnet tabel med områdepartitionering + Opret partitioner
--create_partition_shipment.sql
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
current_year varchar ;
current_month varchar ;
begin_year varchar ;
begin_month varchar ;
next_year varchar ;
next_month varchar ;
first_flag boolean ;
i integer ;
begin
RAISE NOTICE 'CREATE TEMPORARY TABLE FOR SHIPMENT_DATE';
CREATE TEMP TABLE tmp_shipment_date as select distinct "SHIPMENT_DATE" from shipment order by "SHIPMENT_DATE" ;
RAISE NOTICE 'DROP TABLE shipment';
drop table shipment cascade ;
CREATE TABLE public.shipment
(
"SHIPMENT_ID" integer NOT NULL DEFAULT nextval('shipment_shipment_id_seq'::regclass),
"SHIPMENT_NAME" character varying(30) COLLATE pg_catalog."default",
"SHIPMENT_DATE" timestamp without time zone,
"REPORT_NAME" character varying(40) COLLATE pg_catalog."default"
)
PARTITION BY RANGE ("SHIPMENT_DATE")
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
RAISE NOTICE 'CREATE PARTITIONS FOR TABLE shipment';
current_year:='0';
current_month:='0';
begin_year := '0' ;
begin_month := '0' ;
next_year := '0' ;
next_month := '0' ;
FOR rec_shipment_date IN SELECT * FROM tmp_shipment_date LOOP
RAISE NOTICE 'SHIPMENT_DATE=%',rec_shipment_date."SHIPMENT_DATE";
current_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE");
current_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE") ;
IF to_number(current_month,'99') < 10 THEN
current_month := '0'||current_month ;
END IF ;
--Init borders
IF begin_year = '0' THEN
first_flag := true ; --first time flag
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
-- Check current date into borders NOT for First time
IF to_date( current_year||'.'||current_month, 'YYYY.MM') >= to_date( begin_year||'.'||begin_month, 'YYYY.MM') AND
to_date( current_year||'.'||current_month, 'YYYY.MM') < to_date( next_year||'.'||next_month, 'YYYY.MM') AND
NOT first_flag
THEN
CONTINUE ;
ELSE
--NEW borders only for second and after time
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
partition_name := 'shipment_shipment_date_'||begin_year||'-'||begin_month||'-01-'|| next_year||'-'||next_month||'-01' ;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF shipment FOR VALUES FROM ( %L ) TO ( %L ) ' , current_year||'-'||current_month||'-01' , next_year||'-'||next_month||'-01' ) ;
index_name := partition_name||'_shipment_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
--Drop first time flag
first_flag := false ;
END LOOP;
end
$$;
Import af lossepladsen
pg_restore -d postgres --data-only --format=c --table=shipment --verbose shipment.dmp > /tmp/data_dump/shipment_restore.log 2>&1
Kontrollerer partitioneringsresultaterne
Hvad har vi som resultat? Den fulde tekst af udførelsesplanen er stor og kedelig, så det er ganske muligt at begrænse sig til de endelige tal.
det var
Omkostninger: 502 997.55
Udførelsestid: 505 sekunder.
Er blevet
Omkostninger: 77 872.36
Udførelsestid: 79 sekunder.
Ganske godt resultat. Reducerede omkostninger og udførelsestid. Således giver brugen af partitionering den forventede effekt og generelt ingen overraskelser.
Gør kunden glad
Testresultaterne blev præsenteret for kunden til gennemgang. Og efter at have gennemgået det, fik de en noget uventet dom: "Godt, opdel "data"-tabellen."
Ja, men vi undersøgte en helt anden "forsendelse"-tabel; "data"-tabellen har ikke feltet "SHIPMENT_DATE".
Intet problem, tilføje, ændre. Det vigtigste er, at kunden er tilfreds med resultatet, implementeringsdetaljerne er ikke særligt vigtige.
Opdeling af hovedtabellen "data"
Generelt opstod der ingen særlige vanskeligheder. Selvom partitioneringsalgoritmen selvfølgelig har ændret sig noget.
Tilføjelse af kolonnen "SHIPMENT_DATA" til tabellen "data".
psql -h хост -U база -d юзер
=> ALTER TABLE data ADD COLUMN "SHIPMENT_DATE" timestamp without time zone ;
Udfyld værdierne i kolonnen "SHIPMENT_DATA" i tabellen "data" med værdierne for kolonnen med samme navn fra tabellen "shipment"
-----------------------------
--update_data.sql
--updating for altered table "data" to values of "shipment_data" from the table "shipment"
--version 1.0
do language plpgsql $$
declare
rec_shipment_data RECORD ;
shipment_date timestamp without time zone ;
row_count integer ;
total_rows integer ;
begin
select count(*) into total_rows from shipment ;
RAISE NOTICE 'Total %',total_rows;
row_count:= 0 ;
FOR rec_shipment_data IN SELECT * FROM shipment LOOP
update data set "SHIPMENT_DATE" = rec_shipment_data."SHIPMENT_DATE" where "SHIPMENT_ID" = rec_shipment_data."SHIPMENT_ID";
row_count:= row_count +1 ;
RAISE NOTICE 'row count = % , from %',row_count,total_rows;
END LOOP;
end
$$;
Gem et dump af "data"-tabellen
pg_dump postgres --file=/dump/data.dmp --format=c --table=data --verbose > /dump/data.log 2>&1</source
Genskab den opdelte tabel "data"
--create_partition_data.sql
--create partitions for the table "wafer data" by range column "shipment_data" with one month duration
--version 1.0
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
current_year varchar ;
current_month varchar ;
begin_year varchar ;
begin_month varchar ;
next_year varchar ;
next_month varchar ;
first_flag boolean ;
i integer ;
begin
RAISE NOTICE 'CREATE TEMPORARY TABLE FOR SHIPMENT_DATE';
CREATE TEMP TABLE tmp_shipment_date as select distinct "SHIPMENT_DATE" from shipment order by "SHIPMENT_DATE" ;
RAISE NOTICE 'DROP TABLE data';
drop table data cascade ;
RAISE NOTICE 'CREATE PARTITIONED TABLE data';
CREATE TABLE public.data
(
"RUN_ID" integer,
"LASERMARK" character varying(20) COLLATE pg_catalog."default" NOT NULL,
"LOTID" character varying(80) COLLATE pg_catalog."default",
"SHIPMENT_ID" integer NOT NULL,
"PARAMETER_ID" integer NOT NULL,
"INTERNAL_VALUE" character varying(75) COLLATE pg_catalog."default",
"REPORTED_VALUE" character varying(75) COLLATE pg_catalog."default",
"LOWER_SPEC_LIMIT" numeric,
"UPPER_SPEC_LIMIT" numeric ,
"SHIPMENT_DATE" timestamp without time zone
)
PARTITION BY RANGE ("SHIPMENT_DATE")
WITH (
OIDS = FALSE
)
TABLESPACE pg_default ;
RAISE NOTICE 'CREATE PARTITIONS FOR TABLE data';
current_year:='0';
current_month:='0';
begin_year := '0' ;
begin_month := '0' ;
next_year := '0' ;
next_month := '0' ;
i := 1;
FOR rec_shipment_date IN SELECT * FROM tmp_shipment_date LOOP
RAISE NOTICE 'SHIPMENT_DATE=%',rec_shipment_date."SHIPMENT_DATE";
current_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE");
current_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE") ;
--Init borders
IF begin_year = '0' THEN
RAISE NOTICE '***Init borders';
first_flag := true ; --first time flag
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
-- RAISE NOTICE 'current_year=% , current_month=% ',current_year,current_month;
-- RAISE NOTICE 'begin_year=% , begin_month=% ',begin_year,begin_month;
-- RAISE NOTICE 'next_year=% , next_month=% ',next_year,next_month;
-- Check current date into borders NOT for First time
RAISE NOTICE 'Current data = %',to_char( to_date( current_year||'.'||current_month, 'YYYY.MM'), 'YYYY.MM');
RAISE NOTICE 'Begin data = %',to_char( to_date( begin_year||'.'||begin_month, 'YYYY.MM'), 'YYYY.MM');
RAISE NOTICE 'Next data = %',to_char( to_date( next_year||'.'||next_month, 'YYYY.MM'), 'YYYY.MM');
IF to_date( current_year||'.'||current_month, 'YYYY.MM') >= to_date( begin_year||'.'||begin_month, 'YYYY.MM') AND
to_date( current_year||'.'||current_month, 'YYYY.MM') < to_date( next_year||'.'||next_month, 'YYYY.MM') AND
NOT first_flag
THEN
RAISE NOTICE '***CONTINUE';
CONTINUE ;
ELSE
--NEW borders only for second and after time
RAISE NOTICE '***NEW BORDERS';
begin_year := current_year ;
begin_month := current_month ;
IF current_month = '12' THEN
next_year := date_part('year' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 year') ;
ELSE
next_year := current_year ;
END IF;
next_month := date_part('month' ,rec_shipment_date."SHIPMENT_DATE" + interval '1 month') ;
END IF;
IF to_number(current_month,'99') < 10 THEN
current_month := '0'||current_month ;
END IF ;
IF to_number(begin_month,'99') < 10 THEN
begin_month := '0'||begin_month ;
END IF ;
IF to_number(next_month,'99') < 10 THEN
next_month := '0'||next_month ;
END IF ;
RAISE NOTICE 'current_year=% , current_month=% ',current_year,current_month;
RAISE NOTICE 'begin_year=% , begin_month=% ',begin_year,begin_month;
RAISE NOTICE 'next_year=% , next_month=% ',next_year,next_month;
partition_name := 'data_'||begin_year||begin_month||'01_'||next_year||next_month||'01' ;
RAISE NOTICE 'PARTITION NUMBER % , TABLE NAME =%',i , partition_name;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF data FOR VALUES FROM ( %L ) TO ( %L ) ' , begin_year||'-'||begin_month||'-01' , next_year||'-'||next_month||'-01' ) ;
index_name := partition_name||'_shipment_id_parameter_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID", "PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_lasermark_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("LASERMARK" COLLATE pg_catalog."default") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_parameter_id_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_date_idx';
RAISE NOTICE 'INDEX NAME =%',index_name;
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_DATE") TABLESPACE pg_default ' ) ;
--Drop first time flag
first_flag := false ;
END LOOP;
end
$$;
Indlæs dumpen oprettet i trin 3.
pg_restore -h хост -юзер -d база --data-only --format=c --table=data --verbose data.dmp > data_restore.log 2>&1
Opret en separat sektion til gamle data
---------------------------------------------------
--create_partition_for_old_dates.sql
--create partitions for keeping old dates
--version 1.0
do language plpgsql $$
declare
rec_shipment_date RECORD ;
partition_name varchar;
index_name varchar;
begin
SELECT min("SHIPMENT_DATE") AS min_date INTO rec_shipment_date from data ;
RAISE NOTICE 'Old date is %',rec_shipment_date.min_date ;
partition_name := 'data_old_dates' ;
RAISE NOTICE 'PARTITION NAME IS %',partition_name;
EXECUTE format('CREATE TABLE ' || quote_ident(partition_name) || ' PARTITION OF data FOR VALUES FROM ( %L ) TO ( %L ) ' , '1900-01-01' ,
to_char( rec_shipment_date.min_date,'YYYY')||'-'||to_char(rec_shipment_date.min_date,'MM')||'-01' ) ;
index_name := partition_name||'_shipment_id_parameter_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID", "PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_lasermark_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("LASERMARK" COLLATE pg_catalog."default") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_parameter_id_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("PARAMETER_ID") TABLESPACE pg_default ' ) ;
index_name := partition_name||'_shipment_date_idx';
EXECUTE format('CREATE INDEX ' || quote_ident(index_name) || ' ON '|| quote_ident(partition_name) ||' USING btree ("SHIPMENT_DATE") TABLESPACE pg_default ' ) ;
end
$$;
Endelige resultater:
det var
Omkostninger: 502 997.55
Udførelsestid: 505 sekunder.
Er blevet
Omkostninger: 68 533.70
Udførelsestid: 69 sekunder
Værdig, ganske værdig. Og taget i betragtning, at det undervejs lykkedes os mere eller mindre at mestre partitioneringsmekanismen i PostgreSQL 10 - et fremragende resultat.
Lyrisk digression
Er det muligt at gøre det endnu bedre - JA, DU KAN!For at gøre dette skal du bruge MATERIALISERET VISNING.
OPRET MATERIALISERET VISNING LASERMARK_VIEW
CREATE MATERIALIZED VIEW LASERMARK_VIEW
AS
SELECT w."LASERMARK" , MAX(s."SHIPMENT_DATE") AS "SHIPMENT_DATE"
FROM shipment s INNER JOIN data w ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
GROUP BY w."LASERMARK" ;
CREATE INDEX lasermark_vw_shipment_date_ind on lasermark_view USING btree ("SHIPMENT_DATE") TABLESPACE pg_default;
analyze lasermark_view ;
Endnu en gang omskriver vi anmodningen:
Forespørg ved hjælp af materialiseret visning
SELECT
p."PARAMETER_ID" as parameter_id,
pc."PC_NAME" AS pc_name,
pc."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LASERMARK" AS lasermark,
w."LOTID" AS lotid,
w."REPORTED_VALUE" AS reported_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SHIPMENT_NAME" AS shipment_name,
s."SHIPMENT_DATE" AS shipment_date,
extract(year from s."SHIPMENT_DATE") AS year,
extract(month from s."SHIPMENT_DATE") as month,
s."REPORT_NAME" AS report_name,
p."STC_NAME" AS STC_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM data w INNER JOIN shipment s ON s."SHIPMENT_ID" = w."SHIPMENT_ID"
INNER JOIN parameters p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN shipment_pc sp ON s."SHIPMENT_ID" = sp."SHIPMENT_ID"
INNER JOIN pc pc ON pc."PC_ID" = sp."PC_ID"
INNER JOIN LASERMARK_VIEW md ON md."SHIPMENT_DATE" = s."SHIPMENT_DATE" AND md."LASERMARK" = w."LASERMARK"
WHERE
s."SHIPMENT_DATE" >= '2018-07-01' AND s."SHIPMENT_DATE" <= '2018-09-30';
Og vi får endnu et resultat:
det var
Omkostninger: 502 997.55
Udførelsestid: 505 sekunder
Er blevet
Omkostninger: 42 481.16
Udførelsestid: 43 sekunder.
Selvom et så lovende resultat selvfølgelig er vildledende, skal ideer genopfriskes. Så den samlede tid til at modtage data vil ikke hjælpe meget. Men som et eksperiment er det ret interessant.
Faktisk, som det viste sig, tak igen
efterskrift
Så kunden er tilfreds. OG brug udnytte situationen.
Ny opgave: Hvad kan du finde på for at uddybe og udvide?
Og så husker jeg - gutter, vi har ikke overvågning af vores PostgreSQL-databaser.
Hånden på hjertet er der stadig en del overvågning i form af Cloud Watch på AWS. Men hvad er fordelen ved denne overvågning for DBA? Generelt praktisk talt ingen.
Hvis du har en chance for at gøre noget nyttigt og interessant for dig selv, kan du ikke lade være med at udnytte denne chance...
TIL
Sådan kommer vi til den mest interessante del:
3. december 2018.
At tage en beslutning om at påbegynde forskning i tilgængelige muligheder for at overvåge ydeevnen af PostgreSQL-forespørgsler.
Men det er en helt anden historie.
Fortsættes…
Kilde: www.habr.com