ProHoster > Blog > Pangangasiwa > Isang pagtatangka na lumikha ng isang ASH analogue para sa PostgreSQL
Isang pagtatangka na lumikha ng isang ASH analogue para sa PostgreSQL
Pahayag ng problema
Upang ma-optimize ang mga query sa PostgreSQL, ang kakayahang suriin ang kasaysayan ng aktibidad, sa partikular, mga paghihintay, mga lock, at mga istatistika ng talahanayan, ay lubhang kailangan.
extension ng pgsentinel :
Β«Ang lahat ng naipon na impormasyon ay nakaimbak lamang sa RAM, at ang natupok na halaga ng memorya ay kinokontrol ng bilang ng mga huling nakaimbak na tala.
Idinagdag ang field ng queryid - ang parehong queryid mula sa extension ng pg_stat_statements (kinakailangan ang pre-installation).Β«
Ito, siyempre, ay makakatulong nang malaki, ngunit ang pinakamahirap na bagay ay ang unang punto."Ang lahat ng naipon na impormasyon ay nakaimbak lamang sa RAM β, ibig sabihin. may epekto sa target na base. Bilang karagdagan, walang kasaysayan ng lock at mga istatistika ng talahanayan. Yung. ang solusyon ay karaniwang hindi kumpleto: "Wala pang handa na pakete para sa pag-install. Iminumungkahi na i-download ang mga mapagkukunan at i-assemble ang library nang mag-isa. Kailangan mo munang i-install ang package na βdevelβ para sa iyong server at itakda ang path sa pg_config sa PATH variable.".
Sa pangkalahatan, maraming kaguluhan, at sa kaso ng mga seryosong database ng produksyon, maaaring hindi posible na gumawa ng anuman sa server. Kailangan nating makabuo muli ng sarili nating bagay.
Babala
Dahil sa medyo malaking volume at dahil sa hindi kumpletong panahon ng pagsubok, ang artikulo ay higit sa lahat ay may likas na impormasyon, sa halip bilang isang hanay ng mga thesis at mga intermediate na resulta.
Ang mas detalyadong materyal ay ihahanda sa ibang pagkakataon, sa mga bahagi
Mga kinakailangan sa draft para sa solusyon
Ito ay kinakailangan upang bumuo ng isang tool na nagbibigay-daan sa iyo upang mag-imbak:
pg_stat_activity view history History ng session lock gamit ang pg_locks view
Kinakailangan ng solusyon-bawasan ang epekto sa target na database.
Pangkalahatang ideyaβ ang ahente ng pangongolekta ng data ay inilunsad hindi sa target na database, ngunit sa database ng pagsubaybay bilang isang serbisyo ng systemd. Oo, posible ang ilang pagkawala ng data, ngunit hindi ito kritikal para sa pag-uulat, ngunit walang epekto sa target na database sa mga tuntunin ng memorya at espasyo sa disk. At sa kaso ng paggamit ng isang koneksyon pool, ang epekto sa mga proseso ng user ay minimal.
Mga yugto ng pagpapatupad
1.Mga talahanayan ng serbisyo
Ang isang hiwalay na schema ay ginagamit upang mag-imbak ng mga talahanayan, upang hindi kumplikado ang pagsusuri ng mga pangunahing talahanayan na ginamit.
DROP SCHEMA IF EXISTS activity_hist ;
CREATE SCHEMA activity_hist AUTHORIZATION monitor ;
Mahalaga: Ang schema ay hindi nilikha sa target na database, ngunit sa database ng pagsubaybay.
pg_stat_activity view history
Ang isang talahanayan ay ginagamit upang mag-imbak ng mga kasalukuyang snapshot ng pg_stat_activity view
activity_hist.history_pg_stat_activity :
--ACTIVITY_HIST.HISTORY_PG_STAT_ACTIVITY
DROP TABLE IF EXISTS activity_hist.history_pg_stat_activity;
CREATE TABLE activity_hist.history_pg_stat_activity
(
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text ,
queryid bigint
);
Para mapabilis ang pagpasok - walang mga index o paghihigpit.
Upang iimbak ang kasaysayan mismo, ginagamit ang isang partitioned table:
activity_hist.archive_pg_stat_activity :
DROP TABLE IF EXISTS activity_hist.archive_pg_stat_activity;
CREATE TABLE activity_hist.archive_pg_stat_activity
(
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text ,
queryid bigint
)
PARTITION BY RANGE (timepoint);
Dahil sa kasong ito walang mga kinakailangan para sa bilis ng pagpapasok, ang ilang mga index ay ginawa upang mapabilis ang paglikha ng mga ulat.
Kasaysayan ng pag-block ng session
Ang isang talahanayan ay ginagamit upang mag-imbak ng mga kasalukuyang snapshot ng mga lock ng session:
activity_hist.history_locking :
--ACTIVITY_HIST.HISTORY_LOCKING
DROP TABLE IF EXISTS activity_hist.history_locking;
CREATE TABLE activity_hist.history_locking
(
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
);
Gayundin, upang mapabilis ang pagpasok, walang mga index o paghihigpit.
Upang iimbak ang kasaysayan mismo, ginagamit ang isang partitioned table:
activity_hist.archive_locking:
DROP TABLE IF EXISTS activity_hist.archive_locking;
CREATE TABLE activity_hist.archive_locking
(
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
)
PARTITION BY RANGE (timepoint);
Dahil sa kasong ito walang mga kinakailangan para sa bilis ng pagpapasok, ang ilang mga index ay ginawa upang mapabilis ang paglikha ng mga ulat.
2.Pagpupuno sa kasalukuyang kasaysayan
Upang direktang mangolekta ng mga snapshot ng view, isang bash script ang ginagamit na nagpapatakbo ng plpgsql function.
plpgsql Ang dblink function ay nag-a-access ng mga view sa target na database at naglalagay ng mga hilera sa mga talahanayan ng serbisyo sa database ng pagsubaybay.
get_current_activity.sql
CREATE OR REPLACE FUNCTION activity_hist.get_current_activity( current_host text , current_s_name text , current_s_pass text ) RETURNS BOOLEAN AS $$
DECLARE
database_rec record;
dblink_str text ;
BEGIN
EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||current_host||' port=5432 dbname=postgres'||
' user='||current_s_name||' password='||current_s_pass|| ' '')';
--------------------------------------------------------------------
--GET pg_stat_activity stats
INSERT INTO activity_hist.history_pg_stat_activity
(
SELECT * FROM dblink('LINK1',
'SELECT
now() ,
datid ,
datname ,
pid ,
usesysid ,
usename ,
application_name ,
client_addr ,
client_hostname ,
client_port ,
backend_start ,
xact_start ,
query_start ,
state_change ,
wait_event_type ,
wait_event ,
state ,
backend_xid ,
backend_xmin ,
query ,
backend_type
FROM pg_stat_activity
')
AS t (
timepoint timestamp without time zone ,
datid oid ,
datname name ,
pid integer,
usesysid oid ,
usename name ,
application_name text ,
client_addr inet ,
client_hostname text ,
client_port integer,
backend_start timestamp with time zone ,
xact_start timestamp with time zone ,
query_start timestamp with time zone ,
state_change timestamp with time zone ,
wait_event_type text ,
wait_event text ,
state text ,
backend_xid xid ,
backend_xmin xid ,
query text ,
backend_type text
)
);
---------------------------------------
--ACTIVITY_HIST.HISTORY_LOCKING
INSERT INTO activity_hist.history_locking
(
SELECT * FROM dblink('LINK1',
'SELECT
now() ,
lock.locktype,
lock.relation,
lock.mode,
lock.transactionid as tid,
lock.virtualtransaction as vtid,
lock.pid,
pg_blocking_pids(lock.pid),
lock.granted
FROM pg_catalog.pg_locks lock LEFT JOIN pg_catalog.pg_database db ON db.oid = lock.database
WHERE NOT lock.pid = pg_backend_pid()
')
AS t (
timepoint timestamp without time zone ,
locktype text ,
relation oid ,
mode text ,
tid xid ,
vtid text ,
pid integer ,
blocking_pids integer[] ,
granted boolean
)
);
PERFORM dblink_disconnect('LINK1');
RETURN TRUE ;
END
$$ LANGUAGE plpgsql;
Upang mangolekta ng mga snapshot ng view, ginagamit ang systemd service at dalawang script:
pg_current_activity.service
# /etc/systemd/system/pg_current_activity.service
[Unit]
Description=Collect history of pg_stat_activity , pg_locks
Wants=pg_current_activity.timer
[Service]
Type=forking
StartLimitIntervalSec=0
ExecStart=/home/postgres/pgutils/demon/get_current_activity.sh 10.124.70.40 postgres postgres
[Install]
WantedBy=multi-user.target
pg_current_activity.timer
# /etc/systemd/system/pg_current_activity.timer
[Unit]
Description=Run pg_current_activity.sh every 1 second
Requires=pg_current_activity.service
[Timer]
Unit=pg_current_activity.service
OnCalendar=*:*:0/1
AccuracySec=1
[Install]
WantedBy=timers.target
Magtalaga ng mga karapatan sa mga script:
# chmod 755 pg_current_activity.timer
# chmod 755 pg_current_activity.service
Simulan natin ang serbisyo:
# systemctl daemon-reload
# systemctl simulan ang pg_current_activity.service
Kaya, ang kasaysayan ng view ay kinokolekta sa anyo ng mga segundo-by-segundong snapshot. Siyempre, kung ang lahat ay naiwan, ang mga talahanayan ay tataas nang napakabilis sa laki at ang higit pa o hindi gaanong produktibong trabaho ay magiging imposible.
Kinakailangang ayusin ang pag-archive ng data.
3. Pag-archive ng kasaysayan
Para sa pag-archive, ginagamit ang mga naka-partition na table archive*.
Ang mga bagong partisyon ay ginagawa bawat oras, habang ang lumang data ay inaalis mula sa mga talahanayan ng kasaysayan*, kaya ang laki ng mga talahanayan ng kasaysayan* ay hindi gaanong nagbabago at ang bilis ng pagpasok ay hindi bumababa sa paglipas ng panahon.
Ang paglikha ng mga bagong seksyon ay ginagawa ng plpgsql function na activity_hist.archive_current_activity. Ang algorithm ng trabaho ay napaka-simple (gamit ang halimbawa ng seksyon para sa talahanayan ng archive_pg_stat_activity).
Lumikha at punan ang isang bagong seksyon
EXECUTE format(
'CREATE TABLE ' || partition_name ||
' PARTITION OF activity_hist.archive_pg_stat_activity FOR VALUES FROM ( %L ) TO ( %L ) ' ,
to_char(date_trunc('year', partition_min_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_min_range ),'MM')||'-'||
to_char(date_trunc('day', partition_min_range ),'DD')||' '||
to_char(date_trunc('hour', partition_min_range ),'HH24')||':00',
to_char(date_trunc('year', partition_max_range ),'YYYY')||'-'||
to_char(date_trunc('month', partition_max_range ),'MM')||'-'||
to_char(date_trunc('day', partition_max_range ),'DD')||' '||
to_char(date_trunc('hour', partition_max_range ),'HH24')||':00'
);
INSERT INTO activity_hist.archive_pg_stat_activity
(
SELECT *
FROM activity_hist.history_pg_stat_activity
WHERE timepoint BETWEEN partition_min_range AND partition_max_range
);
Paglikha ng mga index
EXECUTE format (
'CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint )'
);
EXECUTE format ('CREATE INDEX '||index_name||
' ON '||partition_name||' ( wait_event_type , backend_type , timepoint , queryid )'
);
Pag-alis ng lumang data mula sa history_pg_stat_activity table
DELETE
FROM activity_hist.history_pg_stat_activity
WHERE timepoint < partition_max_range;
Siyempre, paminsan-minsan, ang mga lumang seksyon ay tinatanggal bilang hindi kailangan.
Mga pangunahing ulat
Sa totoo lang, bakit ginagawa ang lahat ng ito? Upang makakuha ng mga ulat na malabo na nakapagpapaalaala sa AWR ng Oracle.
Mahalagang idagdag na upang makatanggap ng mga ulat, kailangan mong bumuo ng koneksyon sa pagitan ng pg_stat_activity at pg_stat_statements view. Ang mga talahanayan ay naka-link sa pamamagitan ng pagdaragdag ng column na 'queryid' sa 'history_pg_stat_activity', 'archive_pg_stat_activity' na mga talahanayan. Ang paraan ng pagdaragdag ng halaga ng column ay lampas sa saklaw ng artikulong ito at inilalarawan dito β pg_stat_statements + pg_stat_activity + loq_query = pg_ash? .
KABUUANG ORAS ng CPU PARA SA MGA QUERY
Hiling :
WITH hist AS
(
SELECT
aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND backend_type = 'client backend' AND datname != 'postgres' AND ( aa.wait_event_type IS NULL ) ANDaa.state = 'active'
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND backend_type = 'client backend' AND datname != 'postgres' AND ( ha.wait_event_type IS NULL )AND ha.state = 'active'
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
KABUUANG PANAHON NG PAGHIHINTAY PARA SA MGA TANONG
Hiling :
WITH hist AS
(
SELECT
aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( aa.wait_event_type IS NOT NULL )
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( ha.wait_event_type IS NOT NULL )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
WITH hist AS
(
SELECT
aa.wait_event_type , aa.wait_event
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
aa.wait_event IS NOT NULL
GROUP BY aa.wait_event_type , aa.wait_event
UNION
SELECT
ha.wait_event_type , ha.wait_event
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
ha.wait_event IS NOT NULL
GROUP BY ha.wait_event_type , ha.wait_event
)
SELECT wait_event_type , wait_event
FROM hist
GROUP BY wait_event_type , wait_event
ORDER BY 1 ASC,2 ASC
----------------------------------------------------------------------
WITH hist AS
(
SELECT
aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid ,
count(*) * interval '1 second' AS duration
FROM activity_hist.archive_pg_stat_activity aa
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( aa.wait_event_type = waitings_stat_rec.wait_event_type AND aa.wait_event = waitings_stat_rec.wait_event )
GROUP BY aa.wait_event_type , aa.wait_event , aa.query ,aa.queryid
UNION
SELECT
ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid,
count(*) * interval '1 second' AS duration
FROM activity_hist.history_pg_stat_activity_for_reports ha
WHERE timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
backend_type = 'client backend' AND datname != 'postgres' AND
( ha.wait_event_type = waitings_stat_rec.wait_event_type AND ha.wait_event = waitings_stat_rec.wait_event )
GROUP BY ha.wait_event_type , ha.wait_event , ha.query ,ha.queryid
)
SELECT query , queryid , SUM( duration ) as duration
FROM hist
GROUP BY query , queryid
ORDER BY 3 DESC
SELECT
MIN(date_trunc('second',timepoint)) AS started ,
count(*) * interval '1 second' as duration ,
pid , blocking_pids , relation , mode , locktype
FROM
activity_hist.archive_locking al
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY pid , blocking_pids , relation , mode , locktype
UNION
SELECT
MIN(date_trunc('second',timepoint)) AS started ,
count(*) * interval '1 second' as duration ,
pid , blocking_pids , relation , mode , locktype
FROM
activity_hist.history_locking
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY pid , blocking_pids , relation , mode , locktype
ORDER BY 1
SELECT
blocking_pids
FROM
activity_hist.archive_locking al
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY blocking_pids
UNION
SELECT
blocking_pids
FROM
activity_hist.history_locking
WHERE
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour') AND
NOT granted AND
locktype = 'relation'
GROUP BY blocking_pids
ORDER BY 1
---------------------------------------------------------------
SELECT
pid , usename , application_name , datname ,
MIN(date_trunc('second',timepoint)) as started ,
count(*) * interval '1 second' as duration ,
state ,
query
FROM activity_hist.archive_pg_stat_activity
WHERE pid= current_pid AND
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')
GROUP BY pid , usename , application_name ,
datname ,
state_change,
state ,
query
UNION
SELECT
pid , usename , application_name , datname ,
MIN(date_trunc('second',timepoint)) as started ,
count(*) * interval '1 second' as duration ,
state ,
query
FROM activity_hist.history_pg_stat_activity_for_reports
WHERE pid= current_pid AND
timepoint BETWEEN pg_stat_history_begin+(current_hour_diff * interval '1 hour') AND pg_stat_history_end+(current_hour_diff * interval '1 hour')
GROUP BY pid , usename , application_name ,
datname ,
state_change,
state ,
query
ORDER BY 5 , 1
Halimbawa:
------------------------------------------------- ------------------------------------------------- ------------------------------------------------- ------------------------ KASAYSAYAN NG PAG-BLOCKING NG MGA PROSESO +----+----------+------- ---+---------------------+-----------+-------------- -------------------------------+--------------------- - --------------------------------------------------- | #| pid| usename| application_name| datname| nagsimula| tagal| estado| query +----------+----------+----------+-----+ --------- -----------------------------------+------ --------------------------+---------------- ------- ----------------- | 1| 26211| tuser| psql| tdb1| 2019-09-02 19:31:54| 00:00:04| walang ginagawa| | 2| 26211| tuser| psql| tdb1| 2019-09-02 19:31:58| 00:00:06| walang ginagawa sa transaksyon| magsimula; | 3| 26211| tuser| psql| tdb1| 2019-09-02 19:32:16| 00:01:45| walang ginagawa sa transaksyon| lock table wafer_data; | 4| 26211| tuser| psql| tdb1| 2019-09-02 19:35:54| 00:01:23| walang ginagawa| mangako; | 5| 26211| tuser| psql| tdb1| 2019-09-02 19:38:46| 00:00:02| walang ginagawa sa transaksyon| magsimula; | 6| 26211| tuser| psql| tdb1| 2019-09-02 19:38:54| 00:00:08| walang ginagawa sa transaksyon| lock table wafer_data; | 7| 26211| tuser| psql| tdb1| 2019-09-02 19:39:08| 00:42:42| walang ginagawa| mangako; | 8| 26211| tuser| psql| tdb1| 2019-09-03 07:12:07| 00:00:52| aktibo| piliin ang test_del();
Pag-unlad.
Ang mga pangunahing query na ipinakita at ang mga resultang ulat ay ginagawang mas madali ang buhay kapag sinusuri ang mga insidente ng pagganap.
Batay sa mga pangunahing query, maaari kang makakuha ng ulat na malabo na kahawig ng AWR ng Oracle. Halimbawa ng ulat ng buod
+------------------------------------------------- ----------------------------------- | Pinagsama-samang Ulat PARA SA AKTIBIDAD AT PAGHIHINTAY.
Itutuloy. Ang susunod sa linya ay ang paglikha ng kasaysayan ng lock (pg_stat_locks), isang mas detalyadong paglalarawan ng proseso ng pagpuno ng mga talahanayan.