Performance monitoring of PostgreSQL queries. Part 1 - reporting

Epigraphs:
Engineer, translated from Latin, means "inspired".
An engineer can do anything. (c) R. Diesel
Or a story about why a database administrator needs to remember his programming past.

Foreword

All names have been changed; any matches are coincidental. The material is solely the personal opinion of the author.

Disclaimer of warranties: this planned series of articles will contain no detailed or exact descriptions of the tables and scripts used. The materials cannot be used "AS IS" right away.
First, because of the sheer volume of material;
second, because of its close ties to the production database of a real customer.
Therefore, the articles will give only ideas and descriptions in the most general form.
Maybe in the future the system will grow to the level of being posted on GitHub, or maybe not. Time will tell.

The beginning of the story: «Do you remember how it all began».
What came of it, in the most general terms: «Synthesis as one of the methods to improve PostgreSQL performance».

Why do I need all this?

Well, first, so as not to forget it myself and to recall those glorious days in retirement.
Second, to systematize what has been written. For even I myself sometimes start to get confused and forget separate parts.

And most importantly: it may suddenly come in handy for someone, help them avoid reinventing the wheel, and save them from stepping on the same rakes. In other words, improve one's karma (not the Habr kind). For the most valuable things in this world are ideas. The main thing is to find an idea; turning an idea into reality is a purely technical matter.

So let's start slowly...

Formulation of the problem.

Given:

PostgreSQL (10.5) with a mixed workload (OLTP + DSS), light to medium load, hosted in the AWS cloud.
There is no database monitoring; infrastructure monitoring is limited to standard AWS tools in a minimal configuration.

Required:

Monitor the performance and status of the database, and find and collect the initial information needed to optimize heavy database queries.

A brief introduction, or an analysis of possible solutions

To begin with, let's analyze the options for solving the problem from the standpoint of a comparative analysis of the benefits and troubles for the engineer; the benefits and losses for management can be assessed by those whose job that is according to the staffing table.

Option 1 - "Working on demand"

We leave everything as it is. If the customer is unhappy with something in the health or performance of the database or application, they will notify the DBA engineers by e-mail or by opening an incident in the ticket system.
The engineer, having received the notification, will look into the problem and offer a solution, or will shelve the problem, hoping that everything will resolve itself, and that everything will soon be forgotten anyway.
Gingerbread and donuts:
1. Nothing extra to do.
2. There is always a chance to wriggle out of it.
3. Plenty of time you can spend on yourself.
Bruises and bumps:
1. Sooner or later the customer will ponder the essence of being and universal justice in this world, and once again ask himself the question: why am I paying them my money? The outcome is always the same; the only question is when the customer gets bored and waves goodbye. And then the feeder is empty. It is sad.
2. Zero professional development for the engineer.
3. Difficulties in planning work and workload.

Option 2 - "Dances with tambourines, shoes on, shoes off"

Point 1. Why do we need a monitoring system when we can query everything ourselves? We launch a bunch of all sorts of queries against the data dictionary and dynamic views, turn on all sorts of counters, put everything into tables, and periodically sort of analyze the lists and tables. As a result, we have graphs, tables, and reports, beautiful or not so beautiful. The main thing is more, more of everything.
Point 2. Generate activity: run an analysis of all this.
Point 3. We prepare a certain document; we call this document simply "how we should set up the database".
Point 4. The customer, seeing all this magnificence of graphs and figures, rests in a childlike, naive confidence that now everything will work, and soon. And parts with his financial resources easily and painlessly. Management is also sure that our engineers are working hard. Maximum utilization.
Point 5. Repeat step 1 regularly.
Gingerbread and donuts:
1. The life of managers and engineers is simple, predictable, and filled with activity. Everything is buzzing, everyone is busy.
2. The customer's life is not bad either: he is always sure that he just needs to be a little patient and everything will work out. If it does not get better, well, this world is unfair; better luck in the next life.
Bruises and bumps:
1. Sooner or later a smarter provider of a similar service will appear who will do the same thing, only a little cheaper. And if the result is the same, why pay more? Which will again lead to the disappearance of the feeder.
2. It is boring. As boring as any activity with little meaning.
3. As in the previous option, no development. The additional minus for the engineer is that, unlike the first option, here you constantly have to generate an imitation of vigorous activity. And that takes time, which could be spent on yourself. For no one else will take care of you.

Option 3 - "No need to invent a bicycle; buy one and ride it"

Engineers at other companies do not eat their pizza with beer for nothing (ah, the glorious days of St. Petersburg in the 90s). Let's use monitoring systems that are already made, debugged, and working, and which, generally speaking, bring benefits (well, at least to their creators).
Gingerbread and donuts:
1. No need to spend time inventing what has already been invented. Take it and use it.
2. Monitoring systems are not written by fools, and of course they are useful.
3. Working monitoring systems usually provide useful, filtered information.
Bruises and bumps:
1. The engineer in this case is not an engineer but merely a user of someone else's product.
2. The customer must be convinced of the need to buy something he has no desire to understand (and should not have to); besides, the budget for the year has been approved and will not change. Then a separate resource has to be allocated and configured for the specific system. That is, first you have to pay, then pay, and then pay again. And the customer is stingy. That is the norm of this life.

So what is to be done, Chernyshevsky? Your question is very pertinent. (c)

In this particular case and in the current situation, we can take a slightly different path: let's make our own monitoring system.
Well, not a system in the full sense of the word, of course; that would be too loud and presumptuous. But let's at least somehow make our life easier and collect more information for resolving performance incidents, so as not to find ourselves in the situation of "go there, I don't know where; find that, I don't know what".

The pros and cons of this option:

Pros:
1. It is interesting. Well, at least more interesting than the never-ending "shrink datafile, alter tablespace, etc.".
2. New skills and new development, which sooner or later will bring well-deserved gingerbread and donuts.
Cons:
1. You will have to work. To work a lot.
2. You will have to regularly explain the meaning and prospects of all this activity.
3. Something will have to be sacrificed, because the only resource available to the engineer, time, is limited by the Universe.
4. The worst and most unpleasant part: the result may turn out to be garbage like "not a mouse, not a frog, but an unknown little critter".

He who takes no risks drinks no champagne.
So, the fun begins.

General idea - schematic

(Illustration taken from the article «Synthesis as one of the methods to improve PostgreSQL performance»)

Explanation:

  • The standard PostgreSQL extension pg_stat_statements is installed in the target database (see the snippet right after this list).
  • In the monitoring database, we create a set of service tables to store the pg_stat_statements history at the initial stage and to configure metrics and monitoring later on.
  • On the monitoring host, we create a set of bash scripts, including those for generating incidents in the ticket system.
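
A minimal sketch of enabling the extension on the target database (standard steps; the assumption is that you may edit postgresql.conf and restart the instance):

--postgresql.conf (restart required):
--  shared_preload_libraries = 'pg_stat_statements'
--then, connected to the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;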

Service tables

To begin with, a schematically simplified ERD of what came out in the end:
(ERD diagram of the service tables)
Brief description of the tables:
endpoint - host, the connection point to the instance
database - database parameters
pg_stat_history - historical table for storing temporary snapshots of the pg_stat_statements view of the target database
metric_glossary - dictionary of performance metrics
metric_config - configuration of individual metrics
metric - the specific metric of the query being monitored
metric_alert_history - history of performance alerts
log_query - service table for storing parsed records from the PostgreSQL log file downloaded from AWS
baseline - parameters of the time period used as the baseline
checkpoint - configuration of metrics for checking the database status
checkpoint_alert_history - alert history of the database status check metrics
pg_stat_db_queries - service table of active queries
activity_log - service table of the activity log
trap_oid - trap configuration service table
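
The snapshot function below references only a few columns of the endpoint and database tables. A minimal sketch of these two routing tables, limited to the columns actually used in the code (assumption: the real tables carry more attributes):

--minimal sketch of the routing tables (only the columns used below)
CREATE TABLE endpoint (
  id   serial PRIMARY KEY ,
  host text NOT NULL              --connection point to the instance
);
CREATE TABLE database (
  id          serial PRIMARY KEY ,
  endpoint_id integer REFERENCES endpoint( id ) ON DELETE CASCADE ,
  name        text NOT NULL       --database name at the endpoint
);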

Stage 1 - collecting performance statistics and getting reports

The pg_stat_history table is used to store the statistical information.
pg_stat_history table structure

                                          Table "public.pg_stat_history" Column | type | Modifiers--------------------+--------------------- --+------------------------------------ id | integer | not null default nextval('pg_stat_history_id_seq'::regclass) snapshot_timestamp | timestamp without timezone | database_id | integer | dbid | oid | userid | oid | queryid | bigint | query | text | calls | bigint | total_time | double precision | min_time | double precision | max_time | double precision | mean_time | double precision | stddev_time | double precision | rows | bigint | shared_blks_hit | bigint | shared_blks_read | bigint | shared_blks_dirtied | bigint | shared_blks_written | bigint | local_blks_hit | bigint | local_blks_read | bigint | local_blks_dirtied | bigint | local_blks_written | bigint | temp_blks_read | bigint | temp_blks_written | bigint | blk_read_time | double precision | blk_write_time | double precision | baseline_id | integer | Indexes: "pg_stat_history_pkey" PRIMARY KEY, btree (id) "database_idx" btree (database_id) "queryid_idx" btree (queryid) "snapshot_timestamp_idx" btree (snapshot_timestamp) Foreign-key constraints: "database_id_fk" FOREIGN KEY (database_id) REFERENCES database(id ) ON DELETE CASCADE

As you can see, the table is simply accumulated data from the pg_stat_statements view of the target database.

Using this table is very simple.

pg_stat_history holds the accumulated statistics of query execution for each hour. At the beginning of each hour, after the table is populated, the pg_stat_statements statistics are reset with pg_stat_statements_reset().
Note: statistics are collected for queries with a duration of more than 1 second.
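
The reset itself is a single call, executed on the target database:

SELECT pg_stat_statements_reset();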
Populating the pg_stat_history table

--pg_stat_history.sql
CREATE OR REPLACE FUNCTION pg_stat_history( ) RETURNS boolean AS $$
DECLARE
  endpoint_rec record ;
  database_rec record ;
  pg_stat_snapshot record ;
  current_snapshot_timestamp timestamp without time zone;
BEGIN
  current_snapshot_timestamp = date_trunc('minute',now());  
  
  FOR endpoint_rec IN SELECT * FROM endpoint 
  LOOP
    FOR database_rec IN SELECT * FROM database WHERE endpoint_id = endpoint_rec.id 
	  LOOP
	    
		RAISE NOTICE 'NEW SNAPSHOT IS CREATING';
		
		--Connect to the target DB	  
	    EXECUTE 'SELECT dblink_connect(''LINK1'',''host='||endpoint_rec.host||' dbname='||database_rec.name||' user=USER password=PASSWORD '')';
 
        RAISE NOTICE 'host % and dbname % ',endpoint_rec.host,database_rec.name;
		RAISE NOTICE 'Creating snapshot of pg_stat_statements for database %',database_rec.name;
		
		SELECT 
	      *
		INTO 
		  pg_stat_snapshot
	    FROM dblink('LINK1',
	      'SELECT 
	       dbid , SUM(calls),SUM(total_time),SUM(rows) ,SUM(shared_blks_hit) ,SUM(shared_blks_read) ,SUM(shared_blks_dirtied) ,SUM(shared_blks_written) , 
           SUM(local_blks_hit) , SUM(local_blks_read) , SUM(local_blks_dirtied) , SUM(local_blks_written) , SUM(temp_blks_read) , SUM(temp_blks_written) , SUM(blk_read_time) , SUM(blk_write_time)
	       FROM pg_stat_statements WHERE dbid=(SELECT oid from pg_database where datname=current_database() ) 
		   GROUP BY dbid
  	      '
	               )
	      AS t
	       ( dbid oid , calls bigint , 
  	         total_time double precision , 
	         rows bigint , shared_blks_hit bigint , shared_blks_read bigint ,shared_blks_dirtied bigint ,shared_blks_written	 bigint ,
             local_blks_hit	 bigint ,local_blks_read bigint , local_blks_dirtied bigint ,local_blks_written bigint ,
             temp_blks_read	 bigint ,temp_blks_written bigint ,
             blk_read_time double precision , blk_write_time double precision	  
	       );
		 
		INSERT INTO pg_stat_history
          ( 
		    snapshot_timestamp  ,database_id  ,
			dbid , calls  ,total_time ,
            rows ,shared_blks_hit  ,shared_blks_read  ,shared_blks_dirtied  ,shared_blks_written ,local_blks_hit , 	 	
            local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written, 	
            blk_read_time, blk_write_time 
		  )		  
	    VALUES
	      (
	       current_snapshot_timestamp ,
		   database_rec.id ,
	       pg_stat_snapshot.dbid ,pg_stat_snapshot.calls,
	       pg_stat_snapshot.total_time,
	       pg_stat_snapshot.rows ,pg_stat_snapshot.shared_blks_hit ,pg_stat_snapshot.shared_blks_read ,pg_stat_snapshot.shared_blks_dirtied ,pg_stat_snapshot.shared_blks_written , 
           pg_stat_snapshot.local_blks_hit , pg_stat_snapshot.local_blks_read , pg_stat_snapshot.local_blks_dirtied , pg_stat_snapshot.local_blks_written , 
	       pg_stat_snapshot.temp_blks_read , pg_stat_snapshot.temp_blks_written , pg_stat_snapshot.blk_read_time , pg_stat_snapshot.blk_write_time 	   
	      );		   
		  
        RAISE NOTICE 'Creating snapshot of pg_stat_statements for queries with min_time more than 1000ms';
	
        FOR pg_stat_snapshot IN
          --All queries with min_time greater than 1000 ms
	      SELECT 
	        *
	      FROM dblink('LINK1',
	        'SELECT 
	         dbid , userid ,queryid,query,calls,total_time,min_time ,max_time,mean_time, stddev_time ,rows ,shared_blks_hit ,
			 shared_blks_read ,shared_blks_dirtied ,shared_blks_written , 
             local_blks_hit , local_blks_read , local_blks_dirtied , 
			 local_blks_written , temp_blks_read , temp_blks_written , blk_read_time , 
			 blk_write_time
	         FROM pg_stat_statements 
			 WHERE dbid=(SELECT oid from pg_database where datname=current_database()) AND min_time >= 1000 
  	        '

	                  )
	        AS t
	         ( dbid oid , userid oid , queryid bigint ,query text , calls bigint , 
  	           total_time double precision ,min_time double precision	 ,max_time double precision	 , mean_time double precision	 ,  stddev_time double precision	 , 
	           rows bigint , shared_blks_hit bigint , shared_blks_read bigint ,shared_blks_dirtied bigint ,shared_blks_written	 bigint ,
               local_blks_hit	 bigint ,local_blks_read bigint , local_blks_dirtied bigint ,local_blks_written bigint ,
               temp_blks_read	 bigint ,temp_blks_written bigint ,
               blk_read_time double precision , blk_write_time double precision	  
	         )
	    LOOP
		  INSERT INTO pg_stat_history
          ( 
		    snapshot_timestamp  ,database_id  ,
			dbid ,userid  , queryid  , query  , calls  ,total_time ,min_time ,max_time ,mean_time ,stddev_time ,
            rows ,shared_blks_hit  ,shared_blks_read  ,shared_blks_dirtied  ,shared_blks_written ,local_blks_hit , 	 	
            local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written, 	
            blk_read_time, blk_write_time 
		  )		  
	      VALUES
	      (
	       current_snapshot_timestamp ,
		   database_rec.id ,
	       pg_stat_snapshot.dbid ,pg_stat_snapshot.userid ,pg_stat_snapshot.queryid,pg_stat_snapshot.query,pg_stat_snapshot.calls,
	       pg_stat_snapshot.total_time,pg_stat_snapshot.min_time ,pg_stat_snapshot.max_time,pg_stat_snapshot.mean_time, pg_stat_snapshot.stddev_time ,
	       pg_stat_snapshot.rows ,pg_stat_snapshot.shared_blks_hit ,pg_stat_snapshot.shared_blks_read ,pg_stat_snapshot.shared_blks_dirtied ,pg_stat_snapshot.shared_blks_written , 
           pg_stat_snapshot.local_blks_hit , pg_stat_snapshot.local_blks_read , pg_stat_snapshot.local_blks_dirtied , pg_stat_snapshot.local_blks_written , 
	       pg_stat_snapshot.temp_blks_read , pg_stat_snapshot.temp_blks_written , pg_stat_snapshot.blk_read_time , pg_stat_snapshot.blk_write_time 	   
	      );
		  
        END LOOP;

        PERFORM dblink_disconnect('LINK1');  
				
	  END LOOP ;--FOR database_rec IN SELECT * FROM database WHERE endpoint_id = endpoint_rec.id 
    
  END LOOP;

RETURN TRUE;  
END
$$ LANGUAGE plpgsql;
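
Collecting a snapshot then comes down to a single call on the monitoring database at the beginning of every hour (the scheduling itself, e.g. via cron on the monitoring host, is left out here):

--once an hour, on the monitoring database
SELECT pg_stat_history();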

As a result, after some time, we will have in the pg_stat_history table a set of snapshots of the contents of the pg_stat_statements view of the target database.

Now, the actual reporting

Using fairly simple queries, you can get quite useful and interesting reports.

Aggregated data for a given period of time

Query

SELECT 
  database_id , 
  SUM(calls) AS calls ,SUM(total_time)  AS total_time ,
  SUM(rows) AS rows , SUM(shared_blks_hit)  AS shared_blks_hit,
  SUM(shared_blks_read) AS shared_blks_read ,
  SUM(shared_blks_dirtied) AS shared_blks_dirtied,
  SUM(shared_blks_written) AS shared_blks_written , 
  SUM(local_blks_hit) AS local_blks_hit , 
  SUM(local_blks_read) AS local_blks_read , 
  SUM(local_blks_dirtied) AS local_blks_dirtied , 
  SUM(local_blks_written)  AS local_blks_written,
  SUM(temp_blks_read) AS temp_blks_read, 
  SUM(temp_blks_written) temp_blks_written , 
  SUM(blk_read_time) AS blk_read_time , 
  SUM(blk_write_time) AS blk_write_time
FROM 
  pg_stat_history
WHERE 
  queryid IS NULL AND
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
GROUP BY database_id ;
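
Here and in the reports below, DATABASE_ID, BEGIN_TIMEPOINT and END_TIMEPOINT are placeholders substituted by the reporting scripts. An illustrative substitution for one database over the last day (the id value 1 is an assumption):

--illustrative substitution of the placeholders
SELECT 
  database_id , SUM(calls) AS calls , SUM(total_time) AS total_time
FROM 
  pg_stat_history
WHERE 
  queryid IS NULL AND
  database_id = 1 AND
  snapshot_timestamp BETWEEN now() - interval '24 hour' AND now()
GROUP BY database_id ;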

DB Time

to_char(interval '1 millisecond' * pg_total_stat_history_rec.total_time, 'HH24:MI:SS.MS')

I/O Time

to_char(interval '1 millisecond' * ( pg_total_stat_history_rec.blk_read_time + pg_total_stat_history_rec.blk_write_time ), 'HH24:MI:SS.MS')
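
Both expressions assume a record (pg_total_stat_history_rec) fetched by the aggregate query above. As a standalone illustration, converting a millisecond total into a readable interval (the value is taken from the sample reports below):

SELECT to_char( interval '1 millisecond' * 390513.926 , 'HH24:MI:SS.MS' ); --00:06:30.513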

TOP10 SQL by total_time

Query

SELECT 
  queryid , 
  SUM(calls) AS calls ,
  SUM(total_time)  AS total_time  	
FROM 
  pg_stat_history
WHERE 
  queryid IS NOT NULL AND 
  database_id = DATABASE_ID AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT 
GROUP BY queryid 
ORDER BY 3 DESC 
LIMIT 10
--------------------------------------------------------------------------------------
| TOP10 SQL BY TOTAL EXECUTION TIME
|   #|    queryid|      calls|    calls %|                  total_time (ms) |  dbtime %
+----+-----------+-----------+-----------+----------------------------------+----------
|   1|  821760255|          2|     .00001|  00:03:23.141(    203141.681 ms.)|      5.42
|   2| 4152624390|          2|     .00001|  00:03:13.929(    193929.215 ms.)|      5.17
|   3| 1484454471|          4|     .00001|  00:02:09.129(    129129.057 ms.)|      3.44
|   4|  655729273|          1|     .00000|  00:02:01.869(    121869.981 ms.)|      3.25
|   5| 2460318461|          1|     .00000|  00:01:33.113(     93113.835 ms.)|      2.48
|   6| 2194493487|          4|     .00001|  00:00:17.377(     17377.868 ms.)|       .46
|   7| 1053044345|          1|     .00000|  00:00:06.156(      6156.352 ms.)|       .16
|   8| 3644780286|          1|     .00000|  00:00:01.063(      1063.830 ms.)|       .03

TOP10 SQL by total I/O time

Query

SELECT 
  queryid , 
  SUM(calls) AS calls ,
  SUM(blk_read_time + blk_write_time)  AS io_time
FROM 
  pg_stat_history
WHERE 
  queryid IS NOT NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
GROUP BY  queryid 
ORDER BY 3 DESC 
LIMIT 10
-----------------------------------------------------------------------------------------
| TOP10 SQL BY TOTAL I/O TIME
|   #|    queryid|      calls|    calls %|                     I/O time (ms)|db I/O time %
+----+-----------+-----------+-----------+----------------------------------+-------------
|   1| 4152624390|          2|     .00001|  00:08:31.616(    511616.592 ms.)|        31.06
|   2|  821760255|          2|     .00001|  00:08:27.099(    507099.036 ms.)|        30.78
|   3|  655729273|          1|     .00000|  00:05:02.209(    302209.137 ms.)|        18.35
|   4| 2460318461|          1|     .00000|  00:04:05.981(    245981.117 ms.)|        14.93
|   5| 1484454471|          4|     .00001|  00:00:39.144(     39144.221 ms.)|         2.38
|   6| 2194493487|          4|     .00001|  00:00:18.182(     18182.816 ms.)|         1.10
|   7| 1053044345|          1|     .00000|  00:00:16.611(     16611.722 ms.)|         1.01
|   8| 3644780286|          1|     .00000|  00:00:00.436(       436.205 ms.)|          .03

TOP10 SQL by maximum execution time

Query

SELECT 
  id AS snapshotid , 
  queryid , 
  snapshot_timestamp ,  
  max_time 
FROM 
  pg_stat_history 
WHERE 
  queryid IS NOT NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
ORDER BY 4 DESC 
LIMIT 10

------------------------------------------------------------------------------------
| TOP10 SQL BY MAX EXECUTION TIME
|   #|          snapshot| snapshotID|    queryid|                     max_time (ms)
+----+------------------+-----------+-----------+----------------------------------
|   1|  05.04.2019 01:03|       4169|  655729273|  00:02:01.869(    121869.981 ms.)
|   2|  04.04.2019 17:00|       4153|  821760255|  00:01:41.570(    101570.841 ms.)
|   3|  04.04.2019 16:00|       4146|  821760255|  00:01:41.570(    101570.841 ms.)
|   4|  04.04.2019 16:00|       4144| 4152624390|  00:01:36.964(     96964.607 ms.)
|   5|  04.04.2019 17:00|       4151| 4152624390|  00:01:36.964(     96964.607 ms.)
|   6|  05.04.2019 10:00|       4188| 1484454471|  00:01:33.452(     93452.150 ms.)
|   7|  04.04.2019 17:00|       4150| 2460318461|  00:01:33.113(     93113.835 ms.)
|   8|  04.04.2019 15:00|       4140| 1484454471|  00:00:11.892(     11892.302 ms.)
|   9|  04.04.2019 16:00|       4145| 1484454471|  00:00:11.892(     11892.302 ms.)
|  10|  04.04.2019 17:00|       4152| 1484454471|  00:00:11.892(     11892.302 ms.)

TOP10 SQL by SHARED buffer read/write

Query

SELECT 
  id AS snapshotid , 
  queryid ,
  snapshot_timestamp , 
  shared_blks_read , 
  shared_blks_written 
FROM 
  pg_stat_history
WHERE 
  queryid IS NOT NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT AND
  ( shared_blks_read > 0 OR shared_blks_written > 0 )
ORDER BY 4 DESC  , 5 DESC 
LIMIT 10
------------------------------------------------------------------------------------------
| TOP10 SQL BY SHARED BUFFER READ/WRITE
|   #|          snapshot| snapshotID|    queryid| shared blocks read| shared blocks write
+----+------------------+-----------+-----------+-------------------+--------------------
|   1|  04.04.2019 17:00|       4153|  821760255|             797308|                   0
|   2|  04.04.2019 16:00|       4146|  821760255|             797308|                   0
|   3|  05.04.2019 01:03|       4169|  655729273|             797158|                   0
|   4|  04.04.2019 16:00|       4144| 4152624390|             756514|                   0
|   5|  04.04.2019 17:00|       4151| 4152624390|             756514|                   0
|   6|  04.04.2019 17:00|       4150| 2460318461|             734117|                   0
|   7|  04.04.2019 17:00|       4155| 3644780286|              52973|                   0
|   8|  05.04.2019 01:03|       4168| 1053044345|              52818|                   0
|   9|  04.04.2019 15:00|       4141| 2194493487|              52813|                   0
|  10|  04.04.2019 16:00|       4147| 2194493487|              52813|                   0
------------------------------------------------------------------------------------------

Histogram of query distribution by maximum execution time

Queries

SELECT  
  MIN(max_time) AS hist_min  , 
  MAX(max_time) AS hist_max , 
  (( MAX(max_time) - MIN(max_time) ) / hist_columns ) as hist_width
FROM 
  pg_stat_history 
WHERE 
  queryid IS NOT NULL AND
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT ;

SELECT 
  SUM(calls) AS calls
FROM 
  pg_stat_history 
WHERE 
  queryid IS NOT NULL AND
  database_id =DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT AND 
  ( max_time >= hist_current_min AND  max_time < hist_current_max ) ;
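
The two queries above are only fragments: hist_columns, hist_current_min and hist_current_max are variables of a driving loop that walks the buckets. A minimal plpgsql sketch of such a driver, assuming hist_columns = 10 and the same DATABASE_ID / timepoint placeholders substituted before execution:

--histogram driver sketch (hist_columns = 10 is an assumption)
DO $$
DECLARE
  hist_columns integer := 10 ;
  hist_min double precision ;
  hist_max double precision ;
  hist_width double precision ;
  bucket_calls bigint ;
BEGIN
  --histogram boundaries and bucket width
  SELECT MIN(max_time) , MAX(max_time) , ( MAX(max_time) - MIN(max_time) ) / hist_columns
  INTO hist_min , hist_max , hist_width
  FROM pg_stat_history
  WHERE queryid IS NOT NULL AND
        database_id = DATABASE_ID AND
        snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT ;

  --calls per bucket
  FOR i IN 0 .. hist_columns - 1
  LOOP
    SELECT COALESCE( SUM(calls) , 0 )
    INTO bucket_calls
    FROM pg_stat_history
    WHERE queryid IS NOT NULL AND
          database_id = DATABASE_ID AND
          snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT AND
          max_time >= hist_min + i * hist_width AND
          max_time <  hist_min + ( i + 1 ) * hist_width ;
    RAISE NOTICE 'bucket % : % ms - % ms : % calls' ,
      i + 1 , hist_min + i * hist_width , hist_min + ( i + 1 ) * hist_width , bucket_calls ;
  END LOOP ;
END $$ ;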
|------------------------------------------------------------------------------------------
| MAX_TIME HISTOGRAM | TOTAL CALLS : 33851920 | MIN TIME : 00:00:01.063 | MAX TIME : 00:02:01.869
|------------------------------------------------------------------------------------------
|                      min duration|                      max duration|     calls
+----------------------------------+----------------------------------+----------
|  00:00:01.063(      1063.830 ms.)|  00:00:13.144(     13144.445 ms.)|         9
|  00:00:13.144(     13144.445 ms.)|  00:00:25.225(     25225.060 ms.)|         0
|  00:00:25.225(     25225.060 ms.)|  00:00:37.305(     37305.675 ms.)|         0
|  00:00:37.305(     37305.675 ms.)|  00:00:49.386(     49386.290 ms.)|         0
|  00:00:49.386(     49386.290 ms.)|  00:01:01.466(     61466.906 ms.)|         0
|  00:01:01.466(     61466.906 ms.)|  00:01:13.547(     73547.521 ms.)|         0
|  00:01:13.547(     73547.521 ms.)|  00:01:25.628(     85628.136 ms.)|         0
|  00:01:25.628(     85628.136 ms.)|  00:01:37.708(     97708.751 ms.)|         4
|  00:01:37.708(     97708.751 ms.)|  00:01:49.789(    109789.366 ms.)|         2
|  00:01:49.789(    109789.366 ms.)|  00:02:01.869(    121869.981 ms.)|         0

TOP10 Snapshots by Query per Second

Queries

--pg_qps.sql
--Calculate Query Per Second 
CREATE OR REPLACE FUNCTION pg_qps( pg_stat_history_id integer ) RETURNS double precision AS $$
DECLARE
 pg_stat_history_rec record ;
 prev_pg_stat_history_id integer ;
 prev_pg_stat_history_rec record;
 total_seconds double precision ;
 result double precision;
BEGIN 
  result = 0 ;
  
  SELECT *
  INTO pg_stat_history_rec
  FROM 
    pg_stat_history
  WHERE id = pg_stat_history_id ;

  IF pg_stat_history_rec.snapshot_timestamp IS NULL 
  THEN
    RAISE EXCEPTION 'ERROR - Not found pg_stat_history for id = %',pg_stat_history_id;
  END IF ;  
  
 --RAISE NOTICE 'pg_stat_history_id = % , snapshot_timestamp = %', pg_stat_history_id , pg_stat_history_rec.snapshot_timestamp ;
  
  SELECT 
    MAX(id)   
  INTO
    prev_pg_stat_history_id
  FROM
    pg_stat_history
  WHERE 
    database_id = pg_stat_history_rec.database_id AND
	queryid IS NULL AND
	id < pg_stat_history_rec.id ;

  IF prev_pg_stat_history_id IS NULL 
  THEN
    RAISE NOTICE 'Not found previous pg_stat_history snapshot for id = %',pg_stat_history_id;
	RETURN NULL ;
  END IF;
  
  SELECT *
  INTO prev_pg_stat_history_rec
  FROM 
    pg_stat_history
  WHERE id = prev_pg_stat_history_id ;
  
  --RAISE NOTICE 'prev_pg_stat_history_id = % , prev_snapshot_timestamp = %', prev_pg_stat_history_id , prev_pg_stat_history_rec.snapshot_timestamp ;    

  total_seconds = extract(epoch from ( pg_stat_history_rec.snapshot_timestamp - prev_pg_stat_history_rec.snapshot_timestamp ));
  
  --RAISE NOTICE 'total_seconds = % ', total_seconds ;    
  
  --RAISE NOTICE 'calls = % ', pg_stat_history_rec.calls ;      
  
  IF total_seconds > 0 
  THEN
    result = pg_stat_history_rec.calls / total_seconds ;
  ELSE
   result = 0 ; 
  END IF;
   
 RETURN result ;
END
$$ LANGUAGE plpgsql;


SELECT 
  id , 
  snapshot_timestamp ,
  calls , 	
  total_time , 
  ( select pg_qps( id )) AS QPS ,
  blk_read_time ,
  blk_write_time
FROM 
  pg_stat_history
WHERE 
  queryid IS NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT AND
  ( select pg_qps( id )) IS NOT NULL 
ORDER BY 5 DESC 
LIMIT 10
|-----------------------------------------------------------------------------------------------------------------------------------------------
| TOP10 Snapshots ordered by QueryPerSeconds numbers
|-----------------------------------------------------------------------------------------------------------------------------------------------
|    #|          snapshot| snapshotID|      calls|                      total dbtime|        QPS|                          I/O time| I/O time %
+-----+------------------+-----------+-----------+----------------------------------+-----------+----------------------------------+-----------
|    1|  04.04.2019 20:04|       4161|    5758631|  00:06:30.513(    390513.926 ms.)|   1573.396|  00:00:01.470(      1470.110 ms.)|       .376
|    2|  04.04.2019 17:00|       4149|    3529197|  00:11:48.830(    708830.618 ms.)|    980.332|  00:12:47.834(    767834.052 ms.)|    108.324
|    3|  04.04.2019 16:00|       4143|    3525360|  00:10:13.492(    613492.351 ms.)|    979.267|  00:08:41.396(    521396.555 ms.)|     84.988
|    4|  04.04.2019 21:03|       4163|    2781536|  00:03:06.470(    186470.979 ms.)|    785.745|  00:00:00.249(       249.865 ms.)|       .134
|    5|  04.04.2019 19:03|       4159|    2890362|  00:03:16.784(    196784.755 ms.)|    776.979|  00:00:01.441(      1441.386 ms.)|       .732
|    6|  04.04.2019 14:00|       4137|    2397326|  00:04:43.033(    283033.854 ms.)|    665.924|  00:00:00.024(        24.505 ms.)|       .009
|    7|  04.04.2019 15:00|       4139|    2394416|  00:04:51.435(    291435.010 ms.)|    665.116|  00:00:12.025(     12025.895 ms.)|      4.126
|    8|  04.04.2019 13:00|       4135|    2373043|  00:04:26.791(    266791.988 ms.)|    659.179|  00:00:00.064(        64.261 ms.)|       .024
|    9|  05.04.2019 01:03|       4167|    4387191|  00:06:51.380(    411380.293 ms.)|    609.332|  00:05:18.847(    318847.407 ms.)|     77.507
|   10|  04.04.2019 18:01|       4157|    1145596|  00:01:19.217(     79217.372 ms.)|    313.004|  00:00:01.319(      1319.676 ms.)|      1.666

Hourly Execution History with QueryPerSeconds and I/O Time

Query

SELECT 
  id , 
  snapshot_timestamp ,
  calls , 	
  total_time , 
  ( select pg_qps( id )) AS QPS ,
  blk_read_time ,
  blk_write_time
FROM 
  pg_stat_history
WHERE 
  queryid IS NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
ORDER BY 2
|-----------------------------------------------------------------------------------------------
| HOURLY EXECUTION HISTORY  WITH QueryPerSeconds and I/O Time
-----------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PER SECOND HISTORY
|    #|          snapshot| snapshotID|      calls|                      total dbtime|        QPS|                          I/O time| I/O time %
+-----+------------------+-----------+-----------+----------------------------------+-----------+----------------------------------+-----------
|    1|  04.04.2019 11:00|       4131|       3747|  00:00:00.835(       835.374 ms.)|      1.041|  00:00:00.000(          .000 ms.)|       .000
|    2|  04.04.2019 12:00|       4133|    1002722|  00:01:52.419(    112419.376 ms.)|    278.534|  00:00:00.149(       149.105 ms.)|       .133
|    3|  04.04.2019 13:00|       4135|    2373043|  00:04:26.791(    266791.988 ms.)|    659.179|  00:00:00.064(        64.261 ms.)|       .024
|    4|  04.04.2019 14:00|       4137|    2397326|  00:04:43.033(    283033.854 ms.)|    665.924|  00:00:00.024(        24.505 ms.)|       .009
|    5|  04.04.2019 15:00|       4139|    2394416|  00:04:51.435(    291435.010 ms.)|    665.116|  00:00:12.025(     12025.895 ms.)|      4.126
|    6|  04.04.2019 16:00|       4143|    3525360|  00:10:13.492(    613492.351 ms.)|    979.267|  00:08:41.396(    521396.555 ms.)|     84.988
|    7|  04.04.2019 17:00|       4149|    3529197|  00:11:48.830(    708830.618 ms.)|    980.332|  00:12:47.834(    767834.052 ms.)|    108.324
|    8|  04.04.2019 18:01|       4157|    1145596|  00:01:19.217(     79217.372 ms.)|    313.004|  00:00:01.319(      1319.676 ms.)|      1.666
|    9|  04.04.2019 19:03|       4159|    2890362|  00:03:16.784(    196784.755 ms.)|    776.979|  00:00:01.441(      1441.386 ms.)|       .732
|   10|  04.04.2019 20:04|       4161|    5758631|  00:06:30.513(    390513.926 ms.)|   1573.396|  00:00:01.470(      1470.110 ms.)|       .376
|   11|  04.04.2019 21:03|       4163|    2781536|  00:03:06.470(    186470.979 ms.)|    785.745|  00:00:00.249(       249.865 ms.)|       .134
|   12|  04.04.2019 23:03|       4165|    1443155|  00:01:34.467(     94467.539 ms.)|    200.438|  00:00:00.015(        15.287 ms.)|       .016
|   13|  05.04.2019 01:03|       4167|    4387191|  00:06:51.380(    411380.293 ms.)|    609.332|  00:05:18.847(    318847.407 ms.)|     77.507
|   14|  05.04.2019 02:03|       4171|     189852|  00:00:10.989(     10989.899 ms.)|     52.737|  00:00:00.539(       539.110 ms.)|      4.906
|   15|  05.04.2019 03:01|       4173|       3627|  00:00:00.103(       103.000 ms.)|      1.042|  00:00:00.004(         4.131 ms.)|      4.010
|   16|  05.04.2019 04:00|       4175|       3627|  00:00:00.085(        85.235 ms.)|      1.025|  00:00:00.003(         3.811 ms.)|      4.471
|   17|  05.04.2019 05:00|       4177|       3747|  00:00:00.849(       849.454 ms.)|      1.041|  00:00:00.006(         6.124 ms.)|       .721
|   18|  05.04.2019 06:00|       4179|       3747|  00:00:00.849(       849.561 ms.)|      1.041|  00:00:00.000(          .051 ms.)|       .006
|   19|  05.04.2019 07:00|       4181|       3747|  00:00:00.839(       839.416 ms.)|      1.041|  00:00:00.000(          .062 ms.)|       .007
|   20|  05.04.2019 08:00|       4183|       3747|  00:00:00.846(       846.382 ms.)|      1.041|  00:00:00.000(          .007 ms.)|       .001
|   21|  05.04.2019 09:00|       4185|       3747|  00:00:00.855(       855.426 ms.)|      1.041|  00:00:00.000(          .065 ms.)|       .008
|   22|  05.04.2019 10:00|       4187|       3797|  00:01:40.150(    100150.165 ms.)|      1.055|  00:00:21.845(     21845.217 ms.)|     21.812

Texts of all the SQL queries

Query

SELECT 
  queryid , 
  query 
FROM 
  pg_stat_history
WHERE 
  queryid IS NOT NULL AND 
  database_id = DATABASE_ID  AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
GROUP BY queryid , query

Conclusion

As you can see, fairly simple means yield a lot of useful information about the workload and the state of the database.

Note: if you fix the queryid in the above queries, you get the history for a single query (to save space, per-query reports are omitted).
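
For instance, a sketch of the execution history of a single query; the queryid value is taken from the sample reports above and is purely illustrative:

--execution history of a single query (queryid value is illustrative)
SELECT 
  snapshot_timestamp , 
  calls , 
  total_time , 
  max_time 
FROM 
  pg_stat_history
WHERE 
  queryid = 821760255 AND 
  database_id = DATABASE_ID AND
  snapshot_timestamp BETWEEN BEGIN_TIMEPOINT AND END_TIMEPOINT
ORDER BY snapshot_timestamp ;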

So, statistical data on query performance is available and being collected.
The first stage, "collecting statistical data", is complete.

We can now proceed to the second stage: "configuring performance metrics".

But this is a completely different story.

To be continued ...

Source: habr.com
