About how I had to tackle PostgreSQL query optimization, and what came of it all.
Why did I have to? Because for the previous four years everything had worked quietly and calmly, ticking along like a clock.
As an epigraph.
Based on real events.
All names have been changed, coincidences are random.
When a certain result is achieved, it is always interesting to recall what triggered it all, how it began.
What came of it in the end is briefly described in another article.
It will probably be interesting to recreate the chain of events that led up to it.
The history kept the exact start date: 2018-09-10 18:02:48.
The story also preserved the query from which it all began:
Problem request:

SELECT
    p."PARAMETER_ID" AS parameter_id,
    pd."PD_NAME" AS pd_name,
    pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
    w."LRM" AS lrm,
    w."LOTID" AS lotid,
    w."RTD_VALUE" AS rtd_value,
    w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
    w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
    p."TYPE_CALCUL" AS type_calcul,
    s."SPENT_NAME" AS spent_name,
    s."SPENT_DATE" AS spent_date,
    extract(year from "SPENT_DATE") AS year,
    extract(month from "SPENT_DATE") AS month,
    s."REPORT_NAME" AS report_name,
    p."STPM_NAME" AS stpm_name,
    p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w,
     spent s,
     pmtr p,
     spent_pd sp,
     pd pd
WHERE s."SPENT_ID" = w."SPENT_ID"
  AND p."PARAMETER_ID" = w."PARAMETER_ID"
  AND s."SPENT_ID" = sp."SPENT_ID"
  AND pd."PD_ID" = sp."PD_ID"
  AND s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30'
  AND s."SPENT_DATE" = (SELECT MAX(s2."SPENT_DATE")
                        FROM spent s2,
                             wdata w2
                        WHERE s2."SPENT_ID" = w2."SPENT_ID"
                          AND w2."LRM" = w."LRM");
The problem description was predictably standard: "Everything is bad. Tell me what the problem is."
I immediately remembered a joke from the era of 3.5-inch floppy drives:
A lamer comes to a hacker:
- Nothing works for me, tell me where the problem is.
- In your DNA...
But, of course, this is not the way to handle performance incidents. "We may not be understood" (C). We need to figure it out.
Well, let's dig. Maybe something useful will accumulate as a result.
The investigation begins
So, here is what can be seen immediately with the naked eye, without even resorting to EXPLAIN.
1) JOINs are not used. This is bad, especially when there is more than one join.
2) What is even worse: a correlated subquery, moreover with aggregation. This is very bad.
This is bad, of course. But only on the one hand. On the other hand, it is very good, because the problem clearly has a solution and the query can be improved.
Don't go to the fortuneteller (C).
The query plan is not that complicated, but quite indicative:
Execution plan
The most interesting and useful, as usual, at the beginning and end.
Nested Loop (cost=935.84..479763226.18 rows=3322 width=135) (actual time=31.536..8220420.295 rows=8111656 loops=1)
Planning time: 3.807 ms
Execution time: 8 222 351.640 ms
The execution time is more than 2 hours.
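The timings above are actual execution figures, not estimates; a minimal sketch of how such a plan with real timings is obtained (the query text is abbreviated here, and the BUFFERS option is an extra I would add, not something the article mentions):

```sql
-- EXPLAIN ANALYZE actually runs the query and reports both the planner's
-- estimates and the real row counts and timings for every plan node.
-- BUFFERS additionally shows how many data pages were read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the problem query from above goes here
```

Note that EXPLAIN ANALYZE executes the statement for real, so on a query that runs for two hours the diagnosis itself takes two hours.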
False hypotheses that took time
Hypothesis 1: the optimizer is wrong and builds a bad plan.
To visualize the execution plan, we used an online plan visualizer.
Hypothesis 2: autovacuum is loading the database, and we need to get rid of the slowdown it causes.
But the autovacuum daemons behave well: there are no long-hanging processes and no serious load from them. We need to look for something else.
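A quick way to check this hypothesis is to query the standard statistics views (a sketch; the views and columns are stock PostgreSQL):

```sql
-- Are any autovacuum workers running right now, and for how long?
SELECT pid, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';

-- When were the tables last vacuumed and analyzed, automatically or manually?
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```

If the first query returns nothing long-running and the second shows recent timestamps, autovacuum can be ruled out.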
Hypothesis 3: the statistics are outdated; recalculate them and everything will fly.
Again, no. The statistics are up to date, which, given the absence of autovacuum problems, is not surprising.
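Ruling this one out is cheap, since refreshing statistics takes seconds even on large tables (a sketch against the tables from the query above):

```sql
-- Recompute planner statistics for the tables involved; if the plan
-- does not change afterwards, stale statistics were not the problem.
ANALYZE wdata;
ANALYZE spent;
```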
Starting to optimize
The main table 'wdata' is certainly not small: almost 27 million records.
And it is this table that gets a full Seq Scan.
Hash Cond: ((w."SPENT_ID" = s."SPENT_ID") AND ((SubPlan 1) = s."SPENT_DATE"))
-> Seq Scan on wdata w (cost=0.00..574151.49 rows=26886249 width=46) (actual time=0.005..8153.565 rows=26873950 loops=1)
We act in the standard way: "let's make an index and everything will fly".
An index was made on the field "SPENT_ID".
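The article does not give the exact DDL; a minimal sketch of such an index (the index name and the CONCURRENTLY option are my assumptions, the latter being useful on a live table):

```sql
-- Build the index without holding a long exclusive lock on the table;
-- the quoted identifier is required because the column name is upper-case.
CREATE INDEX CONCURRENTLY wdata_spent_id_idx
    ON wdata ("SPENT_ID");
```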
Result:
Query execution plan using an index
Well, did it help?
It was: 8 222 351.640 ms (just over 2 hours)
After: 6 985 431.575 ms (almost 2 hours)
In general, the same thing, just viewed from a different angle.
Let's remember the classics:
"Do you have the same one, but without wings? We will look for it."
In principle, this could be called a result; well, not a good one, but an acceptable one. At the very least, it would let us hand the customer a large report describing how much was done and why what was done is good.
However, the final decision is still far away. Very far.
And now the most interesting part: we continue to optimize and polish the query.
Step one - use JOIN
The rewritten query now looks like this (at the very least it is prettier):

Query using JOIN:

SELECT
    p."PARAMETER_ID" AS parameter_id,
    pd."PD_NAME" AS pd_name,
    pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
    w."LRM" AS lrm,
    w."LOTID" AS lotid,
    w."RTD_VALUE" AS rtd_value,
    w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
    w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
    p."TYPE_CALCUL" AS type_calcul,
    s."SPENT_NAME" AS spent_name,
    s."SPENT_DATE" AS spent_date,
    extract(year from "SPENT_DATE") AS year,
    extract(month from "SPENT_DATE") AS month,
    s."REPORT_NAME" AS report_name,
    p."STPM_NAME" AS stpm_name,
    p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w
     INNER JOIN spent s ON w."SPENT_ID" = s."SPENT_ID"
     INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
     INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
     INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
WHERE
    s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30' AND
    s."SPENT_DATE" = (SELECT MAX(s2."SPENT_DATE")
                      FROM wdata w2 INNER JOIN spent s2 ON w2."SPENT_ID" = s2."SPENT_ID"
                      WHERE w2."LRM" = w."LRM");
Planning time: 2.486 ms
Execution time: 1 223 680.326 ms
So here is the first result.
It was: 6 985 431.575 ms (almost 2 hours).
After: 1 223 680.326 ms (just over 20 minutes).
A good result. In principle, once again, we could have stopped there. But that would be uninteresting, so we cannot stop.
Step Two - Get rid of the correlated subquery
The query text was changed:

No correlated subquery:

SELECT
    p."PARAMETER_ID" AS parameter_id,
    pd."PD_NAME" AS pd_name,
    pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
    w."LRM" AS lrm,
    w."LOTID" AS lotid,
    w."RTD_VALUE" AS rtd_value,
    w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
    w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
    p."TYPE_CALCUL" AS type_calcul,
    s."SPENT_NAME" AS spent_name,
    s."SPENT_DATE" AS spent_date,
    extract(year from "SPENT_DATE") AS year,
    extract(month from "SPENT_DATE") AS month,
    s."REPORT_NAME" AS report_name,
    p."STPM_NAME" AS stpm_name,
    p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w
     INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
     INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
     INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
     INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
     INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE")
                 FROM spent s2 INNER JOIN wdata w2 ON s2."SPENT_ID" = w2."SPENT_ID"
                 GROUP BY w2."LRM"
                ) md ON w."LRM" = md."LRM"
WHERE
    s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30';
Planning time: 2.291 ms
Execution time: 165 021.870 ms
It was: 1 223 680.326 ms (just over 20 minutes).
After: 165 021.870 ms (just over 2 minutes).
This is already quite good.
However, as the English say, "there is always a but". A result that is too good should automatically arouse suspicion. Something is wrong here.
The hypothesis of rewriting the query to get rid of the correlated subquery is correct. But the rewrite needs a little tweaking for the final result to be right: the derived table must be joined not only on "LRM" but also on the maximum "SPENT_DATE", otherwise rows get matched against the wrong dates.
As a result, the first intermediate result:
Edited query without a correlated subquery:

SELECT
    p."PARAMETER_ID" AS parameter_id,
    pd."PD_NAME" AS pd_name,
    pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
    w."LRM" AS lrm,
    w."LOTID" AS lotid,
    w."RTD_VALUE" AS rtd_value,
    w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
    w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
    p."TYPE_CALCUL" AS type_calcul,
    s."SPENT_NAME" AS spent_name,
    s."SPENT_DATE" AS spent_date,
    extract(year from s."SPENT_DATE") AS year,
    extract(month from s."SPENT_DATE") AS month,
    s."REPORT_NAME" AS report_name,
    p."STPM_NAME" AS stpm_name,
    p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w
     INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
     INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
     INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
     INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
     INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE") AS "SPENT_DATE"
                 FROM spent s2 INNER JOIN wdata w2 ON s2."SPENT_ID" = w2."SPENT_ID"
                 GROUP BY w2."LRM"
                ) md ON md."SPENT_DATE" = s."SPENT_DATE" AND md."LRM" = w."LRM"
WHERE
    s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30';
Planning time: 3.192 ms
Execution time: 208 014.134 ms
So, what we have as a result is the first acceptable outcome, one we are not ashamed to show the customer:
Started with: 8 222 351.640 ms (more than 2 hours)
Achieved: 1 223 680.326 ms (just over 20 minutes)
Outcome (intermediate): 208 014.134 ms (just over 3 minutes)
Excellent result.
Conclusion
We could have stopped there.
BUT...
Appetite comes with eating. The road is mastered by walking. Any result is only intermediate. To stop is to die. And so on.
Let's continue with the optimization.
A great idea, especially considering that the customer was not against it. Quite the opposite: strongly for it.
So, it is time to redesign the database. The query itself can no longer be optimized (although, as it turned out later, there is an option that makes everything really fly). But optimizing and evolving the database design is a very promising idea. And, most importantly, an interesting one. It brings back my youth: after all, I did not become a DBA right away, I grew out of programming (BASIC, assembler, C, C++, Oracle, PL/SQL). An interesting topic, of course, for separate memoirs ;-).
However, let's not digress.
So,
And maybe partitioning will help us?
Spoiler: "Yes, it helped, including with optimizing performance."
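The partitioning DDL is left for the follow-up article; a minimal sketch of what declarative range partitioning looks like in PostgreSQL 10+ (the table definition and the choice of "SPENT_DATE" as the partition key are my assumptions for illustration, not the article's actual schema):

```sql
-- A partitioned parent table; rows are routed to partitions by date range.
CREATE TABLE spent_part (
    "SPENT_ID"   bigint,
    "SPENT_NAME" text,
    "SPENT_DATE" date
) PARTITION BY RANGE ("SPENT_DATE");

-- One partition per quarter. Queries filtered by "SPENT_DATE", like the
-- ones above, then scan only the matching partitions (partition pruning).
CREATE TABLE spent_2018_q3 PARTITION OF spent_part
    FOR VALUES FROM ('2018-07-01') TO ('2018-10-01');
```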
But this is a completely different story ...
To be continuedβ¦
Source: habr.com