Do you remember how it all began? Everything was for the first time and again.

A story about how I had to deal with PostgreSQL query optimization, and what came of it all.
Why did I have to? Because for the previous four years everything had worked quietly and calmly, ticking along like a clock.
As an epigraph:

Do you remember how it all began? Everything was for the first time and again.

Based on real events.
All names have been changed, coincidences are random.

When a certain result has been achieved, it is always interesting to recall what the impetus was and how it all began.

What happened in the end is briefly described in the article "Synthesis as one of the methods to improve PostgreSQL performance".

It will probably be interesting to recreate the chain of events that led to it.
History has preserved the exact start date: 2018-09-10 18:02:48.
It has also preserved the query that started it all:
The problem query:
SELECT
p."PARAMETER_ID" AS parameter_id,
pd."PD_NAME" AS pd_name,
pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LRM" AS LRM,
w."LOTID" AS lotid,
w."RTD_VALUE" AS RTD_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SPENT_NAME" AS spent_name,
s."SPENT_DATE" AS spent_date,
extract(year from "SPENT_DATE") AS year,
extract(month from "SPENT_DATE") AS month,
s."REPORT_NAME" AS report_name,
p."STPM_NAME" AS stpm_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w,
spent s,
pmtr p,
spent_pd sp,
pd pd
WHERE s."SPENT_ID" = w."SPENT_ID"
AND p."PARAMETER_ID" = w."PARAMETER_ID"
AND s."SPENT_ID" = sp."SPENT_ID"
AND pd."PD_ID" = sp."PD_ID"
AND s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30'
AND s."SPENT_DATE" = (SELECT MAX(s2."SPENT_DATE")
FROM spent s2,
wdata w2
WHERE s2."SPENT_ID" = w2."SPENT_ID"
AND w2."LRM" = w."LRM");


The problem description was predictably standard: "Everything is bad. Tell us what the problem is."
I immediately remembered a joke from the era of 3.5-inch diskettes:

A lamer comes to a hacker:
- Nothing works for me, tell me where the problem is.
- In your DNA...

But, of course, this is not the way to handle performance incidents. "They might not understand us" (c). We need to figure it out.
Well, let's dig. Maybe we'll even dig something up.


The investigation begins

So, here is what can be seen immediately, with the naked eye, without even resorting to EXPLAIN:
1) JOINs are not used. This is bad, especially if the number of joins is more than one.
2) Even worse, there is a correlated subquery, and with aggregation at that. This is very bad.
Bad, of course. But only on the one hand. On the other hand, it is very good, because the problem clearly has a solution and the query can be improved.
No need to visit a fortune teller (c).
The query plan is not that complicated, but it is quite telling:
The execution plan:

The most interesting and useful parts are, as usual, at the beginning and the end:
Nested Loop (cost=935.84..479763226.18 rows=3322 width=135) (actual time=31.536..8220420.295 rows=8111656 loops=1)
Planning time: 3.807 ms
Execution time: 8222351.640 ms
The execution time is more than 2 hours.


False hypotheses that took time

Hypothesis 1: the optimizer is wrong and builds a bad plan.

To visualize the execution plan, we use the site https://explain.depesz.com/. However, the site did not show anything interesting or useful. At first and second glance, there is nothing that could really help. Except, perhaps, to minimize the Full Scan. Moving on.

Hypothesis 2: autovacuum is loading the database, and we need to get rid of the resulting slowdown.

But the autovacuum daemons are behaving well: there are no long-hanging processes and no serious load. We need to look for something else.

Hypothesis 3: the statistics are outdated and everything needs to be recalculated.

Again, no. The statistics are up to date, which, given the absence of autovacuum problems, is not surprising.

Starting to optimize

The main table, wdata, is certainly not small: almost 27 million records, as the Seq Scan below shows.
And it is this table that gets the Full Scan.

Hash Cond: ((w."SPENT_ID" = s."SPENT_ID") AND ((SubPlan 1) = s."SPENT_DATE"))
-> Seq Scan on wdata w (cost=0.00..574151.49 rows=26886249 width=46) (actual time=0.005..8153.565 rows=26873950 loops=1)
We act in the standard way: "add an index and everything will fly".
An index was created on the "SPENT_ID" column.
Result:
The query execution plan using the index:

Well, did it help?
Before: 8 222 351.640 ms (just over 2 hours).
After: 6 985 431.575 ms (almost 2 hours).
The same eggs, just viewed from the side.
Let's remember the classics:
"Do you have the same one, but without wings?" "We will keep looking."
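As an aside, the effect of an index on an equality lookup is easy to demonstrate. Below is a minimal sketch in Python on SQLite (standing in for PostgreSQL, since the real database is not at hand), with a heavily simplified wdata table and a hypothetical index name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Heavily simplified stand-in for the wdata table: only the key column matters.
cur.execute('CREATE TABLE wdata ("SPENT_ID" INTEGER, "LRM" TEXT)')
cur.executemany('INSERT INTO wdata VALUES (?, ?)',
                [(i % 100, "lot-%d" % (i % 7)) for i in range(1000)])

# Without an index, an equality lookup has to scan the whole table.
plan_before = cur.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM wdata WHERE "SPENT_ID" = 42'
).fetchall()[0][-1]

# Create the index, as was done on the real table.
cur.execute('CREATE INDEX idx_wdata_spent_id ON wdata ("SPENT_ID")')

# The same lookup can now use the index instead of a full scan.
plan_after = cur.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM wdata WHERE "SPENT_ID" = 42'
).fetchall()[0][-1]

print(plan_before)  # a SCAN over wdata
print(plan_after)   # a SEARCH using idx_wdata_spent_id
```

The index fixes the access path for a selective lookup, but, as the timings above show, it hardly helped this particular query: the dominant cost was elsewhere.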


In principle, this could be called a good result. Well, not good, but acceptable. At the very least, it would let us hand the customer a large report describing how much was done and why what was done is good.
However, the final solution is still far away. Very far.

And now for the most interesting part: we continue optimizing and start polishing the query itself.

Step one: use JOINs

The rewritten query now looks like this (at the very least, it is prettier):
Query using JOIN:
SELECT
p."PARAMETER_ID" AS parameter_id,
pd."PD_NAME" AS pd_name,
pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LRM" AS LRM,
w."LOTID" AS lotid,
w."RTD_VALUE" AS RTD_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SPENT_NAME" AS spent_name,
s."SPENT_DATE" AS spent_date,
extract(year from "SPENT_DATE") AS year,
extract(month from "SPENT_DATE") AS month,
s."REPORT_NAME" AS report_name,
p."STPM_NAME" AS stpm_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w INNER JOIN spent s ON w."SPENT_ID" = s."SPENT_ID"
INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
WHERE
s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30' AND
s."SPENT_DATE" = (SELECT MAX(s2."SPENT_DATE")
FROM wdata w2 INNER JOIN spent s2 ON w2."SPENT_ID" = s2."SPENT_ID"
WHERE w2."LRM" = w."LRM");
Planning time: 2.486 ms
Execution time: 1223680.326 ms

So, here is the first result.
Before: 6 985 431.575 ms (almost 2 hours).
After: 1 223 680.326 ms (just over 20 minutes).
A good result. In principle, once again, we could have stopped there. But that would be uninteresting; we cannot stop.
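The rewrite from a comma-separated FROM list to explicit INNER JOINs is result-preserving: both forms return the same rows, but the explicit form is more readable and makes it harder to lose a join predicate and end up with a Cartesian product. A minimal sketch in Python on SQLite (standing in for PostgreSQL), with toy stand-in tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for two of the real tables, reduced to the join keys.
cur.execute('CREATE TABLE wdata ("SPENT_ID" INTEGER, "LRM" TEXT)')
cur.execute('CREATE TABLE spent ("SPENT_ID" INTEGER, "SPENT_DATE" TEXT)')
cur.executemany('INSERT INTO wdata VALUES (?, ?)',
                [(1, 'A'), (2, 'A'), (3, 'B'), (4, 'B')])
cur.executemany('INSERT INTO spent VALUES (?, ?)',
                [(1, '2018-07-05'), (2, '2018-07-20'), (3, '2018-08-01')])

# Old style: an implicit join, with the predicate in the WHERE clause.
implicit = cur.execute('''
    SELECT w."LRM", s."SPENT_DATE"
    FROM wdata w, spent s
    WHERE s."SPENT_ID" = w."SPENT_ID"
    ORDER BY 1, 2''').fetchall()

# New style: an explicit INNER JOIN with the same predicate.
explicit = cur.execute('''
    SELECT w."LRM", s."SPENT_DATE"
    FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
    ORDER BY 1, 2''').fetchall()

print(implicit == explicit)  # True: the two forms return the same rows
```

The speedup observed here therefore did not come from the join syntax as such, but from the planner handling the rewritten query (and its rewritten subquery) better.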


Step two: get rid of the correlated subquery

The query text was changed once more:
Without the correlated subquery:
SELECT
p."PARAMETER_ID" AS parameter_id,
pd."PD_NAME" AS pd_name,
pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LRM" AS LRM,
w."LOTID" AS lotid,
w."RTD_VALUE" AS RTD_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SPENT_NAME" AS spent_name,
s."SPENT_DATE" AS spent_date,
extract(year from "SPENT_DATE") AS year,
extract(month from "SPENT_DATE") AS month,
s."REPORT_NAME" AS report_name,
p."STPM_NAME" AS stpm_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE")
FROM spent s2 INNER JOIN wdata w2 ON s2."SPENT_ID" = w2."SPENT_ID"
GROUP BY w2."LRM"
) md ON w."LRM" = md."LRM"
WHERE
s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30';
Planning time: 2.291 ms
Execution time: 165021.870 ms

Before: 1 223 680.326 ms (just over 20 minutes).
After: 165 021.870 ms (just over 2 minutes).
This is already quite good.
However, as the English say, "there is always a but". A result that is this good should automatically arouse suspicion. Something is wrong here.

The hypothesis of rewriting the query to get rid of the correlated subquery is correct. But the rewrite needs a little tweaking for the result to be correct: the derived table must be joined not only on "LRM" but also on the date.
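The suspicion is easy to confirm on toy data: joining the aggregated derived table on "LRM" alone keeps every row of each group, not just the row with the maximum date, while adding the date to the join condition restores the semantics of the original correlated subquery. A minimal sketch in Python on SQLite with simplified stand-in tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified stand-ins for the real tables, reduced to the relevant columns.
cur.execute('CREATE TABLE wdata ("SPENT_ID" INTEGER, "LRM" TEXT)')
cur.execute('CREATE TABLE spent ("SPENT_ID" INTEGER, "SPENT_DATE" TEXT)')
cur.executemany('INSERT INTO wdata VALUES (?, ?)',
                [(1, 'A'), (2, 'A'), (3, 'B')])
cur.executemany('INSERT INTO spent VALUES (?, ?)',
                [(1, '2018-07-05'), (2, '2018-07-20'), (3, '2018-08-01')])

# Reference: the original correlated subquery (slow, but correct).
correlated = cur.execute('''
    SELECT w."LRM", s."SPENT_DATE"
    FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
    WHERE s."SPENT_DATE" = (SELECT MAX(s2."SPENT_DATE")
                            FROM spent s2 INNER JOIN wdata w2
                              ON s2."SPENT_ID" = w2."SPENT_ID"
                            WHERE w2."LRM" = w."LRM")
    ORDER BY 1''').fetchall()

# First attempt: join the aggregate only on "LRM"; this keeps every row.
buggy = cur.execute('''
    SELECT w."LRM", s."SPENT_DATE"
    FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
    INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE") AS "SPENT_DATE"
                FROM spent s2 INNER JOIN wdata w2
                  ON s2."SPENT_ID" = w2."SPENT_ID"
                GROUP BY w2."LRM") md ON md."LRM" = w."LRM"
    ORDER BY 1''').fetchall()

# The fix: also require the row's date to match the group maximum.
fixed = cur.execute('''
    SELECT w."LRM", s."SPENT_DATE"
    FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
    INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE") AS "SPENT_DATE"
                FROM spent s2 INNER JOIN wdata w2
                  ON s2."SPENT_ID" = w2."SPENT_ID"
                GROUP BY w2."LRM") md
      ON md."SPENT_DATE" = s."SPENT_DATE" AND md."LRM" = w."LRM"
    ORDER BY 1''').fetchall()

# buggy keeps all 3 joined rows; correlated and fixed keep only 2.
print(len(buggy), len(correlated), len(fixed))
print(fixed == correlated)
```

This is exactly why the "too good" timing was suspect: the fast variant was returning the wrong row set.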
As a result, the first intermediate version:
The edited query without the correlated subquery:
SELECT
p."PARAMETER_ID" AS parameter_id,
pd."PD_NAME" AS pd_name,
pd."CUSTOMER_PARTNUMBER" AS customer_partnumber,
w."LRM" AS LRM,
w."LOTID" AS lotid,
w."RTD_VALUE" AS RTD_value,
w."LOWER_SPEC_LIMIT" AS lower_spec_limit,
w."UPPER_SPEC_LIMIT" AS upper_spec_limit,
p."TYPE_CALCUL" AS type_calcul,
s."SPENT_NAME" AS spent_name,
s."SPENT_DATE" AS spent_date,
extract(year from s."SPENT_DATE") AS year,
extract(month from s."SPENT_DATE") AS month,
s."REPORT_NAME" AS report_name,
p."STPM_NAME" AS stpm_name,
p."CUSTOMERPARAM_NAME" AS customerparam_name
FROM wdata w INNER JOIN spent s ON s."SPENT_ID" = w."SPENT_ID"
INNER JOIN pmtr p ON p."PARAMETER_ID" = w."PARAMETER_ID"
INNER JOIN spent_pd sp ON s."SPENT_ID" = sp."SPENT_ID"
INNER JOIN pd pd ON pd."PD_ID" = sp."PD_ID"
INNER JOIN (SELECT w2."LRM", MAX(s2."SPENT_DATE") AS "SPENT_DATE"
FROM spent s2 INNER JOIN wdata w2 ON s2."SPENT_ID" = w2."SPENT_ID"
GROUP BY w2."LRM"
) md ON md."SPENT_DATE" = s."SPENT_DATE" AND md."LRM" = w."LRM"
WHERE
s."SPENT_DATE" >= '2018-07-01' AND s."SPENT_DATE" <= '2018-09-30';
Planning time: 3.192 ms
Execution time: 208014.134 ms

So, what we have as a result: the first acceptable result, one we are not ashamed to show the customer:
Started with: 8 222 351.640 ms (more than 2 hours).
Intermediate: 1 223 680.326 ms (just over 20 minutes).
Outcome (intermediate): 208 014.134 ms (just over 3 minutes).

Excellent result.
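For the record, the overall gain at this point is easy to compute from the timings above:

```python
# Timings from the measurements above, in milliseconds.
baseline_ms = 8_222_351.640   # the original query
joins_ms = 1_223_680.326      # after step one: explicit JOINs
final_ms = 208_014.134        # after step two: the corrected decorrelated query

speedup_total = baseline_ms / final_ms
speedup_step_two = joins_ms / final_ms
print(f"total: {speedup_total:.1f}x, step two alone: {speedup_step_two:.1f}x")
```

Roughly a fortyfold speedup overall, with no changes to the schema yet.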


Conclusion

We could have stopped here.
BUT...
Appetite comes with eating. The road is mastered by the one walking it. Any result is just an intermediate one. He who stops falls behind. And so on.
Let's continue with the optimization.
A great idea. Especially considering that the customer was not at all against it. On the contrary, he was strongly in favor.

So, it is time to redesign the database. The query itself can no longer be optimized much further (although, as it turned out later, there is a way to make everything really fly). But optimizing and developing the database design is a very promising idea. And, most importantly, an interesting one. Again, it brings back memories of youth. After all, I did not become a DBA right away; I grew out of the programmers (BASIC, assembler, C, C++, Oracle, PL/SQL). An interesting topic, of course, for separate memoirs ;-).
However, let's not digress.

So,


And maybe partitioning will help us?
Spoiler: "Yes, it helped, including with performance optimization."

But this is a completely different story ...

To be continued…

Source: habr.com
