PostgreSQL Query Profiler: How to match plan and query
Many who already use explain.tensor.ru - our PostgreSQL plan visualization service may not be aware of one of its superpowers - to turn a hard-to-read piece of the server log ...
… into a beautifully designed query with contextual hints for the relevant plan nodes:
In this transcript of the second part of his report at PGConf.Russia 2020 I will tell you how we managed to do it.
The transcript of the first part, devoted to typical query performance problems and their solutions, can be found in the article "Recipes for Sick SQL Queries".
First, let's do the coloring - and we will no longer color the plan, we have already painted it, it is already beautiful and understandable, but the request.
It seemed to us that the request pulled out of the log with an unformatted “sheet” looks very ugly and therefore inconvenient.
Especially when developers “glue” the request body in the code (this, of course, is an anti-pattern, but it happens) in one line. Horror!
Let's draw it somehow more beautifully.
And if we can draw it beautifully, that is, disassemble and assemble the request body back, then we can then “attach” a hint to each object of this request - what happened at the corresponding point in the plan.
Query Syntax Tree
To do this, the query must first be parsed.
Because, we have the core of the system runs on NodeJS, then we made a module for it, you can find it on github. In fact, these are extended "bindings" to the internals of the PostgreSQL parser itself. That is, the grammar is simply binary compiled and bindings are made to it by NodeJS. We took other people's modules as a basis - there is no big secret here.
We feed the body of the input request to our function - at the output we get a parsed syntax tree in the form of a JSON object.
Now we can run through this tree in the opposite direction and assemble the request with the indents, coloring, formatting that we want. No, this is not configurable, but we thought that this would be convenient.
Mapping Query and Plan Nodes
Now let's see how we can combine the plan that we analyzed in the first step and the query that we analyzed in the second.
Let's take a simple example - we have a request that forms a CTE and reads it twice. He generates such a plan.
And, that means, if we see somewhere in the request the generation of CTE and somewhere in the plan the node CTE, then these nodes uniquely “fight” among themselves, we can immediately combine them.
Task "with an asterisk"Note: CTEs can be nested.
There are very poorly nested, and even the same name. For example, you can inside CTE A do CTE X, and at the same level inside CTE B do again CTE X:
WITH A AS (
WITH X AS (...)
SELECT ...
)
, B AS (
WITH X AS (...)
SELECT ...
)
...
When comparing, you must understand this. It is very difficult to understand this with "eyes" - even seeing the plan, even seeing the body of the request. If your CTE generation is complex, nested, the requests are large, then it is completely unconscious.
UNITY
If we have a keyword in the request UNION [ALL] (the operator of joining two samples), then it corresponds in the plan to either the node Append, or some Recursive Union.
That which is "above" UNION - this is the first child of our node, which is "from below" - the second. If through UNION we have "glued" several blocks at once, then Append-node will still have only one, but it will have not two children, but many - in order as they go, respectively:
(...) -- #1
UNION ALL
(...) -- #2
UNION ALL
(...) -- #3
Append
-> ... #1
-> ... #2
-> ... #3
Task "with an asterisk": inside recursive fetch generation (WITH RECURSIVE) can also be more than one UNION. But only the very last block after the last one is always recursive UNION. Everything above is one but different UNION:
WITH RECURSIVE T AS(
(...) -- #1
UNION ALL
(...) -- #2, тут кончается генерация стартового состояния рекурсии
UNION ALL
(...) -- #3, только этот блок рекурсивный и может содержать обращение к T
)
...
Such examples also need to be able to "paste". In this example, we see that UNION-segments in our request were 3 pieces. Accordingly, one UNION соответствует Append-node, and the other - Recursive Union.
Read-write data
Everything, laid out, now we know which piece of the request corresponds to which piece of the plan. And in these pieces we can easily and naturally find those objects that are “readable”.
From the point of view of the query, we do not know if this is a table or a CTE, but they are denoted by the same node RangeVar. And in the “readable” plan, this is also a fairly limited set of nodes:
Seq Scan on [tbl]
Bitmap Heap Scan on [tbl]
Index [Only] Scan [Backward] using [idx] on [tbl]
CTE Scan on [cte]
Insert/Update/Delete on [tbl]
We know the structure of the plan and the request, we know the correspondence of the blocks, we know the names of the objects - we make an unambiguous comparison.
Again task "with an asterisk". We take a request, execute it, we don’t have any aliases - we just read it twice from one CTE.
Let's look at the plan - what's the trouble? Why did we have an alias? We didn't order it. Where does he get such a "number" from?
PostgreSQL adds it itself. You just need to understand that just such an alias for us, for the purposes of comparison with the plan, it does not make any sense, it is simply added here. Let's not pay attention to him.
The second task "with an asterisk": if we are reading from a partitioned table, then we will get a node Append or Merge Append, which will consist of a large number of "children", and each of which will be some Scan'om from the table-section: Seq Scan, Bitmap Heap Scan or Index Scan. But, in any case, these “children” will not be complex queries - this is how these nodes can be distinguished from Append with UNION.
We also understand such knots, we collect them “in one pile” and say: “everything you read from megatable is right here and down the tree".
"Simple" Data Acquisition Nodes
Values Scan in plan corresponds VALUES in the request.
Result is a request without FROM sort of SELECT 1. Or when you have a false expression in WHERE-block (then the attribute appears One-Time Filter):
EXPLAIN ANALYZE
SELECT * FROM pg_class WHERE FALSE; -- или 0 = 1
Function Scan "Mapyatsya" on the SRF of the same name.
But with nested queries, everything is more complicated - unfortunately, they do not always turn into InitPlan/SubPlan. Sometimes they turn into ... Join or ... Anti Join, especially when you write something like WHERE NOT EXISTS .... And it’s not always possible to combine there - in the text of the plan there are no operators corresponding to the nodes of the plan.
Again task "with an asterisk": some VALUES in the request. In this case and in the plan you will get several nodes Values Scan.
"Number" suffixes will help to distinguish them from one another - it is added exactly in the order in which the corresponding VALUES-blocks in the course of the request from top to bottom.
Data processing
It seems that everything in our request was sorted out - only Limit.
But everything is simple here - such nodes as Limit, Sort, Aggregate, WindowAgg, Unique They “map” one-to-one to the corresponding operators in the request, if they are there. There are no "stars" and no difficulties.
JOIN
Difficulties arise when we want to combine JOIN between themselves. This is not always possible, but it is possible.
From the query parser's point of view, we have a node JoinExpr, which has exactly two children - left and right. This, respectively, is what is “above” your JOIN and what is written “under” it in the query.
And from the point of view of the plan, these are two descendants of some * Loop/* Join-node. Nested Loop, Hash Anti Join. . . is something like that.
Let's use simple logic: if we have tables A and B that "join" each other in the plan, then in the query they could be located either A-JOIN-Bor B-JOIN-A. Let's try to combine like this, try to combine in the opposite way, and so on until such pairs run out.
Take our syntax tree, take our plan, look at them... it doesn't look like it!
Let's redraw it in the form of graphs - oh, something has already become similar to something!
Let's notice that we have nodes that have children B and C at the same time - it doesn't matter to us in what order. Let's combine them and flip the picture of the node.
Let's see again. Now we have nodes with children A and pairs (B + C) - compatible with them.
Great! It turns out that we are these two JOIN from the query with the nodes of the plan were successfully combined.
Alas, this problem is not always solved.
For example, if the request A JOIN B JOIN C, and in the plan, the “extreme” nodes A and C were connected first of all. But there is no such operator in the query, we have nothing to highlight, there is nothing to bind the hint to. Same with "comma" when you write A, B.
But, in most cases, almost all nodes manage to “untie” and get such profiling on the left in time - literally, like in Google Chrome, when you analyze JavaScript code. You can see how long each line and each statement "executed".
And to make it easier for you to use all this, we have made storage the archive, where you can save and then find your plans along with associated queries or share a link with someone.
If you just need to bring an unreadable request into an adequate form, use our "normalizer".