What EXPLAIN is silent about, and how to talk to him

The classic question that a developer asks their DBA or a business owner asks a PostgreSQL consultant almost always sounds the same: "Why are queries running on the database taking so long?"

Traditional set of reasons:

  • inefficient algorithm
    when you decide to JOIN several CTEs over a couple of tens of thousands of records
  • outdated statistics
    if the actual distribution of data in the table is already very different from the last time collected by ANALYZE
  • "plugging" in resources
    and there is no longer enough dedicated computing power of the CPU, gigabytes of memory are constantly pumped over or the disk does not keep up with all the "Wishlist" of the database
  • blocking from competing processes

And if locks are difficult enough to catch and analyze, then for everything else we need query plan, which can be obtained using EXPLAIN statement (better, of course, immediately EXPLAIN (ANALYZE, BUFFERS) ...) Or auto_explain module.

But as stated in the same documentation,

β€œUnderstanding the plan is an art, and in order to master it, a certain experience is needed, ...”

But you can do without it if you use the right tool!

What does a query plan usually look like? Something like that:

Index Scan using pg_class_relname_nsp_index on pg_class (actual time=0.049..0.050 rows=1 loops=1)
  Index Cond: (relname = $1)
  Filter: (oid = $0)
  Buffers: shared hit=4
  InitPlan 1 (returns $0,$1)
    ->  Limit (actual time=0.019..0.020 rows=1 loops=1)
          Buffers: shared hit=1
          ->  Seq Scan on pg_class pg_class_1 (actual time=0.015..0.015 rows=1 loops=1)
                Filter: (relkind = 'r'::"char")
                Rows Removed by Filter: 5
                Buffers: shared hit=1

or like this:

"Append  (cost=868.60..878.95 rows=2 width=233) (actual time=0.024..0.144 rows=2 loops=1)"
"  Buffers: shared hit=3"
"  CTE cl"
"    ->  Seq Scan on pg_class  (cost=0.00..868.60 rows=9972 width=537) (actual time=0.016..0.042 rows=101 loops=1)"
"          Buffers: shared hit=3"
"  ->  Limit  (cost=0.00..0.10 rows=1 width=233) (actual time=0.023..0.024 rows=1 loops=1)"
"        Buffers: shared hit=1"
"        ->  CTE Scan on cl  (cost=0.00..997.20 rows=9972 width=233) (actual time=0.021..0.021 rows=1 loops=1)"
"              Buffers: shared hit=1"
"  ->  Limit  (cost=10.00..10.10 rows=1 width=233) (actual time=0.117..0.118 rows=1 loops=1)"
"        Buffers: shared hit=2"
"        ->  CTE Scan on cl cl_1  (cost=0.00..997.20 rows=9972 width=233) (actual time=0.001..0.104 rows=101 loops=1)"
"              Buffers: shared hit=2"
"Planning Time: 0.634 ms"
"Execution Time: 0.248 ms"

But to read the plan in text "from a sheet" is very difficult and impractical:

  • in the node is displayed sum over subtree resources
    that is, to understand how much time it took to execute a particular node, or how much exactly this reading from the table raised data from the disk, you need to somehow subtract one from the other
  • node time needed multiply by loops
    yes, subtraction is not the most difficult operation to be done β€œin the mind” - after all, the execution time is indicated as an average for one node execution, and there can be hundreds of them
  • well, and all this together makes it difficult to answer the main question - so who "weakest link"?

When we tried to explain all this to several hundred of our developers, we realized that from the outside it looks something like this:

What EXPLAIN is silent about, and how to talk to him

And that means we need...

Tool

In it, we tried to collect all the key mechanics that help, according to the plan and request, to understand "who is to blame and what to do." Well, share some of your experience with the community.
Meet and enjoy - explain.tensor.ru

Visibility of plans

Is it easy to understand the plan when it looks like this?

Seq Scan on pg_class (actual time=0.009..1.304 rows=6609 loops=1)
  Buffers: shared hit=263
Planning Time: 0.108 ms
Execution Time: 1.800 ms

Not really.

But like this in abbreviated form, when the key indicators are separated, it is already much clearer:

What EXPLAIN is silent about, and how to talk to him

But if the plan is more complicated, it will come to the rescue time distribution piechart by nodes:

What EXPLAIN is silent about, and how to talk to him

Well, for the most difficult options in a hurry to help execution chart:

What EXPLAIN is silent about, and how to talk to him

For example, there are rather non-trivial situations when a plan can have more than one actual root:

What EXPLAIN is silent about, and how to talk to himWhat EXPLAIN is silent about, and how to talk to him

Structural clues

Well, if the whole structure of the plan and its sore spots are already laid out and visible - why not highlight them to the developer and explain them in β€œRussian language”?

What EXPLAIN is silent about, and how to talk to himWe have already collected a couple of dozens of such templates of recommendations.

Row Query Profiler

Now, if the original query is superimposed on the analyzed plan, then you can see how much time it took for each individual operator - something like this:

What EXPLAIN is silent about, and how to talk to him

... or even like this:

What EXPLAIN is silent about, and how to talk to him

Substitution of parameters in a request

If you β€œhooked” to the plan not only the request, but also its parameters from the DETAIL line of the log, you can copy it additionally in one of the options:

  • with substitution of values ​​in the query
    for direct execution on your base and further profiling

    SELECT 'const', 'param'::text;
  • with value substitution via PREPARE/EXECUTE
    to emulate the work of the scheduler, when the parametric part can be ignored - for example, when working on partitioned tables

    DEALLOCATE ALL;
    PREPARE q(text) AS SELECT 'const', $1::text;
    EXECUTE q('param'::text);
    

Plans archive

Insert, analyze, share with colleagues! The plans will remain archived and you can return to them later: explain.tensor.ru/archive

But if you do not want others to see your plan, do not forget to check the box "do not publish in the archive."

In the following articles, I will talk about the difficulties and decisions that arise when analyzing a plan.

Source: habr.com

Add a comment