PostgreSQL Antipatterns: Harmful JOINs and ORs

Beware of operations that bring buffers ...
Using the example of a small query, let's consider some universal approaches to query optimization in PostgreSQL. Whether or not you use them is up to you, but it's worth knowing about them.

In some subsequent versions of PG, the situation may change with the β€œsmarter” of the scheduler, but for 9.4 / 9.6 it looks about the same, as examples here.

Let's take a very real request:

SELECT
  TRUE
FROM
  "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" d
INNER JOIN
  "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π Π°ΡΡˆΠΈΡ€Π΅Π½ΠΈΠ΅" doc_ex
    USING("@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚")
INNER JOIN
  "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" t_doc ON
    t_doc."@Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = d."Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°"
WHERE
  (d."Π›ΠΈΡ†ΠΎ3" = 19091 or d."Π‘ΠΎΡ‚Ρ€ΡƒΠ΄Π½ΠΈΠΊ" = 19091) AND
  d."$Π§Π΅Ρ€Π½ΠΎΠ²ΠΈΠΊ" IS NULL AND
  d."Π£Π΄Π°Π»Π΅Π½" IS NOT TRUE AND
  doc_ex."БостояниС"[1] IS TRUE AND
  t_doc."Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = 'ΠŸΠ»Π°Π½Π Π°Π±ΠΎΡ‚'
LIMIT 1;

about table and field namesThe "Russian" names of fields and tables can be treated differently, but this is a matter of taste. Because the in our Tensor there are no foreign developers, and PostgreSQL allows us to give names even in hieroglyphs, if they enclosed in quotation marks, then we prefer to name objects unambiguously, so that there are no discrepancies.
Let's look at the resulting plan:
PostgreSQL Antipatterns: Harmful JOINs and ORs
[look at explain.tensor.ru]

144ms and almost 53K buffers - that is, more than 400MB of data! And we will be lucky if all of them will be in the cache by the time of our request, otherwise it will become many times longer when subtracting from disk.

The algorithm matters the most!

To somehow optimize any query, you must first understand what it should do in general.
Let's leave the development of the database structure itself outside the scope of this article for now, and agree that we can relatively "cheaply" rewrite query and / or roll into the base some we need Indexes.

So the request is:
- checks for the existence of at least some document
- in the condition we need and of a certain type
- where the author or performer is the employee we need

JOIN + LIMIT 1

Quite often, it is easier for a developer to write a query, where a large number of tables are first joined, and then only one record remains from all this set. But easier for the developer does not mean more efficient for the database.
In our case, there were only 3 tables - and what effect ...

Let's first get rid of the connection with the "DocumentType" table, and at the same time tell the database that type record is unique (we know this, but the scheduler does not guess yet):

WITH T AS (
  SELECT
    "@Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°"
  FROM
    "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°"
  WHERE
    "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = 'ΠŸΠ»Π°Π½Π Π°Π±ΠΎΡ‚'
  LIMIT 1
)
...
WHERE
  d."Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = (TABLE T)
...

Yes, if a table/CTE consists of a single field of a single record, then in PG you can even write like this, instead of

d."Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = (SELECT "@Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" FROM T LIMIT 1)

Lazy evaluation in PostgreSQL queries

BitmapOr vs UNION

In some cases, Bitmap Heap Scan will cost us a lot - for example, in our situation, when enough records fall under the required condition. We got it because OR condition turned into BitmapOr- operation in the plan.
Let's return to the original problem - we need to find a record corresponding to to any from the conditions - that is, there is no need to search for all 59K records for both conditions. There is a way to work out one condition, and go to the second only when nothing was found for the first. The following structure will help us:

(
  SELECT
    ...
  LIMIT 1
)
UNION ALL
(
  SELECT
    ...
  LIMIT 1
)
LIMIT 1

The "external" LIMIT 1 guarantees that the search will end when the first record is found. And if it is already in the first block, the second block will not be executed (never executed in respect of).

β€œHiding under CASE” difficult conditions

There is an extremely inconvenient moment in the original request - checking the state against the related table "DocumentExtension". Regardless of the truth of the other conditions in the expression (for example, d."Deleted" IS NOT TRUE), this connection is always made and "costs resources". More or less of them will be spent - depends on the size of this table.
But you can modify the query so that the search for a related record occurs only when it is really necessary:

SELECT
  ...
FROM
  "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" d
WHERE
  ... /*index cond*/ AND
  CASE
    WHEN "$Π§Π΅Ρ€Π½ΠΎΠ²ΠΈΠΊ" IS NULL AND "Π£Π΄Π°Π»Π΅Π½" IS NOT TRUE THEN (
      SELECT
        "БостояниС"[1] IS TRUE
      FROM
        "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π Π°ΡΡˆΠΈΡ€Π΅Π½ΠΈΠ΅"
      WHERE
        "@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" = d."@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚"
    )
  END

Once from the linked table we none of the fields are needed for the result, then we have the opportunity to turn JOIN into a condition on a subquery.
Let's leave the indexed fields β€œoutside the brackets” CASE, add simple conditions from the record to the WHEN block - and now the β€œheavy” query is executed only when switching to THEN.

My last name is Total

We collect the resulting query with all the mechanics described above:

WITH T AS (
  SELECT
    "@Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°"
  FROM
    "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°"
  WHERE
    "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°" = 'ΠŸΠ»Π°Π½Π Π°Π±ΠΎΡ‚'
)
  (
    SELECT
      TRUE
    FROM
      "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" d
    WHERE
      ("Π›ΠΈΡ†ΠΎ3", "Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°") = (19091, (TABLE T)) AND
      CASE
        WHEN "$Π§Π΅Ρ€Π½ΠΎΠ²ΠΈΠΊ" IS NULL AND "Π£Π΄Π°Π»Π΅Π½" IS NOT TRUE THEN (
          SELECT
            "БостояниС"[1] IS TRUE
          FROM
            "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π Π°ΡΡˆΠΈΡ€Π΅Π½ΠΈΠ΅"
          WHERE
            "@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" = d."@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚"
        )
      END
    LIMIT 1
  )
UNION ALL
  (
    SELECT
      TRUE
    FROM
      "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" d
    WHERE
      ("Π’ΠΈΠΏΠ”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π°", "Π‘ΠΎΡ‚Ρ€ΡƒΠ΄Π½ΠΈΠΊ") = ((TABLE T), 19091) AND
      CASE
        WHEN "$Π§Π΅Ρ€Π½ΠΎΠ²ΠΈΠΊ" IS NULL AND "Π£Π΄Π°Π»Π΅Π½" IS NOT TRUE THEN (
          SELECT
            "БостояниС"[1] IS TRUE
          FROM
            "Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚Π Π°ΡΡˆΠΈΡ€Π΅Π½ΠΈΠ΅"
          WHERE
            "@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚" = d."@Π”ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚"
        )
      END
    LIMIT 1
  )
LIMIT 1;

Fitting [under] indices

A trained eye noticed that the indexed conditions in the UNION subblocks are slightly different - this is because we already have suitable indexes on the table. And if they didn’t exist, then it would be worth creating: Document(Person3, DocumentType) ΠΈ Document(DocumentType, Employee).
about the order of fields in ROW conditionsFrom the scheduler's point of view, of course, one can also write (A, B) = (constA, constB)and (B, A) = (constB, constA). But when writing in the order of the fields in the index, such a request is simply more convenient to debug later.
What's in the plan?
PostgreSQL Antipatterns: Harmful JOINs and ORs
[look at explain.tensor.ru]

Unfortunately, we were not lucky, and nothing was found in the first UNION block, so the second one still went to execution. But even so, all 0.037ms and 11 buffers!
We accelerated the request and reduced the "pumping" of data in memory several thousand times, using fairly simple techniques - a good result with a little copy-paste. πŸ™‚

Source: habr.com

Add a comment