PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursion

Recursion - a very powerful and convenient mechanism if the same actions are performed β€œin depth” on related data. But uncontrolled recursion is an evil that can either lead to endless execution process, or (which happens more often) to consuming all available memory.

PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursion
DBMS in this respect work on the same principles - "they said to dig, I dig". Your request can not only slow down neighboring processes, constantly taking up processor resources, but also "drop" the entire database as a whole, "eating" all available memory. Therefore infinite recursion protection is the responsibility of the developer.

In PostgreSQL, the ability to use recursive queries through WITH RECURSIVE appeared in ancient times of version 8.4, but you can still regularly meet potentially vulnerable β€œdefenseless” queries. How to save yourself from problems of this kind?

Don't write recursive queries

And write non-recursive. Sincerely, Your K.O.

In fact, PostgreSQL provides quite a lot of functionality that you can use to not apply recursion.

Use a fundamentally different approach to the problem

Sometimes you can just look at the problem "from the other side." I gave an example of such a situation in the article "SQL HowTo: 1000 and one way to aggregate" - multiplying a set of numbers without using custom aggregate functions:

WITH RECURSIVE src AS (
  SELECT '{2,3,5,7,11,13,17,19}'::integer[] arr
)
, T(i, val) AS (
  SELECT
    1::bigint
  , 1
UNION ALL
  SELECT
    i + 1
  , val * arr[i]
  FROM
    T
  , src
  WHERE
    i <= array_length(arr, 1)
)
SELECT
  val
FROM
  T
ORDER BY -- ΠΎΡ‚Π±ΠΎΡ€ Ρ„ΠΈΠ½Π°Π»ΡŒΠ½ΠΎΠ³ΠΎ Ρ€Π΅Π·ΡƒΠ»ΡŒΡ‚Π°Ρ‚Π°
  i DESC
LIMIT 1;

Such a request can be replaced with a variant from experts in mathematics:

WITH src AS (
  SELECT unnest('{2,3,5,7,11,13,17,19}'::integer[]) prime
)
SELECT
  exp(sum(ln(prime)))::integer val
FROM
  src;

Use generate_series instead of loops

Suppose we are faced with the task of generating all possible prefixes for a string 'abcdefgh':

WITH RECURSIVE T AS (
  SELECT 'abcdefgh' str
UNION ALL
  SELECT
    substr(str, 1, length(str) - 1)
  FROM
    T
  WHERE
    length(str) > 1
)
TABLE T;

Do you really need recursion here? .. If you use LATERAL ΠΈ generate_series, then even CTEs are not needed:

SELECT
  substr(str, 1, ln) str
FROM
  (VALUES('abcdefgh')) T(str)
, LATERAL(
    SELECT generate_series(length(str), 1, -1) ln
  ) X;

Change the structure of the database

For example, you have a table of forum posts with links to who replied to whom or a thread in social network:

CREATE TABLE message(
  message_id
    uuid
      PRIMARY KEY
, reply_to
    uuid
      REFERENCES message
, body
    text
);
CREATE INDEX ON message(reply_to);

PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursion
Well, a typical request to download all messages on one topic looks something like this:

WITH RECURSIVE T AS (
  SELECT
    *
  FROM
    message
  WHERE
    message_id = $1
UNION ALL
  SELECT
    m.*
  FROM
    T
  JOIN
    message m
      ON m.reply_to = T.message_id
)
TABLE T;

But since we always need the whole subject from the root message, why don't we add its id to each entry automatic?

-- Π΄ΠΎΠ±Π°Π²ΠΈΠΌ ΠΏΠΎΠ»Π΅ с ΠΎΠ±Ρ‰ΠΈΠΌ ΠΈΠ΄Π΅Π½Ρ‚ΠΈΡ„ΠΈΠΊΠ°Ρ‚ΠΎΡ€ΠΎΠΌ Ρ‚Π΅ΠΌΡ‹ ΠΈ индСкс Π½Π° Π½Π΅Π³ΠΎ
ALTER TABLE message
  ADD COLUMN theme_id uuid;
CREATE INDEX ON message(theme_id);

-- ΠΈΠ½ΠΈΡ†ΠΈΠ°Π»ΠΈΠ·ΠΈΡ€ΡƒΠ΅ΠΌ ΠΈΠ΄Π΅Π½Ρ‚ΠΈΡ„ΠΈΠΊΠ°Ρ‚ΠΎΡ€ Ρ‚Π΅ΠΌΡ‹ Π² Ρ‚Ρ€ΠΈΠ³Π³Π΅Ρ€Π΅ ΠΏΡ€ΠΈ вставкС
CREATE OR REPLACE FUNCTION ins() RETURNS TRIGGER AS $$
BEGIN
  NEW.theme_id = CASE
    WHEN NEW.reply_to IS NULL THEN NEW.message_id -- Π±Π΅Ρ€Π΅ΠΌ ΠΈΠ· стартового события
    ELSE ( -- ΠΈΠ»ΠΈ ΠΈΠ· сообщСния, Π½Π° ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ΅ ΠΎΡ‚Π²Π΅Ρ‡Π°Π΅ΠΌ
      SELECT
        theme_id
      FROM
        message
      WHERE
        message_id = NEW.reply_to
    )
  END;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ins BEFORE INSERT
  ON message
    FOR EACH ROW
      EXECUTE PROCEDURE ins();

PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursion
Now our entire recursive query can be reduced to just this:

SELECT
  *
FROM
  message
WHERE
  theme_id = $1;

Use applied "limiters"

If we are unable to change the structure of the database for some reason, let's see what we can rely on so that even the presence of an error in the data does not lead to endless recursion.

Recursion "depth" counter

We simply increase the counter by one at each step of the recursion until the limit is reached, which we consider obviously inadequate:

WITH RECURSIVE T AS (
  SELECT
    0 i
  ...
UNION ALL
  SELECT
    i + 1
  ...
  WHERE
    T.i < 64 -- ΠΏΡ€Π΅Π΄Π΅Π»
)

Pro: When trying to loop, we will still make no more than the specified limit of iterations "in depth".
cons: There is no guarantee that we will not re-process the same record - for example, at a depth of 15 and 25, and then every +10. And no one promised anything about "breadth".

Formally, such a recursion will not be infinite, but if at each step the number of records increases exponentially, we all know well how it ends ...

PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursionsee "The problem of grains on a chessboard"

Guardian of the Way

In turn, we add all the object identifiers we encountered along the recursion path to an array, which is a unique β€œpath” to it:

WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] path
  ...
UNION ALL
  SELECT
    path || id
  ...
  WHERE
    id <> ALL(T.path) -- Π½Π΅ совпадаСт Π½ΠΈ с ΠΎΠ΄Π½ΠΈΠΌ ΠΈΠ·
)

Pro: If there is a cycle in the data, we absolutely will not process the same record repeatedly within the same path.
cons: But at the same time, we can bypass, literally, all the records without repeating ourselves.

PostgreSQL Antipatterns: "Infinity is not the limit!", or A little bit about recursionsee "Knight's move problem"

Path length limit

To avoid the situation of "wandering" recursion at an incomprehensible depth, we can combine the two previous methods. Or, if we do not want to support extra fields, supplement the recursion continuation condition with an estimate of the path length:

WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] path
  ...
UNION ALL
  SELECT
    path || id
  ...
  WHERE
    id <> ALL(T.path) AND
    array_length(T.path, 1) < 10
)

Choose the way you like!

Source: habr.com

Add a comment