SQL HowTo: writing a while-loop directly in a query, or "An elementary three-move combination"

Every now and then a task comes up that requires searching for related data by a set of keys until the required total number of records has been collected.

The most "real-life" example is showing the 20 oldest tasks assigned to a list of employees (for example, within a single department). Various management "dashboards" with short summaries of work areas need something like this quite often.


In this article we'll look at PostgreSQL implementations of a "naive" solution to such a problem, a "smarter" one, and a rather complex algorithm: a "loop" in SQL with an exit condition that depends on the data found. The latter can be useful both as a general development technique and for reuse in other similar cases.

Let's take the test data set from the previous article. To keep the displayed records from occasionally "jumping around" when the sorted values coincide, let's extend the subject index by adding the primary key. As a bonus, this immediately makes the index unique and guarantees an unambiguous sort order:

CREATE INDEX ON task(owner_id, task_date, id);
-- and drop the old one
DROP INDEX task_owner_id_task_date_idx;
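To see why the tiebreaker matters, here is a tiny Python illustration (not PostgreSQL itself). The `Task` tuple and its values are hypothetical, a plain day number stands in for `task_date`, and Python's stable sort stands in for the database's unspecified ordering of tied rows.

```python
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical: a day number instead of a real date


def by_date(t):
    return t.task_date


def by_date_id(t):
    return (t.task_date, t.id)


rows = [Task(1, 1, 10), Task(2, 2, 10), Task(3, 1, 11)]
shuffled = [rows[1], rows[0], rows[2]]  # same rows, different physical order

# Sorting by task_date alone leaves tied rows in arrival order
# (Python's sort is stable), so the result depends on physical order.
print([t.id for t in sorted(rows, key=by_date)])         # [1, 2, 3]
print([t.id for t in sorted(shuffled, key=by_date)])     # [2, 1, 3]
# Adding the unique id as a tiebreaker makes the order total and repeatable.
print([t.id for t in sorted(shuffled, key=by_date_id)])  # [1, 2, 3]
```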

Written exactly as it sounds

First, let's sketch the simplest version of the query, passing the owners' IDs as an array in the input parameter:

SELECT
  *
FROM
  task
WHERE
  owner_id = ANY('{1,2,4,8,16,32,64,128,256,512}'::integer[])
ORDER BY
  task_date, id
LIMIT 20;

[View at explain.tensor.ru]

A bit sad: we asked for only 20 records, yet the Index Scan returned 960 rows, which then also had to be sorted... Let's try to read less.
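A minimal Python model of this naive plan, assuming hypothetical test data (`make_tasks` creates 100 tasks per owner, with day numbers instead of dates), not the article's data set. It only shows the shape of the work: every matching row is read and sorted before `LIMIT` applies.

```python
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def make_tasks(owners, per_owner=100):
    """Hypothetical test data: per_owner tasks per owner with interleaved dates."""
    tasks, next_id = [], 1
    for owner in owners:
        for k in range(per_owner):
            tasks.append(Task(next_id, owner, (k * 7 + owner) % 50))
            next_id += 1
    return tasks


def naive_top(tasks, owners, limit=20):
    """Models WHERE owner_id = ANY(...) ORDER BY task_date, id LIMIT limit."""
    wanted = set(owners)
    matched = [t for t in tasks if t.owner_id in wanted]  # every matching row is read
    matched.sort(key=lambda t: (t.task_date, t.id))       # ...and all of them are sorted
    return matched[:limit], len(matched)


OWNERS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
top, rows_read = naive_top(make_tasks(OWNERS), OWNERS)
print(len(top), rows_read)  # 20 1000
```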

unnest + ARRAY

The first observation that helps us: if we need only 20 sorted records, it is enough to read at most 20 records, sorted in the same order, for each key. Fortunately, we have a suitable index (owner_id, task_date, id).

Let's use the same mechanism for extracting a whole table record and "spreading it into columns" as in the previous article. We can also fold the rows into an array with the ARRAY() function:

WITH T AS (
  SELECT
    unnest(ARRAY(
      SELECT
        t
      FROM
        task t
      WHERE
        owner_id = unnest
      ORDER BY
        task_date, id
      LIMIT 20 -- limit here...
    )) r
  FROM
    unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
)
SELECT
  (r).*
FROM
  T
ORDER BY
  (r).task_date, (r).id
LIMIT 20; -- ... and here as well

[View at explain.tensor.ru]

Wow, much better! 40% faster, and 4.5 times less data had to be read.
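The same idea as an in-memory Python sketch, under the same kind of hypothetical data: a dict of per-owner lists stands in for the `(owner_id, task_date, id)` index, and the inner slice plays the role of the inner `LIMIT 20`. With 10 keys it never takes more than 10 × 20 rows from the "index".

```python
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def sort_key(t):
    return (t.task_date, t.id)


def build_index(tasks):
    """owner_id -> rows in (task_date, id) order, like the (owner_id, task_date, id) index."""
    index = {}
    for t in sorted(tasks, key=lambda t: (t.owner_id, t.task_date, t.id)):
        index.setdefault(t.owner_id, []).append(t)
    return index


def per_key_top(index, owners, limit=20):
    """Models unnest + ARRAY(... LIMIT limit): at most `limit` rows per key, then one more sort."""
    candidates = []
    for owner in owners:
        candidates.extend(index.get(owner, [])[:limit])  # inner LIMIT, per key
    candidates.sort(key=sort_key)
    return candidates[:limit], len(candidates)          # outer LIMIT


OWNERS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
# Hypothetical data: 100 tasks per owner with interleaved day numbers.
tasks = [Task(o * 1000 + k, o, (k * 7 + o) % 50) for o in OWNERS for k in range(100)]
top, rows_read = per_key_top(build_index(tasks), OWNERS)
print(len(top), rows_read)  # 20 200
```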

Materializing table records with a CTE
Note that in some cases, trying to work with the fields of a record right after finding it in a subquery, without "wrapping" it in a CTE, can "multiply" InitPlan in proportion to the number of those fields:

SELECT
  ((
    SELECT
      t
    FROM
      task t
    WHERE
      owner_id = 1
    ORDER BY
      task_date, id
    LIMIT 1
  ).*);

Result  (cost=4.77..4.78 rows=1 width=16) (actual time=0.063..0.063 rows=1 loops=1)
  Buffers: shared hit=16
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.031..0.032 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t  (cost=0.42..387.57 rows=500 width=48) (actual time=0.030..0.030 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4
  InitPlan 2 (returns $1)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.009 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_1  (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4
  InitPlan 3 (returns $2)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.008 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_2  (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4
  InitPlan 4 (returns $3)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.009..0.009 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_3  (cost=0.42..387.57 rows=500 width=48) (actual time=0.009..0.009 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4

The same record was "looked up" 4 times... Up to PostgreSQL 11 this behavior occurs regularly, and the cure is to "wrap" the record in a CTE, which in those versions is an unconditional optimization fence for the planner.
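A loose Python analogy for this effect (an illustration only, not how the PostgreSQL planner works internally): expanding `(subquery).*` behaves as if the subquery were evaluated once per output field, while the CTE computes the record once. `CountingSubquery` and the field name `payload` are hypothetical.

```python
class CountingSubquery:
    """Stands in for SELECT t FROM task t WHERE owner_id = ... LIMIT 1 and counts runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, owner_id):
        self.calls += 1
        # Hypothetical 4-field record, matching the four InitPlans in the plan above.
        return {"id": 1, "owner_id": owner_id, "task_date": 10, "payload": "x"}


FIELDS = ("id", "owner_id", "task_date", "payload")

# ((subquery).*) without a CTE: the expression is expanded once per field,
# so the "subquery" is evaluated once per column.
inline = CountingSubquery()
expanded = {f: inline(1)[f] for f in FIELDS}
print(inline.calls)  # 4

# Wrapped in a CTE: the record is computed once and its fields are read from it.
wrapped = CountingSubquery()
row = wrapped(1)
materialized = {f: row[f] for f in FIELDS}
print(wrapped.calls)  # 1
```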

Recursive accumulator

In the previous version we read 200 rows in total to get the 20 we needed. Not 960, but can we read even fewer?

Let's try to use the fact that we need 20 records in total. That is, we will keep iterating over the data only until we reach the required number.

Step 1: The initial list

Obviously, our "target" list of 20 records must start with the "first" record for one of our owner_id keys. So first we find the very first record for each key and add them all to the list, sorted in the order we want: (task_date, id).
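Step 1 as a small Python sketch, assuming hypothetical `Task` tuples with day numbers for dates. One difference from the SQL: there, an owner without tasks would contribute a NULL element to the array, whereas this sketch simply skips such keys.

```python
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def sort_key(t):
    return (t.task_date, t.id)


def initial_list(tasks, owners):
    """Step 1: the first row (by task_date, id) of every key, sorted by (task_date, id)."""
    firsts = []
    for owner in owners:
        own = [t for t in tasks if t.owner_id == owner]
        if own:  # keys with no rows contribute nothing in this sketch
            firsts.append(min(own, key=sort_key))  # ORDER BY task_date, id LIMIT 1
    return sorted(firsts, key=sort_key)


tasks = [Task(1, 1, 5), Task(2, 1, 3), Task(3, 2, 4), Task(4, 4, 3), Task(5, 4, 9)]
print([t.id for t in initial_list(tasks, [1, 2, 4])])  # [2, 4, 3]
```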


Step 2: Finding the "next" records

Now, if we take the first entry of our list and start "stepping" further along the index while keeping the same owner_id key, then every record we find is exactly the next one in the resulting selection. Of course, this holds only until we step past the applied key of the second entry in the list.

If it turns out that we have "stepped past" the second entry, the last record read should be added to the list in place of the first one (they share the same owner_id), and then the list is re-sorted.
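The operations of step 2 can be sketched in Python as follows, assuming each key's rows are already held in `(task_date, id)` order. `next_in_key` mirrors the row comparison `(task_date, id) > ((rv).task_date, (rv).id) ... LIMIT 1`, `crossed` is the check against the 2nd list entry, and `swap_in` is the re-sort after crossing. All names and data are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def sort_key(t):
    return (t.task_date, t.id)


def next_in_key(rows, rv):
    """Next row of rv's key after rv: (task_date, id) > (rv.task_date, rv.id) ... LIMIT 1.
    `rows` must be that key's rows in (task_date, id) order."""
    pos = bisect_right([sort_key(t) for t in rows], sort_key(rv))
    return rows[pos] if pos < len(rows) else None


def crossed(candidate, second):
    """True if candidate lies past the applied key of the 2nd list entry."""
    return second is not None and sort_key(candidate) > sort_key(second)


def swap_in(lst, candidate):
    """After crossing: drop the 1st entry, add the candidate and keep the list sorted."""
    return [lst[1]] + sorted(lst[2:] + [candidate], key=sort_key)


owner1 = [Task(2, 1, 3), Task(1, 1, 5), Task(6, 1, 8)]  # key 1 in index order
lst = [owner1[0], Task(3, 2, 6), Task(7, 4, 7)]          # current sorted list
step = next_in_key(owner1, owner1[0])
print(step.id, crossed(step, lst[1]))  # 1 False: emit it and keep stepping
step = next_in_key(owner1, step)
print(step.id, crossed(step, lst[1]))  # 6 True: swap it into the list instead
print([t.id for t in swap_in(lst, step)])  # [3, 7, 6]
```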


In other words, the list always holds at most one entry per key (if a key's records run out without any "crossing", its entry simply disappears from the list and nothing is added), and it always stays sorted in ascending order of the applied key (task_date, id).
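This invariant (one pending entry per key, kept in `(task_date, id)` order) is what a classic k-way merge maintains. As a cross-check, here is the same idea with Python's standard `heapq.merge`, a swapped-in technique rather than the article's SQL, on hypothetical data; the `Reads` wrapper counts how many rows are pulled from the per-key "index scans" to produce 20 results.

```python
import heapq
from itertools import islice
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def sort_key(t):
    return (t.task_date, t.id)


class Reads:
    """Counts how many rows are pulled from the per-key 'index scans'."""

    def __init__(self):
        self.count = 0

    def scan(self, rows):
        for row in rows:
            self.count += 1
            yield row


OWNERS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
tasks = [Task(o * 1000 + k, o, (k * 7 + o) % 50) for o in OWNERS for k in range(100)]
index = {o: sorted((t for t in tasks if t.owner_id == o), key=sort_key) for o in OWNERS}

reads = Reads()
# heapq.merge keeps one pending row per key and pulls the next row of a key
# only after emitting that key's current one.
merged = heapq.merge(*(reads.scan(index[o]) for o in OWNERS), key=sort_key)
top = list(islice(merged, 20))
print(len(top), reads.count)
```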


Step 3: Filtering and "expanding" the records

In some rows of our recursive selection the rv record is duplicated: we first find it as the record that "crosses the boundary of the 2nd list entry", and then substitute it into the list in place of the 1st entry. So the first occurrence has to be filtered out.
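Steps 1 to 3 can be replayed row by row in Python to check the logic, assuming hypothetical data and a dict of per-owner lists in place of the index. Each tuple `(list, rv, not_cross, size)` corresponds to one row of the recursive CTE in the query that follows, and the final filter keeps only rows with `not_cross`. One deliberate difference: keys without rows are skipped rather than carried as NULLs.

```python
from bisect import bisect_right
from typing import NamedTuple


class Task(NamedTuple):
    id: int
    owner_id: int
    task_date: int  # hypothetical day number


def sort_key(t):
    return (t.task_date, t.id)


def recursive_accumulator(index, owners, limit=20):
    """Replays the WITH RECURSIVE logic row by row; returns (emitted rows, index reads)."""
    reads = 0

    def next_in_key(rv):  # the LATERAL subquery: one step along rv's key
        nonlocal reads
        reads += 1
        rows = index[rv.owner_id]
        pos = bisect_right([sort_key(t) for t in rows], sort_key(rv))
        return rows[pos] if pos < len(rows) else None

    # #1: anchor row. Unlike the SQL, keys without rows are skipped, not kept as NULLs.
    lst = sorted((index[o][0] for o in owners if index.get(o)), key=sort_key)
    reads += len(lst)
    row = (lst, lst[0] if lst else None, False, 0)  # (list, rv, not_cross, size)
    result = [row]
    # #2: recursive part, repeated while size < limit and the list is not empty.
    while row[3] < limit and row[0]:
        lst, rv, not_cross, size = row
        cand = next_in_key(rv) if not_cross else lst[0]
        second = lst[1] if len(lst) > 1 else None
        if cand is None:
            new_nc = False
        elif second is None:
            new_nc = True
        else:
            new_nc = sort_key(cand) < sort_key(second)
        if cand is None:
            new_list = lst[1:]        # this key is exhausted: drop it
        elif new_nc:
            new_list = lst            # still below the 2nd entry: list unchanged
        elif second is None:
            new_list = []             # mirrors the SQL branch; unreachable here
        else:                         # crossed: swap the candidate in and re-sort
            new_list = [second] + sorted(lst[2:] + [cand], key=sort_key)
        row = (new_list, cand, new_nc, size + int(new_nc))
        result.append(row)
    # #3: keep only the "non-crossing" rows; their order is already correct.
    return [r for _, r, nc, _ in result if nc], reads


OWNERS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
tasks = [Task(o * 1000 + k, o, (k * 7 + o) % 50) for o in OWNERS for k in range(100)]
index = {o: sorted((t for t in tasks if t.owner_id == o), key=sort_key) for o in OWNERS}
top, reads = recursive_accumulator(index, OWNERS)
print([t.id for t in top], reads)
```

In this model the number of rows taken from the "index" stays at most 20 plus the number of keys plus one, which is the point of the approach.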

The final, scary query

WITH RECURSIVE T AS (
  -- #1 : put the "first" record for each key of the set into the list
  WITH wrap AS ( -- "materialize" the records so that accessing their fields does not multiply InitPlan/SubPlan
    WITH T AS (
      SELECT
        (
          SELECT
            r
          FROM
            task r
          WHERE
            owner_id = unnest
          ORDER BY
            task_date, id
          LIMIT 1
        ) r
      FROM
        unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
    )
    SELECT
      array_agg(r ORDER BY (r).task_date, (r).id) list -- sort the list in the required order
    FROM
      T
  )
  SELECT
    list
  , list[1] rv
  , FALSE not_cross
  , 0 size
  FROM
    wrap
UNION ALL
  -- #2 : read the records of the 1st key in order until we step past the 2nd record
  SELECT
    CASE
      -- if nothing was found for the key of the 1st record
      WHEN X._r IS NOT DISTINCT FROM NULL THEN
        T.list[2:] -- drop it from the list
      -- if we have NOT crossed the applied key of the 2nd record
      WHEN X.not_cross THEN
        T.list -- just carry the same list over unchanged
      -- if there is no 2nd record in the list anymore
      WHEN T.list[2] IS NULL THEN
        -- just return an empty list
        '{}'
      -- re-sort the list, dropping the 1st record and adding the last one found
      ELSE (
        SELECT
          coalesce(T.list[2] || array_agg(r ORDER BY (r).task_date, (r).id), '{}')
        FROM
          unnest(T.list[3:] || X._r) r
      )
    END
  , X._r
  , X.not_cross
  , T.size + X.not_cross::integer
  FROM
    T
  , LATERAL(
      WITH wrap AS ( -- "materialize" the record
        SELECT
          CASE
            -- if we did "step past" the 2nd record after all
            WHEN NOT T.not_cross
              -- then the record we need is the first one in the list
              THEN T.list[1]
            ELSE ( -- if we did not cross, the key is the same as in the previous record, so continue from it
              SELECT
                _r
              FROM
                task _r
              WHERE
                owner_id = (rv).owner_id AND
                (task_date, id) > ((rv).task_date, (rv).id)
              ORDER BY
                task_date, id
              LIMIT 1
            )
          END _r
      )
      SELECT
        _r
      , CASE
          -- if the 2nd record is no longer in the list, but we found something
          WHEN list[2] IS NULL AND _r IS DISTINCT FROM NULL THEN
            TRUE
          ELSE -- found nothing, or "stepped past"
            coalesce(((_r).task_date, (_r).id) < ((list[2]).task_date, (list[2]).id), FALSE)
        END not_cross
      FROM
        wrap
    ) X
  WHERE
    T.size < 20 AND -- limit the count here
    T.list IS DISTINCT FROM '{}' -- or until the list runs out
)
-- #3 : "expand" the records; the order is guaranteed by construction
SELECT
  (rv).*
FROM
  T
WHERE
  not_cross; -- take only the "non-crossing" records

[View at explain.tensor.ru]

So we cut the amount of data read by 50% at the cost of 20% more execution time. In other words, if you have reason to believe that reading may be slow (for example, the data is often not in the cache and has to be fetched from disk), this approach lets you depend less on reads.

In any case, the execution time turned out better than with the first, "naive" approach. Which of the three options to use is up to you.

Source: www.habr.com
