SQL HowTo: Writing a Loop Directly in the Query, or "The Elementary Three-Step"

From time to time, a task comes up where we need to search for related data by a set of keys until we have collected the required total number of records.

The most "real-life" example is showing the 20 oldest tasks assigned to a list of employees (for example, within one department). Management "dashboards" with short summaries of work areas need something like this quite often.

In this article we will look at PostgreSQL implementations of a "naive" solution to this problem, a "smarter" one, and a rather complex algorithm: a "loop" in SQL with an exit condition based on the data found. It may be useful both for general development and for other similar cases.

Let's take the test data set from the previous article. To keep the displayed records from "jumping" from time to time when the sorted values coincide, we extend the index by adding the primary key to it. This also makes the index unique and guarantees that the sort order is unambiguous:

CREATE INDEX ON task(owner_id, task_date, id);
-- and drop the old one
DROP INDEX task_owner_id_task_date_idx;
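
The data set itself comes from the previous article and is not reproduced here. To follow along, a table of roughly the following shape would have to exist before the index statements above are run. This is only a sketch: the row count, the value distributions and the date range are assumptions, and only the names task, owner_id, task_date and id are taken from the queries in this article.

```sql
-- Hypothetical test data: sizes and distributions are assumptions,
-- not the data set of the previous article.
CREATE TABLE task AS
SELECT
  id
, (random() * 1000)::integer AS owner_id
, now()::date - (random() * 1000)::integer AS task_date
FROM
  generate_series(1, 1000000) id;

ALTER TABLE task ADD PRIMARY KEY (id);

-- the "old" index; PostgreSQL names it task_owner_id_task_date_idx by default
CREATE INDEX ON task(owner_id, task_date);
```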

Written the way it sounds

First, let's sketch the simplest version of the query, passing the IDs of the assignees as an array input parameter:

SELECT
  *
FROM
  task
WHERE
  owner_id = ANY('{1,2,4,8,16,32,64,128,256,512}'::integer[])
ORDER BY
  task_date, id
LIMIT 20;

That is a little sad: we asked for only 20 records, but the Index Scan returned 960 rows, which then also had to be sorted... Let's try to read less.
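
The row counts quoted here come from the execution plan. One way to get such a plan yourself is to prefix the query with EXPLAIN (ANALYZE, BUFFERS). Keep in mind that ANALYZE actually executes the query, and that the numbers you see depend on your own data, so they will not necessarily match the ones in this article.

```sql
-- Shows actual row counts per plan node and the number of buffers read
EXPLAIN (ANALYZE, BUFFERS)
SELECT
  *
FROM
  task
WHERE
  owner_id = ANY('{1,2,4,8,16,32,64,128,256,512}'::integer[])
ORDER BY
  task_date, id
LIMIT 20;
```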

unnest + ARRAY

The first consideration that helps us: if we need only 20 sorted records, it is enough to read no more than 20 records, sorted in the same order, for each key. Fortunately, we have a suitable index (owner_id, task_date, id).

Let's use the same mechanism for extracting a whole table record and "expanding it into columns" as in the previous article. We will also fold the rows into an array with ARRAY():

WITH T AS (
  SELECT
    unnest(ARRAY(
      SELECT
        t
      FROM
        task t
      WHERE
        owner_id = unnest
      ORDER BY
        task_date, id
      LIMIT 20 -- limit here...
    )) r
  FROM
    unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
)
SELECT
  (r).*
FROM
  T
ORDER BY
  (r).task_date, (r).id
LIMIT 20; -- ... and here too

Oh, much better already! 40% faster, and 4.5 times less data had to be read.
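
For comparison, the same "at most 20 rows per key, then 20 overall" idea can also be written with a LATERAL subquery. This is an alternative formulation and not the one measured in this article, so no claim is made about its timing. The alias k is introduced here only for illustration.

```sql
SELECT
  t.*
FROM
  unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[]) AS k(owner_id)
, LATERAL (
    SELECT
      *
    FROM
      task
    WHERE
      task.owner_id = k.owner_id
    ORDER BY
      task_date, id
    LIMIT 20 -- at most 20 rows per key
  ) t
ORDER BY
  t.task_date, t.id
LIMIT 20; -- and 20 rows overall
```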

Materializing table records via a CTE

Let me draw your attention to the fact that in some cases, trying to work with the fields of a record right after fetching it in a subquery, without "wrapping" it in a CTE, can "multiply" the InitPlan in proportion to the number of those fields:

SELECT
  ((
    SELECT
      t
    FROM
      task t
    WHERE
      owner_id = 1
    ORDER BY
      task_date, id
    LIMIT 1
  ).*);

Result  (cost=4.77..4.78 rows=1 width=16) (actual time=0.063..0.063 rows=1 loops=1)
  Buffers: shared hit=16
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.031..0.032 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t  (cost=0.42..387.57 rows=500 width=48) (actual time=0.030..0.030 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4
  InitPlan 2 (returns $1)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.009 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_1  (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4
  InitPlan 3 (returns $2)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.008..0.008 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_2  (cost=0.42..387.57 rows=500 width=48) (actual time=0.008..0.008 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4"
  InitPlan 4 (returns $3)
    ->  Limit  (cost=0.42..1.19 rows=1 width=48) (actual time=0.009..0.009 rows=1 loops=1)
          Buffers: shared hit=4
          ->  Index Scan using task_owner_id_task_date_id_idx on task t_3  (cost=0.42..387.57 rows=500 width=48) (actual time=0.009..0.009 rows=1 loops=1)
                Index Cond: (owner_id = 1)
                Buffers: shared hit=4

The same record was "looked up" 4 times... Up to and including PostgreSQL 11 this behavior occurs consistently, and the solution is to "wrap" the subquery in a CTE, which is an absolute barrier for the optimizer in those versions.
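
A minimal sketch of such a wrapper for the query above is shown below. One assumption to be aware of: starting with PostgreSQL 12, a CTE that is referenced only once may be inlined into the main query, so the MATERIALIZED keyword is added here to keep the barrier. On version 11 and older the keyword does not exist and has to be removed.

```sql
-- MATERIALIZED requires PostgreSQL 12 or later; omit the keyword on older versions
WITH wrap AS MATERIALIZED (
  SELECT
    (
      SELECT
        t
      FROM
        task t
      WHERE
        owner_id = 1
      ORDER BY
        task_date, id
      LIMIT 1
    ) r
)
SELECT
  (r).* -- the fields are expanded from the already computed record
FROM
  wrap;
```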

Recursive accumulator

In the previous version we read 200 rows in total for the sake of the 20 we needed. That is not 960 anymore, but can we read even less?

Let's try to use the knowledge that we need only 20 records in total. That is, we will keep reading data only until we reach the amount we need.

Step 1: Starting list

Obviously, our "target" list of 20 records must start with the "first" record for one of our owner_id keys. So first we find such a "very first" record for each of the keys and put them into a list, sorted in the order we want: (task_date, id).
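
Taken on its own, this step could look roughly like the query below. It is a sketch that assumes the task table and the (owner_id, task_date, id) index described earlier; the aliases k and first_task are introduced here only for illustration.

```sql
SELECT
  k.owner_id
, (
    SELECT
      t
    FROM
      task t
    WHERE
      t.owner_id = k.owner_id
    ORDER BY
      t.task_date, t.id
    LIMIT 1
  ) AS first_task -- NULL when the key has no tasks at all
FROM
  unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[]) AS k(owner_id);
```

Each scalar subquery needs only the first entry of the index for its key.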

Step 2: Find the "next" records

Now, if we take the first record from our list and start "stepping" further along the index while keeping the same owner_id key, then all the records we find are exactly the next ones in the resulting selection. Of course, this holds only until we cross the application key (task_date, id) of the second record in the list.
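
A single "step" along the index can be expressed with a row comparison on (task_date, id). In the sketch below the owner_id, date and id literals are hypothetical placeholders for the current record; in the full query they come from the previous iteration.

```sql
SELECT
  *
FROM
  task
WHERE
  owner_id = 1 AND
  -- strictly after the current record in (task_date, id) order;
  -- the literal values are hypothetical
  (task_date, id) > ('2020-01-01'::date, 12345)
ORDER BY
  task_date, id
LIMIT 1;
```

The row comparison is lexicographic: it first compares task_date and falls back to id only when the dates are equal, which matches the sort order of the index.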

If it turns out that we have "crossed" the second record, then the last record read must be put into the list in place of the first one (it has the same owner_id), after which we re-sort the list.
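
The list manipulation itself can be tried out on plain integers. In this toy sketch the numbers stand in for (task_date, id) keys: the list is {10,20,30,40}, and the next record found for the key of the first element is 25, which "crosses" the second element 20. The table alias v and the column names are made up for the illustration.

```sql
SELECT
  list[2:] AS without_first -- {20,30,40}
, (
    SELECT
      -- the old 2nd element becomes the head, the rest is re-sorted with the new record
      list[2] || array_agg(x ORDER BY x)
    FROM
      unnest(list[3:] || 25) x
  ) AS replaced_and_resorted -- {20,25,30,40}
FROM
  (VALUES (ARRAY[10,20,30,40])) v(list);
```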

That is, the list always contains no more than one entry per key (if the records for a key run out before we "cross", the first entry simply disappears from the list and nothing is added), and the entries are always sorted in ascending order of the application key (task_date, id).

Step 3: Filter and "expand" the records

In some rows of our recursive selection, the rv records are duplicated: first we find a record as the one "crossing the boundary of the 2nd entry of the list", and then we substitute it as the 1st entry of the list. So the first occurrence has to be filtered out.

The scary final query

WITH RECURSIVE T AS (
  -- #1 : put the "first" record for each key of the set into the list
  WITH wrap AS ( -- "materialize" the records so that field access does not multiply InitPlan/SubPlan
    WITH T AS (
      SELECT
        (
          SELECT
            r
          FROM
            task r
          WHERE
            owner_id = unnest
          ORDER BY
            task_date, id
          LIMIT 1
        ) r
      FROM
        unnest('{1,2,4,8,16,32,64,128,256,512}'::integer[])
    )
    SELECT
      array_agg(r ORDER BY (r).task_date, (r).id) list -- sort the list in the required order
    FROM
      T
  )
  SELECT
    list
  , list[1] rv
  , FALSE not_cross
  , 0 size
  FROM
    wrap
UNION ALL
  -- #2 : read the records of the 1st key in order until we step over the record of the 2nd
  SELECT
    CASE
      -- if nothing was found for the key of the 1st record
      WHEN X._r IS NOT DISTINCT FROM NULL THEN
        T.list[2:] -- remove it from the list
      -- if we did NOT cross the application key of the 2nd record
      WHEN X.not_cross THEN
        T.list -- just carry the same list over unmodified
      -- if there is no 2nd record in the list anymore
      WHEN T.list[2] IS NULL THEN
        -- just return an empty list
        '{}'
      -- re-sort the list, removing the 1st record and adding the last one found
      ELSE (
        SELECT
          coalesce(T.list[2] || array_agg(r ORDER BY (r).task_date, (r).id), '{}')
        FROM
          unnest(T.list[3:] || X._r) r
      )
    END
  , X._r
  , X.not_cross
  , T.size + X.not_cross::integer
  FROM
    T
  , LATERAL(
      WITH wrap AS ( -- "materialize" the record
        SELECT
          CASE
            -- if we did "step over" the 2nd record
            WHEN NOT T.not_cross
              -- then the record we need is the first one in the list
              THEN T.list[1]
            ELSE ( -- if we did not cross, the key is the same as in the previous record, so continue from it
              SELECT
                _r
              FROM
                task _r
              WHERE
                owner_id = (rv).owner_id AND
                (task_date, id) > ((rv).task_date, (rv).id)
              ORDER BY
                task_date, id
              LIMIT 1
            )
          END _r
      )
      SELECT
        _r
      , CASE
          -- if there is no 2nd record in the list anymore, but we did find something
          WHEN list[2] IS NULL AND _r IS DISTINCT FROM NULL THEN
            TRUE
          ELSE -- found nothing or "stepped over"
            coalesce(((_r).task_date, (_r).id) < ((list[2]).task_date, (list[2]).id), FALSE)
        END not_cross
      FROM
        wrap
    ) X
  WHERE
    T.size < 20 AND -- limit the number of records here
    T.list IS DISTINCT FROM '{}' -- or stop when the list runs out
)
-- #3 : "expand" the records; the order is guaranteed by construction
SELECT
  (rv).*
FROM
  T
WHERE
  not_cross; -- take only the "non-crossing" records

So, compared with the previous variant, we traded 50% fewer data reads for 20% more execution time. That is, if you have reason to believe that reading may take a long time (for example, the data is often not in the cache and you have to go to disk), this approach lets you depend less on reading.

In any case, the execution time turned out better than in the "naive" first option. Which of these 3 options to use is up to you.

Source: www.habr.com
