PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

Kwiinkqubo ezintsonkothileyo zeERP amaqumrhu amaninzi anendalo yoluhluxa izinto ezilinganayo zidibana umthi wobudlelwane bokhokho nenzala - esi sisakhiwo sombutho weshishini (onke la masebe, amasebe kunye namaqela omsebenzi), kunye nekhathalogu yempahla, kunye neendawo zokusebenza, kunye nejografi yeendawo zokuthengisa, ...

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

Enyanisweni, akukho nanye iindawo ezizisebenzelayo zoshishino, apho bekungayi kubakho naluphi na uhlobo lweqela ngenxa yoko. Kodwa nokuba awusebenzeli "ishishini," usenokudibana ngokulula nobudlelwane be-hierarchical. I-trite, nokuba umthi wosapho lwakho okanye isicwangciso somgangatho wezakhiwo kwindawo yokuthenga sisakhiwo esifanayo.

Zininzi iindlela zokugcina umthi onjalo kwi-DBMS, kodwa namhlanje siza kugxila kwinketho enye kuphela:

CREATE TABLE hier(
  id
    integer
      PRIMARY KEY
, pid
    integer
      REFERENCES hier
, data
    json
);

CREATE INDEX ON hier(pid); -- Π½Π΅ Π·Π°Π±Ρ‹Π²Π°Π΅ΠΌ, Ρ‡Ρ‚ΠΎ FK Π½Π΅ ΠΏΠΎΠ΄Ρ€Π°Π·ΡƒΠΌΠ΅Π²Π°Π΅Ρ‚ автосозданиС индСкса, Π² ΠΎΡ‚Π»ΠΈΡ‡ΠΈΠ΅ ΠΎΡ‚ PK

Kwaye ngelixa ujonge kubunzulu bolawulo, ilinde ngomonde ukubona ukuba zisebenza njani iindlela zakho "ezingenangqondo" zokusebenza ngesakhiwo esinjalo.

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
Makhe sijonge iingxaki eziqhelekileyo ezivelayo, ukuphunyezwa kwazo kwi-SQL, kwaye sizame ukuphucula ukusebenza kwazo.

#1. Unzulu kangakanani umngxuma womvundla?

Masivume, ngokuqinisekileyo, ukuba esi sakhiwo siya kubonisa ukuthotyelwa kwamasebe kwisakhiwo sombutho: amasebe, amacandelo, amacandelo, amasebe, amaqela asebenzayo ... - nokuba ubiza ntoni na.
PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

Okokuqala, masivelise 'umthi' wethu wezinto eziyi-10K

INSERT INTO hier
WITH RECURSIVE T AS (
  SELECT
    1::integer id
  , '{1}'::integer[] pids
UNION ALL
  SELECT
    id + 1
  , pids[1:(random() * array_length(pids, 1))::integer] || (id + 1)
  FROM
    T
  WHERE
    id < 10000
)
SELECT
  pids[array_length(pids, 1)] id
, pids[array_length(pids, 1) - 1] pid
FROM
  T;

Masiqale ngowona msebenzi ulula-ukufumana bonke abasebenzi abasebenza kwicandelo elithile, okanye ngokwemigaqo yolawulo-melo - fumana bonke abantwana be-node. Kuya kuba kuhle ukufumana "ubunzulu" benzala ... Konke oku kunokuba yimfuneko, umzekelo, ukwakha uhlobo oluthile ukhetho oluntsonkothileyo olusekwe kuluhlu lwezazisi zaba basebenzi.

Yonke into ingalunga ukuba kukho isibini samanqanaba ale nzala kwaye inani lingaphakathi kweshumi elinesibini, kodwa ukuba kukho amanqanaba angaphezu kwama-5, kwaye sele kukho inzala eninzi, kunokubakho iingxaki. Makhe sijonge indlela iinketho zokukhangela zemveli phantsi-umthi zibhalwa (kunye nomsebenzi). Kodwa kuqala, makhe sijonge ukuba zeziphi iindawo eziya kuba yeyona nto inomdla kuphando lwethu.

Eyona "nzulu" subtrees:

WITH RECURSIVE T AS (
  SELECT
    id
  , pid
  , ARRAY[id] path
  FROM
    hier
  WHERE
    pid IS NULL
UNION ALL
  SELECT
    hier.id
  , hier.pid
  , T.path || hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T ORDER BY array_length(path, 1) DESC;

 id  | pid  | path
---------------------------------------------
7624 | 7623 | {7615,7620,7621,7622,7623,7624}
4995 | 4994 | {4983,4985,4988,4993,4994,4995}
4991 | 4990 | {4983,4985,4988,4989,4990,4991}
...

Eyona "banzi" subtrees:

...
SELECT
  path[1] id
, count(*)
FROM
  T
GROUP BY
  1
ORDER BY
  2 DESC;

id   | count
------------
5300 |   30
 450 |   28
1239 |   27
1573 |   25

Kule mibuzo sisebenzise eqhelekileyo recursive JOIN:
PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

Ngokucacileyo, ngale modeli yesicelo inani lokuphindaphinda liya kuba lilonke inani lenzala (kwaye kukho ezininzi zazo), kwaye oku kungathatha izixhobo ezibalulekileyo, kwaye, ngenxa yoko, ixesha.

Makhe sijonge owona mthi "obanzi":

WITH RECURSIVE T AS (
  SELECT
    id
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T;

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
[Jonga apha explain.tensor.ru]

Njengoko bekulindelekile, sifumene zonke iirekhodi ze-30. Kodwa bachithe i-60% yexesha elipheleleyo kule nto - kuba baye benza uphando lwama-30 kwisalathiso. Ngaba kunokwenzeka ukwenza okuncinci?

Uvavanyo oluninzi ngesalathisi

Ngaba kufuneka senze umbuzo wesalathiso owahlukileyo kwindawo nganye? Kuvela ukuba akukho - sinokufunda kwisalathisi usebenzisa amaqhosha amaninzi ngexesha elinye kwifowuni enye ngoncedo = ANY(array).

Kwaye kwiqela ngalinye lezo zichongi sinokuthatha zonke ii-ID ezifunyenwe kwisinyathelo sangaphambili ngokuthi "nodes". Oko kukuthi, kwinqanaba ngalinye elilandelayo siya kwenza khangela yonke inzala yenqanaba elithile kanye.

Kuphela, nantsi ingxaki, kukhetho oluphinda-phindayo, awukwazi ukufikelela ngokwalo kumbuzo osedlelweni, kodwa kufuneka ngandlela-thile sikhethe kuphela oko kufunyenwe kwinqanaba langaphambili ... Kuvela ukuba akunakwenzeka ukwenza umbuzo owenziwe kwindlwana yokhetho lonke, kodwa kwintsimi yayo ethile inokwenzeka. Kwaye lo mmandla unokuba luluhlu - eyona nto kufuneka siyisebenzise ANY.

Kuvakala ukuba uphambene, kodwa kumzobo yonke into ilula.

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] id$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    ARRAY(
      SELECT
        id
      FROM
        hier
      WHERE
        pid = ANY(T.id$)
    ) id$
  FROM
    T
  WHERE
    coalesce(id$, '{}') <> '{}' -- условиС Π²Ρ‹Ρ…ΠΎΠ΄Π° ΠΈΠ· Ρ†ΠΈΠΊΠ»Π° - пустой массив
)
SELECT
  unnest(id$) id
FROM
  T;

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
[Jonga apha explain.tensor.ru]

Kwaye apha eyona nto ibalulekileyo ayikho win 1.5 amaxesha ngexesha, kwaye sithathele izithinteli ezimbalwa, kuba sineminxeba emi-5 kuphela kwisalathiso endaweni yama-30!

Ibhonasi eyongezelelweyo yinto yokuba emva kwe-unnest yokugqibela, i-identifiers iya kuhlala iyalelwa "ngamanqanaba".

Uphawu lwendawo

Ingqwalasela elandelayo eya kunceda ekuphuculeni intsebenzo yile - "amagqabi" akanakuba nabantwana, oko kukuthi, kubo akukho mfuneko yokujonga "phantsi" nonke. Ekuqulunqweni komsebenzi wethu, oku kuthetha ukuba ukuba silandele ikhonkco lamasebe kwaye sifikelele kumqeshwa, akukho mfuneko yokujonga ngakumbi kweli sebe.

Masingene etafileni yethu ezongezelelweyo boolean-ibala, eya kuthi ngokukhawuleza isixelele ukuba le nto yokungena emthini wethu "i-node" - oko kukuthi, ingaba nayo yonke inzala.

ALTER TABLE hier
  ADD COLUMN branch boolean;

UPDATE
  hier T
SET
  branch = TRUE
WHERE
  EXISTS(
    SELECT
      NULL
    FROM
      hier
    WHERE
      pid = T.id
    LIMIT 1
);
-- Запрос ΡƒΡΠΏΠ΅ΡˆΠ½ΠΎ Π²Ρ‹ΠΏΠΎΠ»Π½Π΅Π½: 3033 строк ΠΈΠ·ΠΌΠ΅Π½Π΅Π½ΠΎ Π·Π° 42 мс.

Kakhulu! Kuye kwafumaniseka ukuba ngaphaya kwe-30% yazo zonke izinto zemithi ezinenzala.

Ngoku masisebenzise i-mechanic eyahlukileyo kancinane - uqhagamshelo kwindawo ephindaphindwayo LATERAL, eya kusivumela ukuba sifikelele ngokukhawuleza kwimihlaba "yetafile" ephindaphindiweyo, kwaye sisebenzise umsebenzi odibeneyo kunye nemeko yokucoca esekelwe kwi-node yokunciphisa isethi yezitshixo:

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy

WITH RECURSIVE T AS (
  SELECT
    array_agg(id) id$
  , array_agg(id) FILTER(WHERE branch) ns$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    X.*
  FROM
    T
  JOIN LATERAL (
    SELECT
      array_agg(id) id$
    , array_agg(id) FILTER(WHERE branch) ns$
    FROM
      hier
    WHERE
      pid = ANY(T.ns$)
  ) X
    ON coalesce(T.ns$, '{}') <> '{}'
)
SELECT
  unnest(id$) id
FROM
  T;

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
[Jonga apha explain.tensor.ru]

Siye sakwazi ukunciphisa umnxeba omnye wesalathiso kwaye iphumelele ngaphezu kwamaxesha ama-2 kumthamo ukuvavanya.

#2. Masibuyele kwiingcambu

Le algorithm iya kuba luncedo ukuba ufuna ukuqokelela iirekhodi kuzo zonke izinto "phezulu komthi", ngelixa ugcina ulwazi malunga nephepha lomthombo (kwaye kunye neziphi izalathisi) ezibangele ukuba zifakwe kwisampuli - umzekelo, ukuvelisa ingxelo yesishwankathelo. ngokudityaniswa kwiinodi.

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
Oku kulandelayo kufuneka kuthathwe kuphela njengobungqina bengqikelelo, kuba isicelo sijika sinzima kakhulu. Kodwa ukuba ilawula isiseko sakho sedatha, kuya kufuneka ucinge ngokusebenzisa iindlela ezifanayo.

Masiqale ngeenkcazo ezimbalwa ezilula:

  • Ingxelo efanayo evela kwisiseko sedatha Kungcono ukuyifunda kube kanye.
  • Iirekhodi ezivela kuvimba weenkcukacha Kusebenza ngakumbi ukufunda kwiibhetshikunokuba yedwa.

Ngoku makhe sizame ukwakha isicelo esisifunayo.

Isinyathelo 1

Ngokucacileyo, xa kuqalwa ukuphindaphinda (besiya kuba phi ngaphandle kwayo!) kuya kufuneka sithathe iirekhodi zamagqabi ngokwazo ngokusekelwe kwiseti yeempawu zokuqala:

WITH RECURSIVE tree AS (
  SELECT
    rec -- это Ρ†Π΅Π»ΡŒΠ½Π°Ρ запись Ρ‚Π°Π±Π»ΠΈΡ†Ρ‹
  , id::text chld -- это "Π½Π°Π±ΠΎΡ€" ΠΏΡ€ΠΈΠ²Π΅Π΄ΡˆΠΈΡ… сюда исходных Π»ΠΈΡΡ‚ΡŒΠ΅Π²
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  ...

Ukuba kwakubonakala kungaqhelekanga kumntu ukuba "isethi" igcinwe njengentambo kwaye kungekhona uluhlu, ngoko kukho inkcazo elula yale nto. Kukho i-aggregating in aggregating "gluing" umsebenzi weentambo string_agg, kodwa kungekhona ngenxa yoluhlu. Nangona yena kulula ukuyiphumeza ngokwakho.

Isinyathelo 2

Ngoku siza kufumana iseti yee-ID zecandelo ekuya kufuneka zifundwe ngakumbi. Phantse ngalo lonke ixesha ziya kuphinda-phindwa kwiirekhodi ezahlukeneyo zeseti yokuqala - ngoko besiya kwenjenjalo amaqela, ngelixa ugcina ulwazi malunga nomthombo wamagqabi.

Kodwa nazi iingxaki ezintathu ezisilindeleyo:

  1. Inxalenye yombuzo ethi "subrecursive" ayinakuqulatha imisebenzi edibeneyo GROUP BY.
  2. Isalathiso se "itafile" ephindaphindwayo ayinakuba kwindlwane engaphantsi.
  3. Isicelo kwi-recursive part ayinakuqulatha i-CTE.

Ngethamsanqa, zonke ezi ngxaki kulula ukuzilungisa. Masiqale ukusuka ekugqibeleni.

I-CTE kwinxalenye yokuphindaphinda

Nantsi njalo hayi iyasebenza:

WITH RECURSIVE tree AS (
  ...
UNION ALL
  WITH T (...)
  SELECT ...
)

Kwaye ke iyasebenza, izibiyeli ziyawenza umahluko!

WITH RECURSIVE tree AS (
  ...
UNION ALL
  (
    WITH T (...)
    SELECT ...
  )
)

Umbuzo obekwe ecaleni kwe "tafile" ephindaphindwayo

Hmm... I-CTE ephinda-phindayo ayinakufikelelwa kwi-subquery. Kodwa inokuba ngaphakathi kwe-CTE! Kwaye isicelo esifakwe kwindlwane sinokufikelela kule CTE!

IQELA NGAPHANDLE kophindaphindo

Ayimnandanga, kodwa... Sinendlela elula yokulinganisa IQELA NGOKUsebenzisa DISTINCT ON kunye nemisebenzi yefestile!

SELECT
  (rec).pid id
, string_agg(chld::text, ',') chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL
GROUP BY 1 -- Π½Π΅ Ρ€Π°Π±ΠΎΡ‚Π°Π΅Ρ‚!

Kwaye le yindlela esebenza ngayo!

SELECT DISTINCT ON((rec).pid)
  (rec).pid id
, string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL

Ngoku siyabona ukuba kutheni isazisi samanani sajikwa sayinto ebhaliweyo - ukuze zidityaniswe kunye zohlulwe ziikoma!

Isinyathelo 3

Okokugqibela akukho nto iseleyo:

  • sifunda iirekhodi "zecandelo" ngokusekelwe kwiseti yee-ID zamaqela
  • sithelekisa amacandelo athatyathiweyo kunye "neesethi" zamaphepha okuqala
  • "Yandisa" umtya wokusetha usebenzisa unnest(string_to_array(chld, ',')::integer[])

WITH RECURSIVE tree AS (
  SELECT
    rec
  , id::text chld
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  (
    WITH prnt AS (
      SELECT DISTINCT ON((rec).pid)
        (rec).pid id
      , string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
      FROM
        tree
      WHERE
        (rec).pid IS NOT NULL
    )
    , nodes AS (
      SELECT
        rec
      FROM
        hier rec
      WHERE
        id = ANY(ARRAY(
          SELECT
            id
          FROM
            prnt
        ))
    )
    SELECT
      nodes.rec
    , prnt.chld
    FROM
      prnt
    JOIN
      nodes
        ON (nodes.rec).id = prnt.id
  )
)
SELECT
  unnest(string_to_array(chld, ',')::integer[]) leaf
, (rec).*
FROM
  tree;

PostgreSQL Antipatterns: Ubunzulu kangakanani umngxuma umvundla? masihambe nge hierarchy
[Jonga apha explain.tensor.ru]

umthombo: www.habr.com

Yongeza izimvo