PostgreSQL Antipatterns: How deep is the rabbit hole? Let's walk through the hierarchy

In complex ERP systems many entities are hierarchical in nature: homogeneous objects line up in a tree of ancestor-descendant relationships. This applies to the organizational structure of an enterprise (all those branches, departments and working groups), the catalog of goods, work sites, the geography of sales points, and so on.


In fact, there is hardly any area of business automation where a hierarchy does not turn up sooner or later. But even if you do not work "for business", you can still easily run into hierarchical relationships. Trivially, even your family tree or the floor plan of a shopping mall has the same structure.

There are many ways to store such a tree in a DBMS, but today we will focus on just one option:

CREATE TABLE hier(
  id
    integer
      PRIMARY KEY
, pid
    integer
      REFERENCES hier
, data
    json
);

CREATE INDEX ON hier(pid); -- remember that, unlike a PK, an FK does not imply automatic index creation

And while you peer into the depths of the hierarchy, it remains to be seen how [in]efficient your "naive" ways of working with such a structure turn out to be.

Let's look at the typical problems that arise, their implementation in SQL, and try to improve their performance.

#1. How deep is the rabbit hole?

For definiteness, let's agree that this structure reflects the subordination of units in an organization: departments, divisions, sectors, branches, working groups... whatever you call them.

First, let's generate our "tree" of 10K elements:

INSERT INTO hier
WITH RECURSIVE T AS (
  SELECT
    1::integer id
  , '{1}'::integer[] pids
UNION ALL
  SELECT
    id + 1
  , pids[1:(random() * array_length(pids, 1))::integer] || (id + 1)
  FROM
    T
  WHERE
    id < 10000
)
SELECT
  pids[array_length(pids, 1)] id
, pids[array_length(pids, 1) - 1] pid
FROM
  T;

Let's start with the simplest task: finding all employees who work within a specific sector or, in terms of the hierarchy, finding all descendants of a node. It would also be good to get the "depth" of each descendant... All this may be needed, for example, to build some kind of complex selection based on the list of IDs of these employees.
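As a minimal sketch of what "descendants with their depth" could look like on the hier table defined above, the query below carries a level counter through a plain recursive join. The starting id 1 and the column name lvl are assumptions made only for illustration.

```sql
WITH RECURSIVE T AS (
  SELECT
    id
  , 0 AS lvl -- the starting node is level 0 (illustrative column name)
  FROM
    hier
  WHERE
    id = 1 -- hypothetical starting node
UNION ALL
  SELECT
    hier.id
  , T.lvl + 1 -- each step down adds one level
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
SELECT
  id
, lvl
FROM
  T
WHERE
  lvl > 0; -- skip the starting node itself
```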

Everything would be fine if there were only a couple of levels of such descendants and their number stayed within a dozen, but with more than 5 levels and dozens of descendants problems may arise. Let's look at how the traditional down-the-tree search options are written (and how they work). But first, let's determine which nodes are the most interesting for our investigation.

The "deepest" subtrees:

WITH RECURSIVE T AS (
  SELECT
    id
  , pid
  , ARRAY[id] path
  FROM
    hier
  WHERE
    pid IS NULL
UNION ALL
  SELECT
    hier.id
  , hier.pid
  , T.path || hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T ORDER BY array_length(path, 1) DESC;

  id  | pid  |              path
------+------+---------------------------------
 7624 | 7623 | {7615,7620,7621,7622,7623,7624}
 4995 | 4994 | {4983,4985,4988,4993,4994,4995}
 4991 | 4990 | {4983,4985,4988,4989,4990,4991}
...

The "widest" subtrees:

...
SELECT
  path[1] id
, count(*)
FROM
  T
GROUP BY
  1
ORDER BY
  2 DESC;

  id  | count
------+-------
 5300 |    30
  450 |    28
 1239 |    27
 1573 |    25

For these queries we used the typical recursive JOIN.

Obviously, with this query model the number of iterations equals the total number of descendants (and there are several dozen of them), and this can take quite significant resources and, as a result, time.

Let's check this on the "widest" subtree:

WITH RECURSIVE T AS (
  SELECT
    id
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T;

[view on explain.tensor.ru]

As expected, we found all 30 records. But 60% of the total time was spent on this, because the query also made 30 index lookups. Can we get by with fewer?
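To check figures like these on your own data, one option is to run the same query under EXPLAIN with the ANALYZE and BUFFERS options, which reports actual timings, loop counts and buffer usage per plan node. This is only a sketch of how such a measurement might be taken; the numbers will differ from run to run because the tree is generated randomly.

```sql
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE T AS (
  SELECT
    id
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id -- one index lookup per row coming out of T
)
TABLE T;
```

In the resulting plan, the loops value on the index scan over hier is the number of separate index lookups the query performed.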

Bulk reads by index

Do we need to make a separate index query for each node? It turns out we do not: we can read from the index using several keys at once in a single call with the help of = ANY(array).

And for each such group of identifiers we can take all the IDs found at the previous step as the "nodes". That is, at each next step we look for all the descendants of a given level at once.
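A small sketch of the difference for a single level, using two of the root ids from the "widest" list above as sample keys:

```sql
-- one query per parent: N separate index lookups for N parents
SELECT id FROM hier WHERE pid = 5300;
SELECT id FROM hier WHERE pid = 450;

-- the children of both parents fetched in one call with several keys
SELECT
  id
, pid
FROM
  hier
WHERE
  pid = ANY('{5300,450}'::integer[]);
```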

The only trouble is that in a recursive query you cannot refer to the recursive table itself from a nested query, yet we somehow need to select only what was found at the previous level... It turns out that a nested query is impossible over the whole selection, but possible over a specific field of it. And that field can be an array, which is exactly what we need for ANY.

It sounds a bit crazy, but the query itself is simple:

WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] id$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    ARRAY(
      SELECT
        id
      FROM
        hier
      WHERE
        pid = ANY(T.id$)
    ) id$
  FROM
    T
  WHERE
    coalesce(id$, '{}') <> '{}' -- loop exit condition: an empty array
)
SELECT
  unnest(id$) id
FROM
  T;

[view on explain.tensor.ru]

And the most important thing here is not even the 1.5x win in time, but that we read fewer buffers, since we make only 5 index calls instead of 30!

An additional bonus is that after the final unnest the identifiers remain ordered by "levels".
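If the level itself is needed rather than just the implied order, one possible variant (a sketch, with lvl as an illustrative column name) carries a counter next to the array and sorts on it explicitly:

```sql
WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] id$
  , 0 AS lvl -- level of the starting node
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    ARRAY(
      SELECT
        id
      FROM
        hier
      WHERE
        pid = ANY(T.id$)
    ) id$
  , T.lvl + 1 -- one recursion step is one level down
  FROM
    T
  WHERE
    coalesce(id$, '{}') <> '{}'
)
SELECT
  unnest(id$) id
, lvl
FROM
  T
ORDER BY
  lvl;
```

The explicit ORDER BY avoids relying on the order in which the recursion happens to emit its rows.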

Node flag

The next consideration that helps improve performance: "leaves" cannot have children, so there is no need to look "down" from them at all. In terms of our task this means that if we followed the chain of departments and reached an employee, there is no need to look any further along that branch.

Let's add to our table an additional boolean field that immediately tells us whether a particular entry in our tree is a "node", that is, whether it can have descendants at all.

ALTER TABLE hier
  ADD COLUMN branch boolean;

UPDATE
  hier T
SET
  branch = TRUE
WHERE
  EXISTS(
    SELECT
      NULL
    FROM
      hier
    WHERE
      pid = T.id
    LIMIT 1
);
-- Query returned successfully: 3033 rows affected in 42 ms.

Great! It turns out that only slightly more than 30% of all tree elements have descendants.

Now let's use slightly different mechanics: we join to the recursive part through LATERAL, which lets us access the fields of the recursive "table" directly, and use an aggregate function with a filter condition based on the node flag to reduce the set of keys:


WITH RECURSIVE T AS (
  SELECT
    array_agg(id) id$
  , array_agg(id) FILTER(WHERE branch) ns$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    X.*
  FROM
    T
  JOIN LATERAL (
    SELECT
      array_agg(id) id$
    , array_agg(id) FILTER(WHERE branch) ns$
    FROM
      hier
    WHERE
      pid = ANY(T.ns$)
  ) X
    ON coalesce(T.ns$, '{}') <> '{}'
)
SELECT
  unnest(id$) id
FROM
  T;

[view on explain.tensor.ru]

We managed to cut one more index call and won more than 2x in the volume of data read.

#2. Back to the roots

This algorithm is useful if you need to collect records for all elements "up the tree" while keeping track of which source leaf (and with which attributes) caused each of them to be included in the selection, for example, to generate a summary report with aggregation into nodes.

What follows should be taken solely as a proof of concept, since the query turns out to be quite cumbersome. But if it dominates the load on your database, you should think about using similar techniques.

Let's start with a couple of simple statements:

  • The same record is best read from the database only once.
  • Records are read from the database more efficiently in batches than one by one.

Now let's try to build the query we need.

Step 1

Obviously, when initializing the recursion (where would we be without it!) we have to fetch the records of the leaves themselves based on the set of initial identifiers:

WITH RECURSIVE tree AS (
  SELECT
    rec -- the whole table row as a single record
  , id::text chld -- the "set" of source leaves that led here
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  ...

If it seems strange to anyone that the "set" is stored as a string and not as an array, there is a simple explanation. There is a built-in aggregate function that "glues" strings together, string_agg, but none that concatenates arrays (array_agg over arrays builds a multidimensional array rather than a flat one). It is easy to implement one yourself, though.
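As a sketch of such a do-it-yourself aggregate, the built-in array_cat function can serve as the state transition function. The name array_cat_agg is made up for this example, and the declaration below assumes PostgreSQL 14 or later, where array_cat is declared over anycompatiblearray.

```sql
-- hypothetical aggregate name; PostgreSQL 14+ signature
-- (on older versions use anyarray in place of anycompatiblearray)
CREATE AGGREGATE array_cat_agg(anycompatiblearray) (
  SFUNC = array_cat -- appends each input array to the accumulated state
, STYPE = anycompatiblearray
);

-- usage: flatten several integer arrays into one
SELECT
  array_cat_agg(a)
FROM
  (VALUES (ARRAY[1, 2]), (ARRAY[3])) v(a);
```

The article keeps the string-based "set" regardless, so this aggregate is only an alternative, not something the queries below depend on.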

Step 2

Next we would get a set of section IDs that need to be read further. Almost always they are duplicated across different records of the original set, so we would group them while preserving the information about the source leaves.

But here three troubles await us:

  1. The "recursive" part of the query cannot contain aggregate functions with GROUP BY.
  2. A reference to the recursive "table" cannot appear inside a nested subquery.
  3. The query in the recursive part cannot contain a CTE.

Fortunately, all these problems are quite easy to work around. Let's start from the end.

CTE in the recursive part

This does not work:

WITH RECURSIVE tree AS (
  ...
UNION ALL
  WITH T (...)
  SELECT ...
)

But this does work; the parentheses make the difference!

WITH RECURSIVE tree AS (
  ...
UNION ALL
  (
    WITH T (...)
    SELECT ...
  )
)

Nested query against the recursive "table"

Hmm... A recursive CTE cannot be accessed from a subquery. But it can be accessed from inside a CTE! And a nested query can then access that CTE!
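A stripped-down sketch of this trick on its own, climbing from one leaf towards the root. The names prev and h and the starting id 8192 are chosen only for illustration; the recursive reference sits inside an inner CTE, and the nested subquery reads that CTE instead of tree.

```sql
WITH RECURSIVE tree AS (
  SELECT
    id
  , pid
  FROM
    hier
  WHERE
    id = 8192 -- illustrative starting leaf
UNION ALL
  (
    WITH prev AS ( -- the only place that references the recursive "table"
      SELECT
        pid
      FROM
        tree
      WHERE
        pid IS NOT NULL
    )
    SELECT
      h.id
    , h.pid
    FROM
      hier h
    WHERE
      h.id = ANY(ARRAY(
        SELECT
          pid
        FROM
          prev -- the subquery reads the inner CTE, not tree itself
      ))
  )
)
TABLE tree;
```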

GROUP BY inside recursion

It is unpleasant, but... we have a simple way to emulate GROUP BY using DISTINCT ON and window functions!

SELECT
  (rec).pid id
, string_agg(chld::text, ',') chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL
GROUP BY 1 -- does not work!

And this is how it works!

SELECT DISTINCT ON((rec).pid)
  (rec).pid id
, string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL

Now we see why the numeric ID was turned into text: so that the IDs could be joined with commas!

Step 3

For the finale only a little is left:

  • we read the "section" records based on the set of grouped IDs
  • we match the fetched sections with the "sets" of the original leaves
  • we "expand" the set string using unnest(string_to_array(chld, ',')::integer[])

WITH RECURSIVE tree AS (
  SELECT
    rec
  , id::text chld
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  (
    WITH prnt AS (
      SELECT DISTINCT ON((rec).pid)
        (rec).pid id
      , string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
      FROM
        tree
      WHERE
        (rec).pid IS NOT NULL
    )
    , nodes AS (
      SELECT
        rec
      FROM
        hier rec
      WHERE
        id = ANY(ARRAY(
          SELECT
            id
          FROM
            prnt
        ))
    )
    SELECT
      nodes.rec
    , prnt.chld
    FROM
      prnt
    JOIN
      nodes
        ON (nodes.rec).id = prnt.id
  )
)
SELECT
  unnest(string_to_array(chld, ',')::integer[]) leaf
, (rec).*
FROM
  tree;

[view on explain.tensor.ru]

source: www.habr.com
