In complex ERP systems, many entities are hierarchical by nature: homogeneous objects line up in a tree of ancestor-descendant relationships. This covers the organizational structure of an enterprise (all those branches, departments and work groups), the product catalog, areas of work, sales geography, ...
In fact, there is no …
There are many ways to store such a tree in a DBMS, but today we will focus on just one option:
CREATE TABLE hier(
  id   integer PRIMARY KEY
, pid  integer REFERENCES hier
, data json
);
CREATE INDEX ON hier(pid); -- remember: unlike a PK, an FK does not create an index automatically
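The sketch below lets you try the same adjacency-list layout without a PostgreSQL server. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption of this example and not part of the original setup. Only the plain recursive JOIN carries over, because SQLite has no arrays, = ANY, LATERAL or DISTINCT ON. The index name hier_pid and the helper descendants() are hypothetical.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL table above (assumption: used only to try
# the adjacency-list model locally).
SCHEMA = """
CREATE TABLE hier(
  id   integer PRIMARY KEY,
  pid  integer REFERENCES hier,
  data json
);
CREATE INDEX hier_pid ON hier(pid);  -- SQLite, unlike PostgreSQL, needs an index name
"""

def descendants(conn: sqlite3.Connection, root_id: int) -> list:
    """All ids in the subtree of root_id (root included), via a recursive JOIN."""
    rows = conn.execute(
        """
        WITH RECURSIVE T(id) AS (
          SELECT id FROM hier WHERE id = ?
          UNION ALL
          SELECT hier.id FROM T JOIN hier ON hier.pid = T.id
        )
        SELECT id FROM T
        """,
        (root_id,),
    ).fetchall()
    return sorted(row[0] for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO hier(id, pid) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 6), (8, None)],
)
print(descendants(conn, 3))  # [3, 6, 7]
```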
And while you are peering into the depths of the hierarchy, it is patiently waiting to see how [in]efficient your "naive" ways of working with such a structure will turn out to be.
Let's look at the typical problems that arise and their implementation in SQL, and try to improve their performance.
#1. How deep is the rabbit hole?
To be specific, let's assume that this structure reflects how departments are subordinated in an organization: divisions, units, sectors, branches, work groups... whatever you call them.
First, let's generate our "tree" of 10K elements:
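INSERT INTO hier
WITH RECURSIVE T AS (
  SELECT
    1::integer id
  , '{1}'::integer[] pids
UNION ALL
  SELECT
    id + 1
    -- keep a random-length prefix of the previous path and append the new id
  , pids[1:(random() * array_length(pids, 1))::integer] || (id + 1)
  FROM
    T
  WHERE
    id < 10000
)
SELECT
  pids[array_length(pids, 1)] id      -- last element of the path
, pids[array_length(pids, 1) - 1] pid -- its parent (NULL for a one-element path)
FROM
  T;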
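The generator's rule is easier to see in a small Python model. Each new id keeps a random-length prefix of the previous node's path. It then hangs under the last element of that prefix, or becomes a new root when the prefix is empty. This is a sketch, not the article's code: generate_tree is a hypothetical name, and round() only approximates the ::integer cast.

```python
import random
from typing import Dict, Optional

def generate_tree(n: int = 10_000, seed: int = 0) -> Dict[int, Optional[int]]:
    """Python model of the SQL generator; returns {id: pid}, None meaning a root."""
    rng = random.Random(seed)
    path = [1]                                  # plays the role of the pids array
    tree: Dict[int, Optional[int]] = {1: None}
    for new_id in range(2, n + 1):
        # random-length prefix of the previous path; round() approximates ::integer
        k = round(rng.random() * len(path))
        path = path[:k] + [new_id]
        # parent = element before the last one, or NULL if the prefix was empty
        tree[new_id] = path[-2] if len(path) > 1 else None
    return tree

tree = generate_tree()
roots = sum(1 for pid in tree.values() if pid is None)
print(len(tree), roots)
```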
Let's start with the simplest task: find all employees who work within a specific sector or, in hierarchy terms, find all descendants of a node. It would also be nice to get the "depth" of each descendant... All of this may be needed, for example, to build some kind of selection based on these employees' IDs.
Everything is fine while there are only a couple of levels of descendants and a dozen or so of them. Once there are more than 5 levels and dozens of descendants, problems may appear. Let's see how the traditional top-down search options are written (and how they perform). But first, let's find out which nodes will be the most interesting for our experiments.
The "deepest" subtrees:
WITH RECURSIVE T AS (
  SELECT
    id
  , pid
  , ARRAY[id] path
  FROM
    hier
  WHERE
    pid IS NULL
UNION ALL
  SELECT
    hier.id
  , hier.pid
  , T.path || hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T ORDER BY array_length(path, 1) DESC;
  id  | pid  |              path
------+------+---------------------------------
 7624 | 7623 | {7615,7620,7621,7622,7623,7624}
 4995 | 4994 | {4983,4985,4988,4993,4994,4995}
 4991 | 4990 | {4983,4985,4988,4989,4990,4991}
...
The "widest" subtrees:
...
SELECT
  path[1] id
, count(*)
FROM
  T
GROUP BY
  1
ORDER BY
  2 DESC;
  id  | count
------+-------
 5300 |    30
  450 |    28
 1239 |    27
 1573 |    25
For these queries we used a typical recursive JOIN.
Obviously, with this query model the number of index lookups equals the total number of descendants (and there are several dozen of them). That can consume quite a lot of resources and, as a result, time.
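The cost of the recursive JOIN can be modelled in Python. In this model, a children dict (pid to list of child ids) stands in for the index on hier(pid). Every dictionary lookup counts as one index probe, as in a nested loop over the working table. Both choices are assumptions of this sketch, and the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

def subtree_per_node(children: Dict[int, List[int]], root: int) -> Tuple[List[int], int]:
    """Recursive JOIN model: returns (subtree ids, number of index probes)."""
    result, working, probes = [root], [root], 0
    while working:                      # one iteration per tree level
        next_level: List[int] = []
        for node_id in working:         # nested loop: one probe per working row
            probes += 1
            next_level.extend(children.get(node_id, []))
        result.extend(next_level)
        working = next_level
    return result, probes

# children plays the role of the index on hier(pid): pid -> child ids
children = {1: [2, 3], 2: [4, 5], 3: [6], 6: [7]}
ids, probes = subtree_per_node(children, 1)
print(ids, probes)  # [1, 2, 3, 4, 5, 6, 7] 7
```

Every node in the subtree, leaves included, costs one probe. So the count grows with the number of descendants, not with the depth.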
Let's check the "widest" subtree:
WITH RECURSIVE T AS (
  SELECT
    id
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    hier.id
  FROM
    T
  JOIN
    hier
      ON hier.pid = T.id
)
TABLE T;
As expected, we found all 30 records. But this took 60% of the total time, because it also meant 30 lookups in the index. Can we do with fewer?
Bulk reading via the index
Do we really need a separate index query for every node? It turns out we don't: with = ANY(array) we can read from the index by several keys at once, in a single call.
As that group of keys we can take all the IDs found at the previous step, that is, the "nodes". Then at each next step we look up all descendants of an entire level at once.
The only problem is that a recursive query cannot refer to itself in a nested subquery, yet we somehow need to select only what was found at the previous level... It turns out that a nested query cannot refer to the whole recursive result set, but it can refer to a specific field of the current row. And that field can be an array, which is exactly what we need for ANY.
It sounds a bit crazy, but on a diagram it is all simple.
WITH RECURSIVE T AS (
  SELECT
    ARRAY[id] id$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    ARRAY(
      SELECT
        id
      FROM
        hier
      WHERE
        pid = ANY(T.id$)
    ) id$
  FROM
    T
  WHERE
    coalesce(id$, '{}') <> '{}' -- loop exit condition: an empty array
)
SELECT
  unnest(id$) id
FROM
  T;
The most important thing here is not even the 1.5x gain in time. It is that we read fewer buffers, since there are only 5 index calls instead of 30!
An extra bonus: after the final unnest, the identifiers stay ordered by "levels".
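The level-by-level = ANY(array) approach can be modelled the same way. A children dict stands in for the index on pid, and one dictionary pass per level counts as one index call. This is a hypothetical sketch rather than the article's code:

```python
from typing import Dict, List, Tuple

def subtree_per_level(children: Dict[int, List[int]], root: int) -> Tuple[List[int], int]:
    """= ANY(array) model: one batched probe per level instead of one per node."""
    result, level, probes = [root], [root], 0
    while level:                        # coalesce(id$, '{}') <> '{}'
        probes += 1                     # pid = ANY(level): a single index call
        level = [c for pid in level for c in children.get(pid, [])]
        result.extend(level)            # ids stay grouped by level
    return result, probes

children = {1: [2, 3], 2: [4, 5], 3: [6], 6: [7]}
print(subtree_per_level(children, 1))  # ([1, 2, 3, 4, 5, 6, 7], 4)
```

The probe count now equals the number of levels, and the result comes out grouped by level. This matches the ordering bonus described above.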
Node flag
The next consideration that will help improve performance: "leaves" cannot have children, so there is no need to look "down" from them at all. In terms of our task, this means that once we follow the chain of departments and reach an employee, there is no need to look any further down that branch.
Let's add an extra boolean field to our table. It tells us immediately whether this particular entry in our tree is a "node", that is, whether it can have descendants at all.
ALTER TABLE hier
  ADD COLUMN branch boolean;
UPDATE
  hier T
SET
  branch = TRUE
WHERE
  EXISTS(
    SELECT
      NULL
    FROM
      hier
    WHERE
      pid = T.id
    LIMIT 1
  );
-- Query completed successfully: 3033 rows affected in 42 ms.
Great! It turns out that only a little over 30% of all tree elements have descendants.
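The UPDATE boils down to a set computation: a row is a node exactly when some other row points at it via pid. Rows without children keep NULL in branch rather than FALSE, and a FILTER(WHERE branch) clause treats NULL as false. Below is a minimal Python sketch on a toy tree, not the article's 10K-row data; branch_ids is a hypothetical name.

```python
from typing import Dict, Optional, Set

def branch_ids(tree: Dict[int, Optional[int]]) -> Set[int]:
    """ids that would get branch = TRUE: some row has pid = id (the EXISTS check)."""
    return {pid for pid in tree.values() if pid is not None}

tree = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 6}
branches = branch_ids(tree)
print(sorted(branches), f"{len(branches) / len(tree):.0%}")  # [1, 2, 3, 6] 57%
```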
Now let's use a slightly different mechanism: join the recursive part via LATERAL. This gives direct access to the fields of the recursive "table". It also lets us use an aggregate function with a filter condition on the node flag to shrink the set of keys:
WITH RECURSIVE T AS (
  SELECT
    array_agg(id) id$
  , array_agg(id) FILTER(WHERE branch) ns$
  FROM
    hier
  WHERE
    id = 5300
UNION ALL
  SELECT
    X.*
  FROM
    T
  JOIN LATERAL (
    SELECT
      array_agg(id) id$
    , array_agg(id) FILTER(WHERE branch) ns$
    FROM
      hier
    WHERE
      pid = ANY(T.ns$)
  ) X
    ON coalesce(T.ns$, '{}') <> '{}'
)
SELECT
  unnest(id$) id
FROM
  T;
We managed to cut one more index call and reduce the volume of data read by more than half.
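Here is a toy model of the traversal with the node flag. Only ids marked as branches go into the next lookup, which corresponds to ns$ and FILTER(WHERE branch). The dict-based index, the probe counting and the function name are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

def subtree_pruned(children: Dict[int, List[int]], branches: Set[int],
                   root: int) -> Tuple[List[int], int]:
    """LATERAL + FILTER(WHERE branch) model: only branch nodes are looked up."""
    result = [root]                                   # id$
    to_expand = [root] if root in branches else []    # ns$
    probes = 0
    while to_expand:                                  # coalesce(T.ns$, '{}') <> '{}'
        probes += 1                                   # pid = ANY(T.ns$)
        level = [c for pid in to_expand for c in children.get(pid, [])]
        result.extend(level)
        to_expand = [c for c in level if c in branches]   # FILTER(WHERE branch)
    return result, probes

children = {1: [2, 3], 2: [4, 5], 3: [6], 6: [7]}
branches = {1, 2, 3, 6}
print(subtree_pruned(children, branches, 1))  # ([1, 2, 3, 4, 5, 6, 7], 3)
```

On this tree the leaves (4, 5, 7) are returned without ever being looked up themselves. A plain level-by-level walk would need one probe more.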
#2. Back to the roots
This algorithm is useful when you need to collect data for all elements "up the tree" while keeping track of which source leaf (and with which metrics) caused each element to be included in the selection. A typical example is a summary report with aggregation at the nodes.
What follows should be taken purely as a proof of concept, since the query turns out rather cumbersome. But if this kind of query dominates your database, it is worth considering similar techniques.
Let's start with a couple of simple statements:
- It is better to read the same record from the database only once.
- It is more efficient to read records from the database in batches than one by one.
Now let's try to build the query we need.
Step 1
Obviously, when initializing the recursion (where would we be without it!) we have to read the records of the leaves themselves, based on the set of initial identifiers:
WITH RECURSIVE tree AS (
  SELECT
    rec            -- the whole table row
  , id::text chld  -- the "set" of source leaves that led here
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  ...
If it seems strange that the "set" is stored as a string rather than an array, there is a simple explanation. There is a built-in aggregating "gluing" function for strings, string_agg, but none for arrays. Although it …
Step 2
Now we get the set of section IDs that need to be read next. They will almost always be duplicated across different source records, so we group them while keeping the information about the source leaves.
But three troubles await us here:
- The "subrecursive" part of the query cannot contain aggregate functions with GROUP BY.
- A reference to the recursive "table" cannot appear in a nested subquery.
- A query in the recursive part cannot contain a CTE.
Fortunately, all of these problems are fairly easy to work around. Let's start from the end.
CTE in the recursive part
This does not work:
WITH RECURSIVE tree AS (
  ...
UNION ALL
  WITH T (...)
  SELECT ...
)
But this works: the parentheses make all the difference!
WITH RECURSIVE tree AS (
  ...
UNION ALL
  (
    WITH T (...)
    SELECT ...
  )
)
Nested query against the recursive "table"
Hmm... A recursive CTE cannot be referenced in a subquery. But it can be referenced inside a CTE! And a nested query can then refer to that CTE!
GROUP BY inside the recursion
It is unpleasant, but... We have a simple way to emulate GROUP BY using DISTINCT ON and window functions!
SELECT
  (rec).pid id
, string_agg(chld::text, ',') chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL
GROUP BY 1 -- does not work!
And this is how it does work!
SELECT DISTINCT ON((rec).pid)
  (rec).pid id
, string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
FROM
  tree
WHERE
  (rec).pid IS NOT NULL
Now we can see why the numeric ID was turned into text: so that the IDs can be glued together with commas!
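Here is a small Python model showing that DISTINCT ON over a window aggregate yields the same groups as GROUP BY. It is a sketch with hypothetical helper names. Without an ORDER BY, PostgreSQL does not guarantee the order of values inside string_agg, while the model simply keeps input order.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # ((rec).pid, chld)

def group_by_agg(rows: List[Row]) -> Dict[int, str]:
    """What GROUP BY pid + string_agg(chld, ',') would return."""
    groups: Dict[int, List[str]] = {}
    for pid, chld in rows:
        groups.setdefault(pid, []).append(chld)
    return {pid: ",".join(parts) for pid, parts in groups.items()}

def distinct_on_window(rows: List[Row]) -> Dict[int, str]:
    """string_agg(...) OVER (PARTITION BY pid), then DISTINCT ON (pid)."""
    windowed: List[Row] = []
    for pid, part in groupby(sorted(rows, key=lambda r: r[0]), key=lambda r: r[0]):
        members = list(part)
        agg = ",".join(chld for _, chld in members)   # no ORDER BY: frame = whole partition
        windowed.extend((pid, agg) for _ in members)  # a window keeps every input row
    result: Dict[int, str] = {}
    for pid, agg in windowed:                         # DISTINCT ON keeps one row per pid
        result.setdefault(pid, agg)
    return result

rows = [(2, "4"), (2, "5"), (3, "6"), (1, "2")]
print(distinct_on_window(rows) == group_by_agg(rows))  # True
```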
Step 3
Very little is left for the finale:
- read the "section" records by the set of grouped IDs
- match the read sections with the "sets" of the original leaves
- "expand" the set-string using unnest(string_to_array(chld, ',')::integer[])
WITH RECURSIVE tree AS (
  SELECT
    rec
  , id::text chld
  FROM
    hier rec
  WHERE
    id = ANY('{1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192}'::integer[])
UNION ALL
  (
    WITH prnt AS (
      SELECT DISTINCT ON((rec).pid)
        (rec).pid id
      , string_agg(chld::text, ',') OVER(PARTITION BY (rec).pid) chld
      FROM
        tree
      WHERE
        (rec).pid IS NOT NULL
    )
    , nodes AS (
      SELECT
        rec
      FROM
        hier rec
      WHERE
        id = ANY(ARRAY(
          SELECT
            id
          FROM
            prnt
        ))
    )
    SELECT
      nodes.rec
    , prnt.chld
    FROM
      prnt
    JOIN
      nodes
        ON (nodes.rec).id = prnt.id
  )
)
SELECT
  unnest(string_to_array(chld, ',')::integer[]) leaf
, (rec).*
FROM
  tree;
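The three steps can be modelled in Python to see what the final query returns: one (leaf, row) pair per source leaf and per row on its path to the root. This sketch makes three assumptions. The tree is an in-memory {id: pid} dict, sets stand in for the comma-joined chld strings, and each batch of parent rows counts as one read. roll_up is a hypothetical name.

```python
from typing import Dict, List, Optional, Set, Tuple

def roll_up(tree: Dict[int, Optional[int]],
            leaves: List[int]) -> Tuple[Set[Tuple[int, int]], int]:
    """Returns ({(source_leaf, row_id)}, batched_reads) for the bottom-up walk."""
    # Step 1: read the starting rows in one batch; each carries its own id as chld
    level: Dict[int, Set[int]] = {i: {i} for i in leaves if i in tree}
    reads = 1
    pairs = {(leaf, node_id) for node_id, srcs in level.items() for leaf in srcs}
    while True:
        # Step 2: "GROUP BY pid" over the previous iteration only, merging leaf sets
        parents: Dict[int, Set[int]] = {}
        for node_id, srcs in level.items():
            pid = tree[node_id]
            if pid is not None:
                parents.setdefault(pid, set()).update(srcs)
        if not parents:
            return pairs, reads
        # Step 3: one batched read of all parent rows (id = ANY(...)), then "unnest"
        reads += 1
        pairs.update((leaf, pid) for pid, srcs in parents.items() for leaf in srcs)
        level = parents

tree = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 6}
pairs, reads = roll_up(tree, [4, 5, 7])
print(sorted(row for leaf, row in pairs if leaf == 7), reads)  # [1, 3, 6, 7] 4
```

The grouping only sees the rows of one iteration, so an ancestor reached at different depths can still be read more than once. Here row 1 is fetched in the third read for leaves 4 and 5, and again in the fourth read for leaf 7.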
source: www.habr.com