Amma sau da yawa ba tare da shi ba, buƙatar ta zama mafi inganci fiye da ita. Don haka a yau za mu gwada kawar da m albarkatun JOIN - amfani da ƙamus.
Farawa tare da PostgreSQL 12, wasu daga cikin yanayin da aka kwatanta a ƙasa na iya sake haifar da ɗan bambanci daban-daban saboda Tsohuwar rashin kayan aiki CTE. Ana iya mayar da wannan halin ta hanyar tantance maɓalli MATERIALIZED.
Yawancin “gaskiya” a cikin ƙayyadaddun ƙamus
Bari mu ɗauki aikin aikace-aikace na gaske - muna buƙatar nuna lissafi saƙonni masu shigowa ko ayyuka masu aiki tare da masu aikawa:
25.01 | Иванов И.И. | Подготовить описание нового алгоритма.
22.01 | Иванов И.И. | Написать статью на Хабр: жизнь без JOIN.
20.01 | Петров П.П. | Помочь оптимизировать запрос.
18.01 | Иванов И.И. | Написать статью на Хабр: JOIN с учетом распределения данных.
16.01 | Петров П.П. | Помочь оптимизировать запрос.
A cikin duniyar m, ya kamata a rarraba mawallafin ɗawainiya a tsakanin duk ma'aikatan ƙungiyarmu, amma a zahiri ayyuka suna zuwa, a matsayin mai mulkin, daga ƙayyadaddun adadin mutane - "daga gudanarwa" har zuwa matsayi ko "daga 'yan kwangila" daga sassan makwabta (masana, masu zane-zane, tallace-tallace, ...).
Bari mu yarda cewa a cikin ƙungiyarmu na mutane 1000, marubuta 20 ne kawai (yawanci ma ƙasa da ƙasa) sun tsara ayyuka ga kowane takamaiman mai yin aiki da Mu yi amfani da ilimin wannan batudon hanzarta tambayar "gargajiya".
janareta na rubutu
-- сотрудники
CREATE TABLE person AS
SELECT
id
, repeat(chr(ascii('a') + (id % 26)), (id % 32) + 1) "name"
, '2000-01-01'::date - (random() * 1e4)::integer birth_date
FROM
generate_series(1, 1000) id;
ALTER TABLE person ADD PRIMARY KEY(id);
-- задачи с указанным распределением
CREATE TABLE task AS
WITH aid AS (
SELECT
id
, array_agg((random() * 999)::integer + 1) aids
FROM
generate_series(1, 1000) id
, generate_series(1, 20)
GROUP BY
1
)
SELECT
*
FROM
(
SELECT
id
, '2020-01-01'::date - (random() * 1e3)::integer task_date
, (random() * 999)::integer + 1 owner_id
FROM
generate_series(1, 100000) id
) T
, LATERAL(
SELECT
aids[(random() * (array_length(aids, 1) - 1))::integer + 1] author_id
FROM
aid
WHERE
id = T.owner_id
LIMIT 1
) a;
ALTER TABLE task ADD PRIMARY KEY(id);
CREATE INDEX ON task(owner_id, task_date);
CREATE INDEX ON task(author_id);
Bari mu nuna ayyuka 100 na ƙarshe don takamaiman mai zartarwa:
SELECT
task.*
, person.name
FROM
task
LEFT JOIN
person
ON person.id = task.author_id
WHERE
owner_id = 777
ORDER BY
task_date DESC
LIMIT 100;
Sai dai itace cewa 1/3 jimlar lokaci da 3/4 karatu An yi shafukan bayanai ne kawai don neman marubucin sau 100 - don kowane aikin fitarwa. Amma mun san cewa a cikin waɗannan ɗaruruwan kawai 20 daban-daban - Shin yana yiwuwa a yi amfani da wannan ilimin?
hstore-kamus
Mu yi amfani hstore irin don samar da maɓalli mai mahimmanci "kamus":
CREATE EXTENSION hstore
Muna buƙatar kawai sanya ID na marubucin da sunansa a cikin ƙamus domin mu ciro ta amfani da wannan maɓallin:
-- формируем целевую выборку
WITH T AS (
SELECT
*
FROM
task
WHERE
owner_id = 777
ORDER BY
task_date DESC
LIMIT 100
)
-- формируем словарь для уникальных значений
, dict AS (
SELECT
hstore( -- hstore(keys::text[], values::text[])
array_agg(id)::text[]
, array_agg(name)::text[]
)
FROM
person
WHERE
id = ANY(ARRAY(
SELECT DISTINCT
author_id
FROM
T
))
)
-- получаем связанные значения словаря
SELECT
*
, (TABLE dict) -> author_id::text -- hstore -> key
FROM
T;
An kashe don samun bayanai game da mutane Sau 2 ƙasa da lokaci kuma sau 7 ƙasa karanta bayanai! Baya ga “kalmomi”, abin da kuma ya taimaka mana wajen cimma waɗannan sakamakon shi ne babban rikodin dawo da daga tebur a cikin wucewa ɗaya ta amfani da = ANY(ARRAY(...)).
Shigar da tebur: Serialization da Deserialization
Amma idan muna buƙatar ajiye ba kawai filin rubutu ɗaya ba, amma gaba ɗaya shigarwa a cikin ƙamus? A wannan yanayin, ikon PostgreSQL zai taimake mu bi shigarwar tebur azaman ƙima ɗaya:
...
, dict AS (
SELECT
hstore(
array_agg(id)::text[]
, array_agg(p)::text[] -- магия #1
)
FROM
person p
WHERE
...
)
SELECT
*
, (((TABLE dict) -> author_id::text)::person).* -- магия #2
FROM
T;
Bari mu ga abin da ke faruwa a nan:
Mun dauka p a matsayin laƙabi ga cikakken mutum tebur shigarwa kuma ya tattara jerin su.
wannan An sake yin rikodi da yawa zuwa tsararrun zaren rubutu (mutum[]:: rubutu[]) don sanya shi a cikin ƙamus na hstore a matsayin tsararrun ƙima.
Lokacin da muka karɓi rikodin mai alaƙa, mu cire daga ƙamus ta maɓalli a matsayin igiyar rubutu.
Muna buƙatar rubutu juya zuwa darajar nau'in tebur mutum (ga kowane tebur nau'in suna iri ɗaya ana ƙirƙira ta atomatik).
"Expand" da aka buga rikodin cikin ginshiƙai ta amfani da (...).*.
json kamus
Amma irin wannan dabara kamar yadda muka yi amfani da shi a sama ba zai yi aiki ba idan babu nau'in tebur mai dacewa don yin "simintin gyaran kafa". Daidai wannan yanayin zai tashi, kuma idan muka yi ƙoƙarin amfani da shi Layin CTE, ba tebur na "ainihin"..
...
, p AS ( -- это уже CTE
SELECT
*
FROM
person
WHERE
...
)
, dict AS (
SELECT
json_object( -- теперь это уже json
array_agg(id)::text[]
, array_agg(row_to_json(p))::text[] -- и внутри json для каждой строки
)
FROM
p
)
SELECT
*
FROM
T
, LATERAL(
SELECT
*
FROM
json_to_record(
((TABLE dict) ->> author_id::text)::json -- извлекли из словаря как json
) AS j(name text, birth_date date) -- заполнили нужную нам структуру
) j;
Ya kamata a lura cewa lokacin da aka kwatanta tsarin da aka yi niyya, ba za mu iya lissafa duk filayen tushen kirtani ba, amma kawai waɗanda muke buƙata. Idan muna da tebur "na ƙasa", to yana da kyau a yi amfani da aikin json_populate_record.
Har yanzu muna samun damar ƙamus sau ɗaya, amma json-[de] farashin serialization suna da yawa sosai, don haka, yana da kyau a yi amfani da wannan hanya kawai a wasu lokuta lokacin da "masu gaskiya" CTE Scan ya nuna kansa mafi muni.
Gwajin aikin
Don haka, mun sami hanyoyi guda biyu don tsara bayanai a cikin ƙamus - hstore/json_object. Bugu da kari, tsarin maɓalli da ƙimar su kuma za a iya samar da su ta hanyoyi biyu, tare da juyawa na ciki ko na waje zuwa rubutu: array_agg(i:: rubutu) / array_agg(i): rubutu[].
Bari mu bincika tasirin nau'ikan serialization daban-daban ta amfani da misali na roba zalla - jera lambobin maɓallai daban-daban:
WITH dict AS (
SELECT
hstore(
array_agg(i::text)
, array_agg(i::text)
)
FROM
generate_series(1, ...) i
)
TABLE dict;
Rubutun kimantawa: serialization
WITH T AS (
SELECT
*
, (
SELECT
regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (d+.d+) ms$', '1')::real et
FROM
(
SELECT
array_agg(el) ea
FROM
dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
explain analyze
WITH dict AS (
SELECT
hstore(
array_agg(i::text)
, array_agg(i::text)
)
FROM
generate_series(1, $$ || (1 << v) || $$) i
)
TABLE dict
$$) T(el text)
) T
) et
FROM
generate_series(0, 19) v
, LATERAL generate_series(1, 7) i
ORDER BY
1, 2
)
SELECT
v
, avg(et)::numeric(32,3)
FROM
T
GROUP BY
1
ORDER BY
1;
A kan PostgreSQL 11, har zuwa kusan girman ƙamus na maɓallai 2^12 serialization zuwa json yana ɗaukar ɗan lokaci kaɗan. A wannan yanayin, mafi inganci shine haɗin json_object da nau'in juzu'i na "ciki". array_agg(i::text).
Yanzu bari mu yi ƙoƙari mu karanta ƙimar kowane maɓalli sau 8 - bayan haka, idan ba ku sami damar shiga ƙamus ba, to me yasa ake buƙata?
Rubutun kimantawa: karantawa daga ƙamus
WITH T AS (
SELECT
*
, (
SELECT
regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (d+.d+) ms$', '1')::real et
FROM
(
SELECT
array_agg(el) ea
FROM
dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
explain analyze
WITH dict AS (
SELECT
json_object(
array_agg(i::text)
, array_agg(i::text)
)
FROM
generate_series(1, $$ || (1 << v) || $$) i
)
SELECT
(TABLE dict) -> (i % ($$ || (1 << v) || $$) + 1)::text
FROM
generate_series(1, $$ || (1 << (v + 3)) || $$) i
$$) T(el text)
) T
) et
FROM
generate_series(0, 19) v
, LATERAL generate_series(1, 7) i
ORDER BY
1, 2
)
SELECT
v
, avg(et)::numeric(32,3)
FROM
T
GROUP BY
1
ORDER BY
1;
Kuma ... riga kusan tare da maɓallan 2^6, karantawa daga ƙamus na json ya fara rasa sau da yawa karanta daga hstore, don jsonb haka yana faruwa a 2^9.
Ƙarshe na ƙarshe:
idan kana bukatar ka yi JOIN tare da maimaita maimaitawa - yana da kyau a yi amfani da "kamus" na tebur
idan ana sa ran ƙamus ɗin ku ƙarami kuma ba za ku karanta da yawa daga ciki ba Kuna iya amfani da json[b]
a duk sauran lokuta hstore + array_agg (i :: rubutu) zai fi tasiri