PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus

Muna ci gaba da jerin labaran da aka sadaukar don nazarin hanyoyin da ba a san su ba don inganta ayyukan "da alama mai sauƙi" tambayoyin PostgreSQL:

Kar ki yi tunanin bana son JOIN sosai... :)

Amma sau da yawa ba tare da shi ba, buƙatar ta zama mafi inganci fiye da ita. Don haka a yau za mu gwada kawar da m albarkatun JOIN - amfani da ƙamus.

PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus

Farawa tare da PostgreSQL 12, wasu daga cikin yanayin da aka kwatanta a ƙasa na iya sake haifar da ɗan bambanci daban-daban saboda Tsohuwar rashin kayan aiki CTE. Ana iya mayar da wannan halin ta hanyar tantance maɓalli MATERIALIZED.

Yawancin “gaskiya” a cikin ƙayyadaddun ƙamus

Bari mu ɗauki aikin aikace-aikace na gaske - muna buƙatar nuna lissafi saƙonni masu shigowa ko ayyuka masu aiki tare da masu aikawa:

25.01 | Иванов И.И. | Подготовить описание нового алгоритма.
22.01 | Иванов И.И. | Написать статью на Хабр: жизнь без JOIN.
20.01 | Петров П.П. | Помочь оптимизировать запрос.
18.01 | Иванов И.И. | Написать статью на Хабр: JOIN с учетом распределения данных.
16.01 | Петров П.П. | Помочь оптимизировать запрос.

A cikin duniyar m, ya kamata a rarraba mawallafin ɗawainiya a tsakanin duk ma'aikatan ƙungiyarmu, amma a zahiri ayyuka suna zuwa, a matsayin mai mulkin, daga ƙayyadaddun adadin mutane - "daga gudanarwa" har zuwa matsayi ko "daga 'yan kwangila" daga sassan makwabta (masana, masu zane-zane, tallace-tallace, ...).

Bari mu yarda cewa a cikin ƙungiyarmu na mutane 1000, marubuta 20 ne kawai (yawanci ma ƙasa da ƙasa) sun tsara ayyuka ga kowane takamaiman mai yin aiki da Mu yi amfani da ilimin wannan batudon hanzarta tambayar "gargajiya".

janareta na rubutu

-- сотрудники
CREATE TABLE person AS
SELECT
  id
, repeat(chr(ascii('a') + (id % 26)), (id % 32) + 1) "name"
, '2000-01-01'::date - (random() * 1e4)::integer birth_date
FROM
  generate_series(1, 1000) id;

ALTER TABLE person ADD PRIMARY KEY(id);

-- задачи с указанным распределением
CREATE TABLE task AS
WITH aid AS (
  SELECT
    id
  , array_agg((random() * 999)::integer + 1) aids
  FROM
    generate_series(1, 1000) id
  , generate_series(1, 20)
  GROUP BY
    1
)
SELECT
  *
FROM
  (
    SELECT
      id
    , '2020-01-01'::date - (random() * 1e3)::integer task_date
    , (random() * 999)::integer + 1 owner_id
    FROM
      generate_series(1, 100000) id
  ) T
, LATERAL(
    SELECT
      aids[(random() * (array_length(aids, 1) - 1))::integer + 1] author_id
    FROM
      aid
    WHERE
      id = T.owner_id
    LIMIT 1
  ) a;

ALTER TABLE task ADD PRIMARY KEY(id);
CREATE INDEX ON task(owner_id, task_date);
CREATE INDEX ON task(author_id);

Bari mu nuna ayyuka 100 na ƙarshe don takamaiman mai zartarwa:

SELECT
  task.*
, person.name
FROM
  task
LEFT JOIN
  person
    ON person.id = task.author_id
WHERE
  owner_id = 777
ORDER BY
  task_date DESC
LIMIT 100;

PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus
[duba bayanin.tensor.ru]

Sai dai itace cewa 1/3 jimlar lokaci da 3/4 karatu An yi shafukan bayanai ne kawai don neman marubucin sau 100 - don kowane aikin fitarwa. Amma mun san cewa a cikin waɗannan ɗaruruwan kawai 20 daban-daban - Shin yana yiwuwa a yi amfani da wannan ilimin?

hstore-kamus

Mu yi amfani hstore irin don samar da maɓalli mai mahimmanci "kamus":

CREATE EXTENSION hstore

Muna buƙatar kawai sanya ID na marubucin da sunansa a cikin ƙamus domin mu ciro ta amfani da wannan maɓallin:

-- формируем целевую выборку
WITH T AS (
  SELECT
    *
  FROM
    task
  WHERE
    owner_id = 777
  ORDER BY
    task_date DESC
  LIMIT 100
)
-- формируем словарь для уникальных значений
, dict AS (
  SELECT
    hstore( -- hstore(keys::text[], values::text[])
      array_agg(id)::text[]
    , array_agg(name)::text[]
    )
  FROM
    person
  WHERE
    id = ANY(ARRAY(
      SELECT DISTINCT
        author_id
      FROM
        T
    ))
)
-- получаем связанные значения словаря
SELECT
  *
, (TABLE dict) -> author_id::text -- hstore -> key
FROM
  T;

PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus
[duba bayanin.tensor.ru]

An kashe don samun bayanai game da mutane Sau 2 ƙasa da lokaci kuma sau 7 ƙasa karanta bayanai! Baya ga “kalmomi”, abin da kuma ya taimaka mana wajen cimma waɗannan sakamakon shi ne babban rikodin dawo da daga tebur a cikin wucewa ɗaya ta amfani da = ANY(ARRAY(...)).

Shigar da tebur: Serialization da Deserialization

Amma idan muna buƙatar ajiye ba kawai filin rubutu ɗaya ba, amma gaba ɗaya shigarwa a cikin ƙamus? A wannan yanayin, ikon PostgreSQL zai taimake mu bi shigarwar tebur azaman ƙima ɗaya:

...
, dict AS (
  SELECT
    hstore(
      array_agg(id)::text[]
    , array_agg(p)::text[] -- магия #1
    )
  FROM
    person p
  WHERE
    ...
)
SELECT
  *
, (((TABLE dict) -> author_id::text)::person).* -- магия #2
FROM
  T;

Bari mu ga abin da ke faruwa a nan:

  1. Mun dauka p a matsayin laƙabi ga cikakken mutum tebur shigarwa kuma ya tattara jerin su.
  2. wannan An sake yin rikodi da yawa zuwa tsararrun zaren rubutu (mutum[]:: rubutu[]) don sanya shi a cikin ƙamus na hstore a matsayin tsararrun ƙima.
  3. Lokacin da muka karɓi rikodin mai alaƙa, mu cire daga ƙamus ta maɓalli a matsayin igiyar rubutu.
  4. Muna buƙatar rubutu juya zuwa darajar nau'in tebur mutum (ga kowane tebur nau'in suna iri ɗaya ana ƙirƙira ta atomatik).
  5. "Expand" da aka buga rikodin cikin ginshiƙai ta amfani da (...).*.

json kamus

Amma irin wannan dabara kamar yadda muka yi amfani da shi a sama ba zai yi aiki ba idan babu nau'in tebur mai dacewa don yin "simintin gyaran kafa". Daidai wannan yanayin zai tashi, kuma idan muka yi ƙoƙarin amfani da shi Layin CTE, ba tebur na "ainihin"..

A wannan yanayin za su taimake mu ayyuka don aiki tare da json:

...
, p AS ( -- это уже CTE
  SELECT
    *
  FROM
    person
  WHERE
    ...
)
, dict AS (
  SELECT
    json_object( -- теперь это уже json
      array_agg(id)::text[]
    , array_agg(row_to_json(p))::text[] -- и внутри json для каждой строки
    )
  FROM
    p
)
SELECT
  *
FROM
  T
, LATERAL(
    SELECT
      *
    FROM
      json_to_record(
        ((TABLE dict) ->> author_id::text)::json -- извлекли из словаря как json
      ) AS j(name text, birth_date date) -- заполнили нужную нам структуру
  ) j;

Ya kamata a lura cewa lokacin da aka kwatanta tsarin da aka yi niyya, ba za mu iya lissafa duk filayen tushen kirtani ba, amma kawai waɗanda muke buƙata. Idan muna da tebur "na ƙasa", to yana da kyau a yi amfani da aikin json_populate_record.

Har yanzu muna samun damar ƙamus sau ɗaya, amma json-[de] farashin serialization suna da yawa sosai, don haka, yana da kyau a yi amfani da wannan hanya kawai a wasu lokuta lokacin da "masu gaskiya" CTE Scan ya nuna kansa mafi muni.

Gwajin aikin

Don haka, mun sami hanyoyi guda biyu don tsara bayanai a cikin ƙamus - hstore/json_object. Bugu da kari, tsarin maɓalli da ƙimar su kuma za a iya samar da su ta hanyoyi biyu, tare da juyawa na ciki ko na waje zuwa rubutu: array_agg(i:: rubutu) / array_agg(i): rubutu[].

Bari mu bincika tasirin nau'ikan serialization daban-daban ta amfani da misali na roba zalla - jera lambobin maɓallai daban-daban:

WITH dict AS (
  SELECT
    hstore(
      array_agg(i::text)
    , array_agg(i::text)
    )
  FROM
    generate_series(1, ...) i
)
TABLE dict;

Rubutun kimantawa: serialization

WITH T AS (
  SELECT
    *
  , (
      SELECT
        regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (d+.d+) ms$', '1')::real et
      FROM
        (
          SELECT
            array_agg(el) ea
          FROM
            dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
              explain analyze
              WITH dict AS (
                SELECT
                  hstore(
                    array_agg(i::text)
                  , array_agg(i::text)
                  )
                FROM
                  generate_series(1, $$ || (1 << v) || $$) i
              )
              TABLE dict
            $$) T(el text)
        ) T
    ) et
  FROM
    generate_series(0, 19) v
  ,   LATERAL generate_series(1, 7) i
  ORDER BY
    1, 2
)
SELECT
  v
, avg(et)::numeric(32,3)
FROM
  T
GROUP BY
  1
ORDER BY
  1;

PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus

A kan PostgreSQL 11, har zuwa kusan girman ƙamus na maɓallai 2^12 serialization zuwa json yana ɗaukar ɗan lokaci kaɗan. A wannan yanayin, mafi inganci shine haɗin json_object da nau'in juzu'i na "ciki". array_agg(i::text).

Yanzu bari mu yi ƙoƙari mu karanta ƙimar kowane maɓalli sau 8 - bayan haka, idan ba ku sami damar shiga ƙamus ba, to me yasa ake buƙata?

Rubutun kimantawa: karantawa daga ƙamus

WITH T AS (
  SELECT
    *
  , (
      SELECT
        regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (d+.d+) ms$', '1')::real et
      FROM
        (
          SELECT
            array_agg(el) ea
          FROM
            dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
              explain analyze
              WITH dict AS (
                SELECT
                  json_object(
                    array_agg(i::text)
                  , array_agg(i::text)
                  )
                FROM
                  generate_series(1, $$ || (1 << v) || $$) i
              )
              SELECT
                (TABLE dict) -> (i % ($$ || (1 << v) || $$) + 1)::text
              FROM
                generate_series(1, $$ || (1 << (v + 3)) || $$) i
            $$) T(el text)
        ) T
    ) et
  FROM
    generate_series(0, 19) v
  , LATERAL generate_series(1, 7) i
  ORDER BY
    1, 2
)
SELECT
  v
, avg(et)::numeric(32,3)
FROM
  T
GROUP BY
  1
ORDER BY
  1;

PostgreSQL Antipatterns: bari mu buga nauyi JOIN tare da ƙamus

Kuma ... riga kusan tare da maɓallan 2^6, karantawa daga ƙamus na json ya fara rasa sau da yawa karanta daga hstore, don jsonb haka yana faruwa a 2^9.

Ƙarshe na ƙarshe:

  • idan kana bukatar ka yi JOIN tare da maimaita maimaitawa - yana da kyau a yi amfani da "kamus" na tebur
  • idan ana sa ran ƙamus ɗin ku ƙarami kuma ba za ku karanta da yawa daga ciki ba Kuna iya amfani da json[b]
  • a duk sauran lokuta hstore + array_agg (i :: rubutu) zai fi tasiri

source: www.habr.com

Add a comment