PostgreSQL Antipatterns: beating a heavy JOIN with a dictionary

We continue the series of articles on little-known ways to improve the performance of "seemingly simple" PostgreSQL queries.

Don't get the idea that I dislike JOIN all that much... :)

But quite often a query runs noticeably faster without it than with it. So today we will try to get rid of a resource-hungry JOIN altogether by using a dictionary.


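Starting with PostgreSQL 12, some of the situations described below may play out differently, because a CTE is no longer materialized by default: a non-recursive, side-effect-free CTE that is referenced only once gets inlined into the outer query. The old behavior can be restored by specifying the MATERIALIZED keyword.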
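For PostgreSQL 12 and later, here is a minimal sketch of both forms of the hint on throwaway `generate_series` data (the CTE name and the numbers are purely illustrative). `AS MATERIALIZED` brings back the pre-12 behavior of computing the CTE once as a separate step, while `AS NOT MATERIALIZED` explicitly lets the planner inline it. In the queries below, the same hint would go on the `T` or `dict` CTE, e.g. `dict AS MATERIALIZED (...)`.

```sql
-- PostgreSQL 12+: force the CTE to be computed once, as before version 12
WITH T AS MATERIALIZED (
  SELECT g AS id
  FROM generate_series(1, 1000) g
)
SELECT count(*) FROM T WHERE id <= 10;  -- 10

-- PostgreSQL 12+: explicitly allow the planner to inline the CTE
WITH T AS NOT MATERIALIZED (
  SELECT g AS id
  FROM generate_series(1, 1000) g
)
SELECT count(*) FROM T WHERE id <= 10;  -- 10
```

Both queries return the same result; only the plan differs, which is exactly why the timings in this article may shift on newer versions.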

Many "facts" over a limited dictionary

Let's take a perfectly real application task: we need to display a list of incoming messages or active tasks together with their senders:

25.01 | Ivanov I.I. | Prepare a description of the new algorithm.
22.01 | Ivanov I.I. | Write an article for Habr: life without JOIN.
20.01 | Petrov P.P. | Help optimize a query.
18.01 | Ivanov I.I. | Write an article for Habr: JOIN that takes data distribution into account.
16.01 | Petrov P.P. | Help optimize a query.

In an abstract world, task authors would be spread evenly across all employees of our organization, but in reality tasks usually come from a fairly limited set of people: "from management" higher up the hierarchy, or "from subcontractors" in neighboring departments (analysts, designers, marketing, ...).

Let's assume that in our organization of 1000 people only 20 authors (often even fewer) assign tasks to any given performer, and let's use this domain knowledge to speed up the "traditional" query.

Data generation script

-- employees
CREATE TABLE person AS
SELECT
  id
, repeat(chr(ascii('a') + (id % 26)), (id % 32) + 1) "name"
, '2000-01-01'::date - (random() * 1e4)::integer birth_date
FROM
  generate_series(1, 1000) id;

ALTER TABLE person ADD PRIMARY KEY(id);

-- tasks with the specified distribution of authors
CREATE TABLE task AS
-- for every owner (performer), a fixed pool of 20 random author ids
WITH aid AS (
  SELECT
    id
  , array_agg((random() * 999)::integer + 1) aids
  FROM
    generate_series(1, 1000) id
  , generate_series(1, 20)
  GROUP BY
    1
)
SELECT
  *
FROM
  (
    SELECT
      id
    , '2020-01-01'::date - (random() * 1e3)::integer task_date
    , (random() * 999)::integer + 1 owner_id
    FROM
      generate_series(1, 100000) id
  ) T
-- each task gets one author picked at random from its owner's pool
, LATERAL(
    SELECT
      aids[(random() * (array_length(aids, 1) - 1))::integer + 1] author_id
    FROM
      aid
    WHERE
      id = T.owner_id
    LIMIT 1
  ) a;

ALTER TABLE task ADD PRIMARY KEY(id);
CREATE INDEX ON task(owner_id, task_date);
CREATE INDEX ON task(author_id);

Let's display the last 100 tasks of a specific performer:

SELECT
  task.*
, person.name
FROM
  task
LEFT JOIN
  person
    ON person.id = task.author_id
WHERE
  owner_id = 777
ORDER BY
  task_date DESC
LIMIT 100;

[view on explain.tensor.ru]

It turns out that 1/3 of the total time and 3/4 of the data page reads were spent only on looking up the author 100 times, once for every output task. But we know that among these hundred there are only 20 distinct ones. Can we make use of this knowledge?

hstore dictionary

Let's take advantage of the hstore type to build a key-value "dictionary":

CREATE EXTENSION IF NOT EXISTS hstore;
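Before applying it to the real tables, here is a minimal sketch of how an hstore "dictionary" behaves, assuming the hstore contrib module is available on the server; the keys and names are made up for illustration. `hstore(text[], text[])` pairs the two arrays positionally, and the `->` operator returns the value stored under a key, or NULL when the key is missing.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- keys and values are paired by position: '1' => 'alice', '2' => 'bob', ...
SELECT
  hstore(ARRAY['1', '2', '3'], ARRAY['alice', 'bob', 'carol']) -> '2' AS name;  -- bob

-- a missing key yields NULL, just like an unmatched LEFT JOIN
SELECT
  hstore(ARRAY['1', '2', '3'], ARRAY['alice', 'bob', 'carol']) -> '42' AS name; -- NULL
```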

All we need is to put the author ID and name into the dictionary, so that we can then extract the name by that key:

-- build the target sample
WITH T AS (
  SELECT
    *
  FROM
    task
  WHERE
    owner_id = 777
  ORDER BY
    task_date DESC
  LIMIT 100
)
-- build a dictionary from the distinct values
, dict AS (
  SELECT
    hstore( -- hstore(keys::text[], values::text[])
      array_agg(id)::text[]
    , array_agg(name)::text[]
    )
  FROM
    person
  WHERE
    id = ANY(ARRAY(
      SELECT DISTINCT
        author_id
      FROM
        T
    ))
)
-- look up the related values in the dictionary
SELECT
  *
, (TABLE dict) -> author_id::text -- hstore -> key
FROM
  T;

[view on explain.tensor.ru]

Looking up people now takes half the time and reads 7 times less data! Besides the "dictionary" itself, another thing that helped us achieve these results was fetching the records from the table in bulk, in a single pass, with = ANY(ARRAY(...)).
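To check that the dictionary really returns the same names as the LEFT JOIN, here is a self-contained sketch on miniature data. The tables `person_demo` and `task_demo` are hypothetical stand-ins for `person` and `task`, created as temporary tables with invented rows. As in the scenario above, 100 tasks reference only a couple of distinct authors, so the `= ANY(ARRAY(SELECT DISTINCT ...))` filter selects just those rows from `person_demo` in a single pass.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- hypothetical miniature versions of person/task, for illustration only
CREATE TEMP TABLE person_demo(id integer PRIMARY KEY, name text);
INSERT INTO person_demo VALUES
  (1, 'Ivanov I.I.')
, (2, 'Petrov P.P.')
, (3, 'Sidorov S.S.');  -- never referenced by any task

CREATE TEMP TABLE task_demo(id integer PRIMARY KEY, author_id integer);
INSERT INTO task_demo
SELECT i, i % 2 + 1     -- only authors 1 and 2 ever appear
FROM generate_series(1, 100) i;

WITH T AS (
  SELECT * FROM task_demo
)
, dict AS (
  -- one bulk fetch of the distinct authors, folded into a single hstore value
  SELECT
    hstore(array_agg(id)::text[], array_agg(name)::text[]) d
  FROM
    person_demo
  WHERE
    id = ANY(ARRAY(SELECT DISTINCT author_id FROM T))
)
SELECT
  T.id
, (TABLE dict) -> T.author_id::text AS name  -- in-memory lookup per task row
FROM
  T
ORDER BY
  T.id;
```

An author id with no matching person would simply get NULL from `->`, mirroring the NULL a LEFT JOIN produces.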

Table records: serialization and deserialization

But what if we need to keep in the dictionary not just a single text field, but the whole record? In that case, PostgreSQL's ability to treat a table record as a single value comes to the rescue:

...
, dict AS (
  SELECT
    hstore(
      array_agg(id)::text[]
    , array_agg(p)::text[] -- magic #1
    )
  FROM
    person p
  WHERE
    ...
)
SELECT
  *
, (((TABLE dict) -> author_id::text)::person).* -- magic #2
FROM
  T;

Let's look at what happened here:

  1. We took p as an alias for the complete record of the person table and aggregated these records into an array.
  2. This array of records was cast to an array of text strings (person[]::text[]) so that it could be placed into the hstore dictionary as the array of values.
  3. When fetching a related record, we pulled it out of the dictionary by key as a text string.
  4. We need to turn that text into a value of the person table type (for every table, a composite type with the same name is created automatically).
  5. We "expand" the typed record into columns using (...).*.
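A self-contained sketch of the five steps above, assuming a hypothetical temporary table `person_rec` with the same columns as `person` (the rows are invented). The whole row is stored in the dictionary as its text literal and cast back to the table's row type on the way out.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- hypothetical stand-in for person; creating the table also creates
-- a composite type with the same name (person_rec)
CREATE TEMP TABLE person_rec(id integer, name text, birth_date date);
INSERT INTO person_rec VALUES
  (1, 'Ivanov I.I.', '1980-01-25')
, (2, 'Petrov P.P.', '1985-06-01');

WITH dict AS (
  SELECT
    hstore(
      array_agg(id)::text[]
    , array_agg(p)::text[] -- steps 1-2: person_rec[] -> text[] (one row literal per element)
    )
  FROM
    person_rec p
)
SELECT
  -- steps 3-5: text by key -> person_rec -> separate columns
  (((TABLE dict) -> '2')::person_rec).*;
-- one row: id = 2, name = Petrov P.P., birth_date = 1985-06-01
```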

json dictionary

However, the trick used above will not work if there is no matching table type to cast to. Exactly the same situation arises if we try to take the row from a CTE rather than from a "real" table.

In this case, the functions for working with json will help us:

...
, p AS ( -- this is now a CTE
  SELECT
    *
  FROM
    person
  WHERE
    ...
)
, dict AS (
  SELECT
    json_object( -- now this is json
      array_agg(id)::text[]
    , array_agg(row_to_json(p))::text[] -- and inside it, json for every row
    )
  FROM
    p
)
SELECT
  *
FROM
  T
, LATERAL(
    SELECT
      *
    FROM
      json_to_record(
        ((TABLE dict) ->> author_id::text)::json -- extract from the dictionary as json
      ) AS j(name text, birth_date date) -- fill in only the structure we need
  ) j;

Note that when describing the target structure, we do not have to list every field of the source record, only the ones we actually need. If we do have a "native" table, it is better to use the json_populate_record function.
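When such a "native" table does exist, a minimal sketch of the `json_populate_record` variant might look like this; `person_json` is a hypothetical temporary table with invented data. Unlike `json_to_record`, no column definition list is needed, because the row type passed as the first argument (here `NULL::person_json`) supplies the field names and types.

```sql
-- hypothetical stand-in for a "native" table; its row type tells
-- json_populate_record which fields to extract and how to type them
CREATE TEMP TABLE person_json(id integer, name text, birth_date date);
INSERT INTO person_json VALUES
  (1, 'Ivanov I.I.', '1980-01-25')
, (2, 'Petrov P.P.', '1985-06-01');

WITH dict AS (
  SELECT
    json_object(
      array_agg(id)::text[]
    , array_agg(row_to_json(p))::text[] -- every value is the whole row as JSON text
    )
  FROM
    person_json p
)
SELECT
  *
FROM
  json_populate_record(
    NULL::person_json            -- only the type matters, hence NULL
  , ((TABLE dict) ->> '2')::json -- value by key, parsed back into json
  );
-- one row: id = 2, name = Petrov P.P., birth_date = 1985-06-01
```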

We still access the dictionary only once, but json (de)serialization is quite expensive, so it makes sense to use this approach only in cases where an "honest" CTE Scan performs much worse.

Testing performance

So we now have two ways of serializing data into a dictionary: hstore and json_object. In addition, the arrays of keys and values themselves can also be built in two ways, with the conversion to text done either inside or outside the aggregate: array_agg(i::text) / array_agg(i)::text[].
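A tiny illustration of the two ways of building the text arrays, on synthetic integers: both produce the same `text[]` value and differ only in where the conversion happens (per element before aggregation, or once on the finished array).

```sql
SELECT
  array_agg(i::text)   AS inner_cast -- convert each element, then aggregate text
, array_agg(i)::text[] AS outer_cast -- aggregate integers, then cast the whole array
FROM
  generate_series(1, 5) i;
-- both columns: {1,2,3,4,5}
```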

Let's check how efficient the different serialization variants are on a purely synthetic example: serializing different numbers of keys:

WITH dict AS (
  SELECT
    hstore(
      array_agg(i::text)
    , array_agg(i::text)
    )
  FROM
    generate_series(1, ...) i
)
TABLE dict;

Test script: serialization

-- requires the dblink and hstore extensions in the current database
WITH T AS (
  SELECT
    *
  , (
      SELECT
        regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (\d+\.\d+) ms$', '\1')::real et
      FROM
        (
          SELECT
            array_agg(el) ea
          FROM
            dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
              explain analyze
              WITH dict AS (
                SELECT
                  hstore(
                    array_agg(i::text)
                  , array_agg(i::text)
                  )
                FROM
                  generate_series(1, $$ || (1 << v) || $$) i
              )
              TABLE dict
            $$) T(el text)
        ) T
    ) et
  FROM
    generate_series(0, 19) v
  , LATERAL generate_series(1, 7) i
  ORDER BY
    1, 2
)
SELECT
  v
, avg(et)::numeric(32,3)
FROM
  T
GROUP BY
  1
ORDER BY
  1;


In PostgreSQL 11, up to a dictionary size of roughly 2^12 keys, serializing to json takes less time. Within that range, the most efficient combination is json_object with the "inner" type conversion array_agg(i::text).

Now let's try reading the value of each key 8 times: after all, if you never access the dictionary, why would you need it?

Test script: reading from the dictionary

-- requires the dblink extension in the current database
WITH T AS (
  SELECT
    *
  , (
      SELECT
        regexp_replace(ea[array_length(ea, 1)], '^Execution Time: (\d+\.\d+) ms$', '\1')::real et
      FROM
        (
          SELECT
            array_agg(el) ea
          FROM
            dblink('port= ' || current_setting('port') || ' dbname=' || current_database(), $$
              explain analyze
              WITH dict AS (
                SELECT
                  json_object(
                    array_agg(i::text)
                  , array_agg(i::text)
                  )
                FROM
                  generate_series(1, $$ || (1 << v) || $$) i
              )
              SELECT
                (TABLE dict) -> (i % ($$ || (1 << v) || $$) + 1)::text
              FROM
                generate_series(1, $$ || (1 << (v + 3)) || $$) i
            $$) T(el text)
        ) T
    ) et
  FROM
    generate_series(0, 19) v
  , LATERAL generate_series(1, 7) i
  ORDER BY
    1, 2
)
SELECT
  v
, avg(et)::numeric(32,3)
FROM
  T
GROUP BY
  1
ORDER BY
  1;


And... already at about 2^6 keys, reading from a json dictionary starts losing to reading from hstore by a multiple; for jsonb the same happens at 2^9.
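The jsonb variant is not spelled out in the test scripts above. A minimal sketch, assuming it is built the same way with `jsonb_object(text[], text[])`, confirms that all three dictionary types return the same value for a key on 64 = 2^6 synthetic keys. Note that for json and jsonb the `->>` operator is used to get text, while `->` would return a json/jsonb value.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

WITH src AS (
  SELECT
    array_agg(i::text) k -- keys and values are the same synthetic strings
  FROM
    generate_series(1, 64) i
)
SELECT
  hstore(k, k)       ->  '42' AS from_hstore -- text
, json_object(k, k)  ->> '42' AS from_json   -- text
, jsonb_object(k, k) ->> '42' AS from_jsonb  -- text
FROM
  src;
-- all three columns: 42
```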

Final conclusions:

  • if you need to JOIN with many repeating records, it is better to use a "dictionary" built from the table
  • if your dictionary is expected to be small and you will not read much from it, you can use json[b]
  • in all other cases, hstore + array_agg(i::text) will be more efficient

Source: www.habr.com
