I-DBA: ukuhlela kahle ukuvumelanisa nokungenisa

Ngokucubungula okuyinkimbinkimbi kwamasethi amakhulu wedatha (okuhlukile Izinqubo ze-ETL: ukungenisa, ukuguqulwa nokuvumelanisa nomthombo wangaphandle) ngokuvamile kuba nesidingo "khumbula" okwesikhashana futhi ngokushesha usebenze into voluminous.

Umsebenzi ojwayelekile walolu hlobo ngokuvamile uzwakala kanje: "Lapho ke umnyango wezokubalwa kwezimali ulayishwe ebhange lekhasimende izinkokhelo zokugcina ezitholiwe, udinga ukuzilayisha ngokushesha kuwebhusayithi futhi uzixhume kuma-akhawunti akho"

Kodwa lapho ivolumu yale "nto" iqala ukulinganisa ngamakhulu ama-megabytes, futhi isevisi kufanele iqhubeke nokusebenza ne-database 24x7, kuvela imiphumela eminingi engemihle ezolimaza impilo yakho.
I-DBA: ukuhlela kahle ukuvumelanisa nokungenisa
Ukubhekana nazo ku-PostgreSQL (hhayi kuyo kuphela), ungasebenzisa ukulungiselelwa okuthile okuzokuvumela ukuthi ucubungule yonke into ngokushesha nangokusetshenziswa okuncane kwezinsiza.

1. Kuzothunyelwa kuphi?

Okokuqala, ake sinqume ukuthi singayilayisha kuphi idatha esifuna “ukuyicubungula.”

1.1. Amathebula esikhashana (TEMPORARY TABLE)

Empeleni, kumathebula esikhashana e-PostgreSQL ayafana nanoma yimaphi amanye. Ngakho-ke, izinkolelo-ze ziyathanda “Yonke into ekhona igcinwa enkumbulweni kuphela, futhi ingaphela”. Kodwa futhi kukhona umehluko omkhulu obalulekile.

Eyakho "indawo yegama" ekuxhumekeni ngakunye kusizindalwazi

Uma ukuxhumana okubili zama ukuxhuma ngesikhathi esisodwa CREATE TABLE x, khona-ke othile uzothola nakanjani iphutha elingeyona into ehlukile izinto zesizindalwazi.

Kodwa uma bobabili bezama ukwenza CREATE TEMPORARY TABLE x, khona-ke bobabili bazokwenza ngokujwayelekile, futhi wonke umuntu uzothola ikhophi yakho amatafula. Futhi ngeke kube khona okufanayo phakathi kwabo.

"Ukuzilimaza" lapho unqamula

Uma uxhumano luvaliwe, wonke amatafula esikhashana asuswa ngokuzenzakalelayo, ngakho-ke ngesandla DROP TABLE x akukho lutho ngaphandle...

Uma usebenza pgbouncer kumodi yokwenziwayo, khona-ke i-database iyaqhubeka nokukholelwa ukuthi lokhu kuxhumana kusasebenza, futhi kulo leli thebula lesikhashana lisekhona.

Ngakho-ke, ukuzama ukuyidala futhi, ukusuka kokunye ukuxhumana kuya ku-pgbouncer, kuzoholela ephutheni. Kodwa lokhu kungagwenywa ngokusebenzisa CREATE TEMPORARY TABLE IF NOT EXISTS x.

Yiqiniso, kungcono ukungakwenzi lokhu noma kunjalo, ngoba khona-ke ungakwazi "ngokuzumayo" ukuthola lapho idatha esele "kumnikazi wangaphambili". Kunalokho, kungcono kakhulu ukufunda imanuwali futhi ubone ukuthi uma udala itafula kungenzeka ukwengeza ON COMMIT DROP - okungukuthi, lapho ukuthengiselana kuqeda, itafula lizosuswa ngokuzenzakalelayo.

Ukungaphindaphindi

Ngenxa yokuthi aboxhumano oluthile kuphela, amathebula esikhashana awaphindwaphindwa. Kodwa lokhu kuqeda isidingo sokuqoshwa kabili kwedatha kunqwaba + WAL, ngakho-ke FAKA/BUYEKEZA/SUSA kuyo kushesha kakhulu.

Kodwa njengoba ithebula lesikhashana liseyithebula "elicishe lijwayelekile", alikwazi ukudalwa nakulomfanekiso. Okungenani okwamanje, nakuba isiqephu esihambisanayo sekuyisikhathi eside sijikeleza.

1.2. ITAFULA ENGAKUNGULIWE

Kodwa yini okufanele uyenze, isibonelo, uma unohlobo oluthile lwenqubo ye-ETL enzima engakwazi ukuqaliswa phakathi kokuthengiselana okukodwa, kodwa usenayo pgbouncer kumodi yokwenziwayo? ..

Noma ukugeleza kwedatha kukhulu kangangokuthi Awukho umkhawulokudonsa owanele ekuxhumekeni okukodwa kusuka kusizindalwazi (funda, inqubo eyodwa nge-CPU ngayinye)?..

Noma eminye imisebenzi iyaqhubeka ngokulinganayo ekuxhumaneni okuhlukene?..

Kunenketho eyodwa kuphela lapha - dala okwesikhashana ithebula elingelona elesikhashana. Pun, yebo. Leyo:

  • ngidale amathebula "ami" anamagama angahleliwe kakhulu ukuze ngingahlangani nanoma ubani
  • extract: uwagcwalise ngedatha evela emthonjeni wangaphandle
  • Guqula: iguquliwe, igcwaliswe ngezinkambu eziyinhloko zokuxhumanisa
  • Layisha: uthele idatha elungile kumathebula okuqondiwe
  • amatafula "ami" asusiwe

Futhi manje - impukane emafutheni. Empeleni, konke kubhala ku-PostgreSQL kwenzeka kabili - okokuqala ku-WAL, bese ingena emizimbeni yethebula/inkomba. Konke lokhu kwenzelwa ukusekela i-ACID nokulungisa ukubonakala kwedatha phakathi COMMIT'nathi futhi ROLLBACK'ukuthenga okungenalutho.

Kodwa asikudingi lokhu! Sinayo yonke inqubo Kuphakathi kokuthi iphumelele ngokuphelele noma ayiphumelelanga.. Akunandaba ukuthi zingaki izinkokhelo eziphakathi okuzoba khona - asinantshisekelo "ekuqhubekiseni inqubo kusukela phakathi," ikakhulukazi uma kungacaci ukuthi ibikuphi.

Ukwenza lokhu, abathuthukisi be-PostgreSQL, emuva kunguqulo 9.1, bethule into enjengokuthi Amathebula AKUNGALOYIWE:

Ngalesi sibonakaliso, ithebula lidalwe njengengabhalisiwe. Idatha ebhalelwe amathebula angabhalisiwe ayidluli kulogi yokubhala kusengaphambili (bona iSahluko 29), okwenza amathebula anjalo ukuthi sebenza ngokushesha kakhulu kunokujwayelekile. Nokho, awagonyiwe ekuhlulekeni; uma iseva yehluleka noma ukuvalwa kwesimo esiphuthumayo, ithebula elingabhalisiwe kuncishiswe ngokuzenzakalelayo. Ukwengeza, okuqukethwe kwethebula elingabhalisiwe akuphindaphindwa kumaseva ezigqila. Noma yiziphi izinkomba ezidalwe kuthebula elingabhalisiwe ziyasuswa ngokuzenzakalelayo.

Kafushane nje, izoshesha kakhulu, kodwa uma iseva yedatha "ewa", ngeke kube mnandi. Kodwa lokhu kwenzeka kangaki, futhi ingabe inqubo yakho ye-ETL iyakwazi ukulungisa lokhu ngendlela efanele “kusuka phakathi” ngemva “kokuvuselela” isizindalwazi?..

Uma kungenjalo, futhi icala elingenhla lifana nelakho, sebenzisa UNLOGGEDkodwa ungalokothi unganikezi lesi sici kumathebula wangempela, idatha oyithandayo kuwe.

1.3. OKUZINIKELA { SUSA IMIGWA | YEHLA}

Lokhu kwakhiwa kukuvumela ukuthi ucacise ukuziphatha okuzenzakalelayo lapho umsebenzi usuqediwe uma udala ithebula.

Mayelana ON COMMIT DROP Sengike ngabhala ngenhla, iyakhiqiza DROP TABLE, kodwa nge ON COMMIT DELETE ROWS isimo sithakazelisa kakhulu - sikhiqizwa lapha TRUNCATE TABLE.

Njengoba yonke ingqalasizinda yokugcina incazelo ye-meta yetafula lesikhashana ifana ncamashi naleyo yetafula elivamile, khona-ke Ukudala njalo nokususwa kwamatafula esikhashana kuholela "ekuvuvukeni" okunzima kwamatafula esistimu pg_class, pg_attribute, pg_attrdef, pg_depend,…

Manje cabanga ukuthi unomsebenzi ekuxhumekeni okuqondile ku-database, evula ukuthengiselana okusha njalo ngomzuzwana, idala, igcwalise, icubungule futhi isuse itafula lesikhashana ... Kuzoba nodoti oweqile oqoqwe kumatafula wesistimu, futhi lokhu kuzodala amabhuleki engeziwe ekusebenzeni ngakunye.

Ngokuvamile, ungakwenzi lokhu! Kulokhu kuphumelela kakhulu CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS ikhiphe emjikelezweni wokwenziwayo - lapho-ke ekuqaleni komsebenzi omusha ngamunye amathebula asevele akhona izoba khona (gcina ucingo CREATE), kepha izobe ingenalutho, sibonga u TRUNCATE (siphinde salondoloza ucingo lwayo) uma siqedela okwenziwa ngaphambilini.

1.4. THANDA...KUHLANGANISA...

Ngishilo ekuqaleni ukuthi enye yezindlela ezijwayelekile zokusebenzisa amatafula esikhashana yizinhlobo ezahlukahlukene zokungenisa - futhi unjiniyela ngokukhathala ukopisha-unamathisele uhlu lwezinkambu zetafula eliqondiwe esimemezelweni sakhe sesikhashana...

Kodwa ubuvila buyinjini yenqubekelaphambili! Yingakho ke dala ithebula elisha "ngokusekelwe kusampula" kungaba lula kakhulu:

CREATE TEMPORARY TABLE import_table(
  LIKE target_table
);

Njengoba ungakwazi ukukhiqiza idatha eningi kuleli thebula, ukusesha ngeke kusheshe. Kodwa kukhona ikhambi lendabuko kulokhu - izinkomba! Futhi, yebo, ithebula lesikhashana lingaba nezinkomba.

Njengoba, ngokuvamile, izinkomba ezidingekayo zihambisana nezinkomba zetafula eliqondiwe, ungavele ubhale LIKE target_table INCLUDING INDEXES.

Uma nawe udinga DEFAULT-values ​​(ngokwesibonelo, ukugcwalisa amanani abalulekile), ungasebenzisa LIKE target_table INCLUDING DEFAULTS. Noma kalula - LIKE target_table INCLUDING ALL - amakhophi okuzenzakalelayo, izinkomba, imikhawulo,...

Kodwa lapha udinga ukuqonda ukuthi uma udale ngenisa ithebula ngokushesha ngezinkomba, khona-ke idatha izothatha isikhathi eside ukulayishwakunokuba uma ugcwalisa yonke into kuqala, bese ugoqa izinkomba - bheka ukuthi ukwenza kanjani lokhu njengesibonelo pg_lahla.

Ngamafuphi, RTFM!

2. Ibhalwa kanjani?

Ake ngithi nje - sebenzisa COPY-geleza esikhundleni sokuthi "iphakethe" INSERT, ukusheshisa ngezinye izikhathi. Ungakwazi ngisho ngokuqondile nefayela elakhiwe ngaphambilini.

3. Icutshungulwa kanjani?

Ngakho-ke, ake sivumele isingeniso sethu sibukeke kanje:

  • unetafula elinedatha yeklayenti egcinwe kusizindalwazi sakho 1M amarekhodi
  • nsuku zonke iklayenti likuthumelela entsha "isithombe" esigcwele
  • ngokuhlangenwe nakho uyakwazi lokho ngezikhathi ezithile akukho amarekhodi angaphezu kuka-10K ashintshiwe

Isibonelo esiyingqayizivele sesimo esinjalo Isisekelo se-KLADR - kunamakheli amaningi esewonke, kodwa ekulayisheni kwamasonto onke kunezinguquko ezimbalwa kakhulu (ukuqamba kabusha izindawo zokuhlala, ukuhlanganisa imigwaqo, ukubukeka kwezindlu ezintsha) ngisho nasezingeni likazwelonke.

3.1. I-algorithm yokuvumelanisa egcwele

Ukwenza kube lula, ake sithi awudingi nokuhlela kabusha idatha - vele ulethe ithebula efomini oyifunayo, okungukuthi:

  • susa konke okungasekho
  • buyekeza yonke into ebivele ikhona futhi idinga ukubuyekezwa
  • faka konke okungakenzeki

Kungani imisebenzi kufanele yenziwe ngale ndlela? Ngoba le yindlela usayizi wetafula ozokhula ngayo kancane (khumbula ngeMVCC!).

SUSA KU-dst

Cha, vele ungaphumelela ngemisebenzi emibili nje:

  • susa (DELETE) konke ngokujwayelekile
  • faka konke kusuka esithombeni esisha

Kodwa ngesikhathi esifanayo, sibonga i-MVCC, Ubukhulu betafula buzokhula ngokuphindwe kabili! Ukuthola +1M izithombe zamarekhodi kuthebula ngenxa yesibuyekezo esingu-10K akudingekile kakhulu...

TRUNCATE dst

Unjiniyela onolwazi olunzulu uyazi ukuthi yonke ithebhulethi ingahlanzwa ngentengo ephansi:

  • kucace (TRUNCATE) itafula lonke
  • faka konke kusuka esithombeni esisha

Indlela iyasebenza, ngezinye izikhathi ziyasebenza ngempela, kodwa kunenkinga... Sizokwengeza amarekhodi angu-1M isikhathi eside, ngakho-ke ngeke sikwazi ukushiya ithebula lingenalutho kuso sonke lesi sikhathi (njengoba kuzokwenzeka ngaphandle kokuligoqa ngokwenziwe okukodwa).

Okuchaza:

  • siyaqala okwenziwa isikhathi eside
  • TRUNCATE kubeka Finyelela Okukhethekile-ukuvimba
  • senza ukufakwa isikhathi eside, futhi wonke umuntu ngalesi sikhathi angikwazi ngisho SELECT

Kukhona okungahambi kahle...

ALTER TABLE... GAMA KABUSHA... / DRIP TABLE...

Okunye ukugcwalisa yonke into etafuleni elisha elihlukile, bese uvele uliqambe kabusha esikhundleni saleli elidala. Izinto ezimbalwa ezimbi ezimbi:

  • namanje futhi Finyelela Okukhethekile, nakuba isikhathi esincane kakhulu
  • zonke izinhlelo zemibuzo/izibalo zaleli thebula zisethwa kabusha, kudingeka ugijime HLAZIYA
  • zonke izihluthulelo zakwamanye amazwe ziphukile (FK) etafuleni

Kube nesiqephu se-WIP esivela ku-Simon Riggs esiphakamise ukwenziwa ALTER-umsebenzi wokushintsha indikimba yetafula ezingeni lefayela, ngaphandle kokuthinta izibalo kanye ne-FK, kodwa ayizange iqoqe ikhoramu.

SUSA, BUYEKEZA, FAKA

Ngakho-ke, sixazulula inketho yokungavimbeli yemisebenzi emithathu. Cishe ezintathu... Ungakwenza kanjani lokhu ngempumelelo kakhulu?

-- все делаем в рамках транзакции, чтобы никто не видел "промежуточных" состояний
BEGIN;

-- создаем временную таблицу с импортируемыми данными
CREATE TEMPORARY TABLE tmp(
  LIKE dst INCLUDING INDEXES -- по образу и подобию, вместе с индексами
) ON COMMIT DROP; -- за рамками транзакции она нам не нужна

-- быстро-быстро вливаем новый образ через COPY
COPY tmp FROM STDIN;
-- ...
-- .

-- удаляем отсутствующие
DELETE FROM
  dst D
USING
  dst X
LEFT JOIN
  tmp Y
    USING(pk1, pk2) -- поля первичного ключа
WHERE
  (D.pk1, D.pk2) = (X.pk1, X.pk2) AND
  Y IS NOT DISTINCT FROM NULL; -- "антиджойн"

-- обновляем оставшиеся
UPDATE
  dst D
SET
  (f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
  tmp T
WHERE
  (D.pk1, D.pk2) = (T.pk1, T.pk2) AND
  (D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- незачем обновлять совпадающие

-- вставляем отсутствующие
INSERT INTO
  dst
SELECT
  T.*
FROM
  tmp T
LEFT JOIN
  dst D
    USING(pk1, pk2)
WHERE
  D IS NOT DISTINCT FROM NULL;

COMMIT;

3.2. Ngenisa ngemuva kokucubungula

Ku-KLADR efanayo, wonke amarekhodi ashintshiwe kufanele aphinde aqhutshwe ngokucutshungulwa kwangemuva - ajwayeleke, amagama angukhiye agqanyiswe, futhi ehliselwe ezakhiweni ezidingekayo. Kodwa wazi kanjani - yini ngempela eshintshilengaphandle kokufaka inkimbinkimbi ikhodi yokuvumelanisa, ngaphandle kokuyithinta nhlobo?

Uma kuphela inqubo yakho inokufinyelela kokubhala ngesikhathi sokuvumelanisa, ungasebenzisa isicuphi esizosiqoqela zonke izinguquko:

-- целевые таблицы
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);

-- таблицы с историей изменений
CREATE TABLE kladr$log(
  ro kladr, -- тут лежат целые образы записей старой/новой
  rn kladr
);

CREATE TABLE kladr_house$log(
  ro kladr_house,
  rn kladr_house
);

-- общая функция логирования изменений
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
  dst varchar = TG_TABLE_NAME || '$log';
  stmt text = '';
BEGIN
  -- проверяем необходимость логгирования при обновлении записи
  IF TG_OP = 'UPDATE' THEN
    IF NEW IS NOT DISTINCT FROM OLD THEN
      RETURN NEW;
    END IF;
  END IF;
  -- создаем запись лога
  stmt = 'INSERT INTO ' || dst::text || '(ro,rn)VALUES(';
  CASE TG_OP
    WHEN 'INSERT' THEN
      EXECUTE stmt || 'NULL,$1)' USING NEW;
    WHEN 'UPDATE' THEN
      EXECUTE stmt || '$1,$2)' USING OLD, NEW;
    WHEN 'DELETE' THEN
      EXECUTE stmt || '$1,NULL)' USING OLD;
  END CASE;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Manje singasebenzisa izibangeli ngaphambi kokuqala ukuvumelanisa (noma sikwazi ukusebenzisa ALTER TABLE ... ENABLE TRIGGER ...):

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr_house
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

Bese sikhipha ngokuthula zonke izinguquko esizidingayo kumatafula elogi futhi sizisebenzise ngokusebenzisa izibambi ezengeziwe.

3.3. Ingenisa Amasethi Axhunyiwe

Ngenhla sicabangele izimo lapho izakhiwo zedatha yomthombo nendawo okuyiwa kuyo zifana. Kodwa kuthiwani uma ukulayishwa okuvela ohlelweni lwangaphandle kunefomethi ehlukile kunesakhiwo sesitoreji kusizindalwazi sethu?

Ake sithathe njengesibonelo ukugcinwa kwamaklayenti nama-akhawunti awo, inketho yakudala "yabaningi kuya kokukodwa":

CREATE TABLE client(
  client_id
    serial
      PRIMARY KEY
, inn
    varchar
      UNIQUE
, name
    varchar
);

CREATE TABLE invoice(
  invoice_id
    serial
      PRIMARY KEY
, client_id
    integer
      REFERENCES client(client_id)
, number
    varchar
, dt
    date
, sum
    numeric(32,2)
);

Kodwa ukulanda okuvela emthonjeni wangaphandle kuza kithi ngendlela yokuthi "konke kokukodwa":

CREATE TEMPORARY TABLE invoice_import(
  client_inn
    varchar
, client_name
    varchar
, invoice_number
    varchar
, invoice_dt
    date
, invoice_sum
    numeric(32,2)
);

Ngokusobala, idatha yekhasimende ingaphinda iphindwe kule nguqulo, futhi irekhodi eliyinhloko "i-akhawunti":

0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00

Kumodeli, sizomane sifake idatha yethu yokuhlola, kodwa khumbula - COPY esebenza kahle kakhulu!

INSERT INTO invoice_import
VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

Okokuqala, ake sigqamise lokho “kusikeka” lapho “amaqiniso” ethu abhekisela khona. Esimweni sethu, ama-invoyisi abhekisa kumakhasimende:

CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- можно просто SELECT DISTINCT, если данные заведомо непротиворечивы
  client_inn inn
, client_name "name"
FROM
  invoice_import;

Ukuze sihlobanise kahle ama-akhawunti nama-ID ekhasimende, sidinga kuqala ukuthola noma ukukhiqiza lezi zihlonzi. Masingeze izinkambu ngaphansi kwazo:

ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;

Masisebenzise indlela yokuvumelanisa yethebula echazwe ngenhla nesichibiyelo esincane - ngeke sibuyekeze noma sisuse lutho kuthebula eliqondiwe, ngoba singenisa amakhasimende "i-append-kuphela":

-- проставляем в таблице импорта ID уже существующих записей
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  client D
WHERE
  T.inn = D.inn; -- unique key

-- вставляем отсутствовавшие записи и проставляем их ID
WITH ins AS (
  INSERT INTO client(
    inn
  , name
  )
  SELECT
    inn
  , name
  FROM
    client_import
  WHERE
    client_id IS NULL -- если ID не проставился
  RETURNING *
)
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  ins D
WHERE
  T.inn = D.inn; -- unique key

-- проставляем ID клиентов у записей счетов
UPDATE
  invoice_import T
SET
  client_id = D.client_id
FROM
  client_import D
WHERE
  T.client_inn = D.inn; -- прикладной ключ

Eqinisweni, konke kungaphakathi invoice_import Manje sesigcwalise inkambu yokuxhumana client_id, esizofaka ngayo i-invoyisi.

Source: www.habr.com

Engeza amazwana