DBA: efficiently organizing synchronizations and imports

Complex processing of large data sets (various ETL processes: imports, conversions and synchronization with an external source) often requires you to temporarily "remember" something voluminous and then process it quickly.

A typical task of this kind usually sounds something like this: "Accounting has just exported the latest received payments from the client-bank system. We need to load them onto the website quickly and link them to the invoices."

But when the volume of this "something" starts being measured in hundreds of megabytes, and the service has to keep working with the database 24x7, plenty of side effects appear that will make your life miserable.
To cope with them in PostgreSQL (and not only there), you can use a few optimization features that let you process everything faster and with fewer resources.

1. Where to load?

First, let's decide where we can load the data we want to "process".

1.1. Temporary tables (TEMPORARY TABLE)

In principle, temporary tables in PostgreSQL are just like any other tables. So superstitions like "everything there is kept only in memory, and memory can run out" are wrong. There are, however, several significant differences.

A separate "namespace" for each database connection

If two connections try to execute CREATE TABLE x at the same time, one of them is bound to get an error about a non-unique database object.

But if both try to execute CREATE TEMPORARY TABLE x, both will succeed, and each will get its own instance of the table. The two instances will have nothing in common.
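From a single session you can at least see where this separate namespace lives. A temporary table is created in the session's own pg_temp_N schema, and pg_my_temp_schema() returns the OID of that schema. This is a minimal sketch, and only the table name x comes from the text.

```sql
-- Running this in two sessions at once succeeds in both:
-- each session creates x in its own temporary schema.
CREATE TEMPORARY TABLE x(id integer);

SELECT
  c.oid::regclass AS table_name
, n.nspname       AS schema_name     -- pg_temp_N, different for every session
, c.relpersistence                   -- 't' means temporary
FROM
  pg_class c
JOIN
  pg_namespace n
    ON n.oid = c.relnamespace
WHERE
  c.relnamespace = pg_my_temp_schema() AND
  c.relname = 'x';
```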

"Self-destruction" on disconnect

When a connection is closed, all its temporary tables are dropped automatically. There is therefore no point in running DROP TABLE x "by hand", except...

If you work through pgbouncer in transaction mode, the database still considers the server connection active, so the temporary table still exists in it.

An attempt to create the table again, this time from a different client connection to pgbouncer, will therefore fail with an error. You can work around this with CREATE TEMPORARY TABLE IF NOT EXISTS x.

Still, it's better not to do that, because you may "suddenly" find data left there by the "previous owner". It is much better to read the manual and notice that you can append ON COMMIT DROP when creating the table. The table is then dropped automatically when the transaction ends.
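Here is a minimal sketch of the ON COMMIT DROP variant, using a hypothetical import_batch table. The table disappears together with the transaction, so a server connection that pgbouncer later hands to another client carries nothing over.

```sql
BEGIN;

-- lives only until the end of this transaction
CREATE TEMPORARY TABLE import_batch(
  id      integer
, payload text
) ON COMMIT DROP;

INSERT INTO import_batch VALUES (1, 'first'), (2, 'second');
-- ... process import_batch here ...

COMMIT;

-- already gone: no manual DROP TABLE, nothing left for the "next owner"
SELECT to_regclass('pg_temp.import_batch');  -- NULL
```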

No replication

Temporary tables belong to one specific connection only, so they are not replicated. This also removes the need to write the data twice (heap + WAL), which makes INSERT/UPDATE/DELETE on them much faster.

However, a temporary table is still an "almost ordinary" table, so you can't create one on a replica either. At least not yet, although a corresponding patch has been around for a long time.
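You can check the "no WAL" claim on your own server. Load the same rows into a temporary table and into an ordinary table, and compare how far the WAL insert position moves each time.

This is a rough sketch with some assumptions:
- It needs PostgreSQL 10 or newer, for pg_current_wal_insert_lsn().
- It assumes the default wal_level = replica.
- The table names are made up.
- Concurrent activity on the server also moves the WAL position, so the numbers are approximate.

```sql
-- hypothetical demo tables (created in their own transactions)
CREATE TEMPORARY TABLE demo_temp(id integer, payload text);
CREATE TABLE demo_regular(id integer, payload text);
CREATE TEMPORARY TABLE wal_usage(kind text, bytes numeric);

DO $$
DECLARE
  start_lsn pg_lsn;
BEGIN
  -- temporary table: the rows go to the heap only, not to the WAL
  start_lsn := pg_current_wal_insert_lsn();
  INSERT INTO demo_temp SELECT i, md5(i::text) FROM generate_series(1, 100000) i;
  INSERT INTO wal_usage
  VALUES ('temporary', pg_wal_lsn_diff(pg_current_wal_insert_lsn(), start_lsn));

  -- ordinary table: every row is written to the WAL as well as to the heap
  start_lsn := pg_current_wal_insert_lsn();
  INSERT INTO demo_regular SELECT i, md5(i::text) FROM generate_series(1, 100000) i;
  INSERT INTO wal_usage
  VALUES ('regular', pg_wal_lsn_diff(pg_current_wal_insert_lsn(), start_lsn));
END
$$;

SELECT kind, pg_size_pretty(bytes) AS wal_written FROM wal_usage;

DROP TABLE demo_regular;
```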

1.2. UNLOGGED TABLE

But what should you do in cases like these?

- You have a cumbersome ETL process that cannot be implemented within a single transaction, and you still have pgbouncer in transaction mode.
- The data flow is so large that a single connection lacks the throughput on the database side. Read: one backend process, hence one CPU core.
- Some of the operations run asynchronously in different connections.

There is only one option here: temporarily create a non-temporary table. Yes, that's a pun. That is:

  • create "my own" tables with names random enough not to collide with anyone else's
  • Extract: fill them with data from the external source
  • Transform: convert the data and fill in the key fields used for linking
  • Load: pour the ready data into the target tables
  • drop "my own" tables

And now, the fly in the ointment. Every write in PostgreSQL happens twice: first to the WAL, then to the table/index bodies. All of this is done to support ACID and correct data visibility between COMMITted and ROLLBACKed transactions.

But we don't need that! Our whole process either succeeds completely or fails. It doesn't matter how many intermediate transactions it contains. We are not interested in "resuming the process from the middle", especially when it's unclear where that middle was.

For this purpose, the PostgreSQL developers introduced UNLOGGED tables back in version 9.1:

If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.
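You can check the last sentence of the quote directly. The sketch below creates a hypothetical staging table with a primary key. Both the table and the index behind the key get relpersistence = 'u'.

```sql
CREATE UNLOGGED TABLE stage_unlogged(
  id      integer PRIMARY KEY
, payload text
);

-- 'u' = unlogged, for the table and for its PK index alike
SELECT relname, relpersistence
FROM pg_class
WHERE oid IN ('stage_unlogged'::regclass, 'stage_unlogged_pkey'::regclass);
```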

In short, it will be much faster, but if the database server "crashes", it won't be pleasant. Then again, how often does that happen? And can your ETL process correctly finish "from the middle" after the database "comes back to life"?..

If not, and the case above resembles yours, use UNLOGGED. But never set this attribute on real tables whose data you care about.
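Here is a rough sketch of how the five steps listed above might look when combined with UNLOGGED. It rests on several assumptions:
- etl_target is a hypothetical target table.
- Generated rows stand in for the external source.
- Each DO block stands for a separate step. In a real process, each step might be a separate transaction or connection.
- The random table name is carried between steps in a custom session setting, etl.stage. This is purely for the demo; an application would normally keep that name on its own side.

```sql
-- hypothetical target table
CREATE TABLE etl_target(id integer PRIMARY KEY, val text);

-- create "our own" UNLOGGED table under a random name
DO $$
DECLARE
  stage text := 'etl_stage_' || md5(random()::text || clock_timestamp()::text);
BEGIN
  EXECUTE format('CREATE UNLOGGED TABLE %I(id integer, val text)', stage);
  PERFORM set_config('etl.stage', stage, false);  -- demo only: remember the name for this session
END
$$;

-- Extract: fill it from the "external source" (generated rows stand in for COPY)
DO $$
BEGIN
  EXECUTE format(
    'INSERT INTO %I SELECT i, ''raw '' || i FROM generate_series(1, 1000) i'
  , current_setting('etl.stage')
  );
END
$$;

-- Transform: convert the data in place
DO $$
BEGIN
  EXECUTE format('UPDATE %I SET val = upper(val)', current_setting('etl.stage'));
END
$$;

-- Load: pour the ready rows into the target table
DO $$
BEGIN
  EXECUTE format(
    'INSERT INTO etl_target(id, val) SELECT id, val FROM %I'
  , current_setting('etl.stage')
  );
END
$$;

-- drop "our own" table
DO $$
BEGIN
  EXECUTE format('DROP TABLE %I', current_setting('etl.stage'));
END
$$;
```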

1.3. ON COMMIT { DELETE ROWS | DROP }

This clause lets you define, at table creation time, what happens to the table automatically when a transaction ends.

I already wrote about ON COMMIT DROP above: it generates a DROP TABLE. ON COMMIT DELETE ROWS is more interesting: it generates a TRUNCATE TABLE.

The infrastructure for storing a temporary table's metadata is exactly the same as for an ordinary table. As a result, constantly creating and dropping temporary tables leads to heavy "bloat" of the system tables pg_class, pg_attribute, pg_attrdef, pg_depend,…
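To get a feel for what one temporary table costs the system catalogs, count the catalog rows it owns while it exists. The sketch uses a hypothetical two-column table. After COMMIT, ON COMMIT DROP deletes these rows again. Under MVCC, though, they linger as dead tuples until VACUUM gets to the catalogs.

```sql
CREATE TEMPORARY TABLE catalog_footprint(catalog text, n bigint);

BEGIN;

CREATE TEMPORARY TABLE catalog_probe(
  a integer
, b text DEFAULT 'x'
) ON COMMIT DROP;

-- catalog rows that this single table has created
INSERT INTO catalog_footprint
SELECT 'pg_class', count(*) FROM pg_class WHERE oid = 'catalog_probe'::regclass
UNION ALL
SELECT 'pg_attribute', count(*) FROM pg_attribute WHERE attrelid = 'catalog_probe'::regclass
UNION ALL
SELECT 'pg_attrdef', count(*) FROM pg_attrdef WHERE adrelid = 'catalog_probe'::regclass
UNION ALL
SELECT 'pg_depend', count(*) FROM pg_depend
WHERE objid = 'catalog_probe'::regclass OR refobjid = 'catalog_probe'::regclass;

COMMIT;  -- the table is dropped; its catalog rows turn into dead tuples

SELECT * FROM catalog_footprint;
```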

Now imagine a worker on a direct connection to the database. Every second it opens a new transaction, then creates, fills, processes and drops a temporary table... Plenty of garbage will accumulate in the system tables, and that means extra slowdowns on every operation.

In general, don't do this! In this case it is much more efficient to move CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS out of the transaction loop. At the start of each new transaction the table then already exists, which saves the CREATE call. It is also empty, thanks to the TRUNCATE performed when the previous transaction finished, which saves that call too.
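Here is a sketch of that pattern for a hypothetical worker. The table is created once per connection, and the BEGIN ... COMMIT pair stands for one iteration of the worker's loop.

```sql
-- once per connection, outside the transaction loop
CREATE TEMPORARY TABLE work_batch(
  id      integer
, payload text
) ON COMMIT DELETE ROWS;

-- one iteration of the worker loop
BEGIN;
INSERT INTO work_batch SELECT i, md5(i::text) FROM generate_series(1, 1000) i;
-- ... process work_batch ...
COMMIT;  -- the rows are truncated here, the table definition stays

SELECT count(*) FROM work_batch;  -- 0: ready for the next iteration without a new CREATE
```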

1.4. LIKE…INCLUDING…

I mentioned at the beginning that imports of various kinds are one of the typical use cases for temporary tables. And the developer wearily copy-pastes the list of the target table's fields into the declaration of their temporary one...

But laziness is the engine of progress! Creating a new table "from a template" can be much simpler:

CREATE TEMPORARY TABLE import_table(
  LIKE target_table
);

You may later generate quite a lot of data in this table, and then searching it won't be fast at all. But there is a traditional solution for that: indexes! And yes, a temporary table can have indexes too.

The required indexes often coincide with those of the target table, so you can simply write LIKE target_table INCLUDING INDEXES.

If you also need DEFAULT values (for example, to fill in primary key values), use LIKE target_table INCLUDING DEFAULTS. Or simply use LIKE target_table INCLUDING ALL, which copies defaults, indexes, constraints,…
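A quick way to compare the LIKE variants is to count the indexes and column defaults each copy ends up with. The structure of target_table here is an assumption made for the demo.

```sql
CREATE TABLE target_table(
  id      serial PRIMARY KEY
, code    varchar UNIQUE
, created date DEFAULT current_date
);

CREATE TEMPORARY TABLE import_plain(LIKE target_table);                      -- columns (and NOT NULL) only
CREATE TEMPORARY TABLE import_indexed(LIKE target_table INCLUDING INDEXES);  -- + PK and UNIQUE indexes
CREATE TEMPORARY TABLE import_full(LIKE target_table INCLUDING ALL);         -- + defaults, constraints, ...

SELECT
  c.relname
, (SELECT count(*) FROM pg_index i WHERE i.indrelid = c.oid)   AS indexes
, (SELECT count(*) FROM pg_attrdef d WHERE d.adrelid = c.oid)  AS defaults
FROM
  pg_class c
WHERE
  c.oid IN ('import_plain'::regclass, 'import_indexed'::regclass, 'import_full'::regclass);
```

NOT NULL constraints are copied in all three cases. Note also that the copied serial default still calls nextval() on target_table's own sequence.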

Keep in mind, though, that creating the import table with indexes right away makes the data load slower than loading everything first and building the indexes afterwards. See how pg_dump does it, for example.
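Here is a sketch of the load-first, index-later order. A pg_dump restore does much the same: data first, then indexes and constraints. bulk_target, bulk_import and the generated rows are hypothetical.

```sql
CREATE TABLE bulk_target(
  id   integer PRIMARY KEY
, code varchar UNIQUE
);

-- 1. copy only the column list: no indexes to maintain during the load
CREATE TEMPORARY TABLE bulk_import(LIKE bulk_target);

-- 2. load everything (generate_series stands in for COPY here)
INSERT INTO bulk_import SELECT i, 'code-' || i FROM generate_series(1, 100000) i;

-- 3. build the indexes once, over data that is already in place
CREATE UNIQUE INDEX ON bulk_import(id);
CREATE UNIQUE INDEX ON bulk_import(code);

-- 4. autovacuum never processes temporary tables, so collect statistics by hand
ANALYZE bulk_import;
```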

In short, RTFM!

2. How to write?

Simply put: use a COPY stream instead of a "pack" of INSERTs. It is several times faster. You can even load directly from a pre-generated file.
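Here is a minimal sketch of a COPY load written as a psql script. The data rows follow the command directly and end with a \. line. payment_import and the file path are hypothetical; FORMAT and DELIMITER are standard COPY options.

```sql
CREATE TEMPORARY TABLE payment_import(
  account varchar
, dt      date
, amount  numeric(32,2)
);

-- one COPY stream instead of a "pack" of INSERTs
COPY payment_import(account, dt, amount) FROM STDIN WITH (FORMAT csv, DELIMITER ';');
A-01;2020-03-16;1000.00
A-02;2020-03-16;666.00
B-03;2020-03-16;9999.00
\.

-- straight from a prepared file (hypothetical path): server-side COPY reads a file
-- on the database server and needs the matching privileges; psql's \copy reads a client file
-- COPY payment_import FROM '/path/to/payments.csv' WITH (FORMAT csv, DELIMITER ';');
```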

3. How to process?

So let's say our input looks roughly like this:

  • your database stores a table of client data with 1M records
  • every day the client sends you a new, complete "image" of it
  • from experience you know that no more than 10K records change from one image to the next

A classic example is the KLADR database (the Russian address classifier). It holds a huge number of addresses, but each weekly export contains very few changes, even on a nationwide scale: renamed localities, merged streets, newly built houses.

3.1. Full synchronization algorithm

For simplicity, let's say you don't even need to restructure the data. You just need to bring the table to the required state, that is:

  • delete everything that no longer exists
  • update everything that already existed and needs updating
  • insert everything that didn't exist yet

Why perform the operations in exactly this order? Because this way the table grows the least (remember MVCC!).

DELETE FROM dst

Of course, you could get by with just two operations:

  • delete (DELETE) absolutely everything
  • insert everything from the new image

But then, thanks to MVCC, the table will grow to exactly twice its size! Adding +1M row versions to the table because of a 10K-record update is rather excessive...
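The doubling is easy to reproduce. Replace all rows of a table in one transaction, then compare the size of its heap before and after. The sketch uses a hypothetical 100K-row table. The dead row versions stay in the file until VACUUM can reclaim them.

```sql
CREATE TABLE mvcc_demo(id integer PRIMARY KEY, payload text);
INSERT INTO mvcc_demo SELECT i, md5(i::text) FROM generate_series(1, 100000) i;

CREATE TEMPORARY TABLE size_log(step text, bytes bigint);
INSERT INTO size_log VALUES ('initial', pg_relation_size('mvcc_demo'));

-- "DELETE everything, then INSERT the whole new image"
BEGIN;
DELETE FROM mvcc_demo;
INSERT INTO mvcc_demo SELECT i, md5(i::text) FROM generate_series(1, 100000) i;
COMMIT;

INSERT INTO size_log VALUES ('after DELETE + INSERT', pg_relation_size('mvcc_demo'));

SELECT step, pg_size_pretty(bytes) FROM size_log;
```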

TRUNCATE dst

A more experienced developer knows that the whole table can be cleared fairly cheaply:

  • clear (TRUNCATE) the whole table
  • insert everything from the new image

The method works and is sometimes quite applicable, but there is a problem... Pouring in 1M records takes a long time. We can't afford to leave the table empty all that time, which is what happens unless everything is wrapped in a single transaction.

Which means:

  • we start a long-running transaction
  • TRUNCATE takes an AccessExclusive lock
  • we spend a long time inserting, and meanwhile nobody else can even SELECT

Something's not right here...
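You can observe the lock from the list above in pg_locks. Inside the transaction, right after TRUNCATE, the session holds an AccessExclusiveLock on the table. The sketch uses a hypothetical truncate_demo table, and it captures the lock into a temporary table so you can inspect it after COMMIT.

```sql
CREATE TABLE truncate_demo(id integer);
CREATE TEMPORARY TABLE observed_locks(mode text);

BEGIN;
TRUNCATE truncate_demo;

-- held until COMMIT: a SELECT on truncate_demo from any other session would wait for it
INSERT INTO observed_locks
SELECT mode
FROM pg_locks
WHERE relation = 'truncate_demo'::regclass
  AND pid = pg_backend_pid();
COMMIT;

SELECT mode FROM observed_locks;  -- AccessExclusiveLock
```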

ALTER TABLE… RENAME… / DROP TABLE…

Another option is to load everything into a separate new table and then simply rename it into the old one's place. A few nasty little details:

  • it still takes an AccessExclusive lock, though for a much shorter time
  • all query plans/statistics for this table are reset, so you have to run ANALYZE
  • all foreign keys (FK) referencing the table break

Simon Riggs wrote a WIP patch proposing an ALTER operation that would swap the table body at the file level without touching statistics and FKs. It didn't gather a quorum.
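Here is a sketch of the rename swap on hypothetical swap_dst/swap_ref tables. It shows the FK problem from the list: a foreign key is bound to the table itself, not to its name, so after the swap it still points at the old table.

```sql
CREATE TABLE swap_dst(id integer PRIMARY KEY);
CREATE TABLE swap_ref(dst_id integer REFERENCES swap_dst(id));
INSERT INTO swap_dst VALUES (1), (2);
INSERT INTO swap_ref VALUES (1);

-- the new image is loaded into a separate table without touching swap_dst
CREATE TABLE swap_dst_new(LIKE swap_dst INCLUDING ALL);
INSERT INTO swap_dst_new VALUES (1), (2), (3);

-- the swap itself: short, but each RENAME takes an AccessExclusive lock
BEGIN;
ALTER TABLE swap_dst RENAME TO swap_dst_old;
ALTER TABLE swap_dst_new RENAME TO swap_dst;
COMMIT;

-- the freshly loaded table has no statistics yet
ANALYZE swap_dst;

-- the FK followed the old table, not the name
SELECT conname, confrelid::regclass
FROM pg_constraint
WHERE conrelid = 'swap_ref'::regclass AND contype = 'f';
-- so DROP TABLE swap_dst_old fails unless the FK is dropped too (e.g. with CASCADE)
```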

DELETE, UPDATE, INSERT

So we settle on the non-blocking option of three operations. Well, almost three... How do we do it most efficiently?

-- do everything within one transaction, so that nobody sees "intermediate" states
BEGIN;

-- create a temporary table for the imported data
CREATE TEMPORARY TABLE tmp(
  LIKE dst INCLUDING INDEXES -- same structure as dst, indexes included
) ON COMMIT DROP; -- we don't need it outside the transaction

-- quickly pour in the new image via COPY
COPY tmp FROM STDIN;
-- ...
-- .

-- delete the rows missing from the new image
DELETE FROM
  dst D
USING
  dst X
LEFT JOIN
  tmp Y
    USING(pk1, pk2) -- primary key fields
WHERE
  (D.pk1, D.pk2) = (X.pk1, X.pk2) AND
  Y IS NOT DISTINCT FROM NULL; -- "anti-join": no matching row in tmp

-- update the remaining rows
UPDATE
  dst D
SET
  (f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
  tmp T
WHERE
  (D.pk1, D.pk2) = (T.pk1, T.pk2) AND
  (D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- no point in updating rows that already match

-- insert the rows that don't exist yet
INSERT INTO
  dst
SELECT
  T.*
FROM
  tmp T
LEFT JOIN
  dst D
    USING(pk1, pk2)
WHERE
  D IS NOT DISTINCT FROM NULL;

COMMIT;
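To see the three statements at work, here is the same sequence on a tiny hypothetical dst table with a composite key and three data columns. A multi-row INSERT stands in for COPY, so the script needs no input stream. In the new image, row (2,1) is missing, row (1,2) has changed, and row (3,1) is new.

```sql
CREATE TABLE dst(
  pk1 integer
, pk2 integer
, f1  text
, f2  text
, f3  text
, PRIMARY KEY(pk1, pk2)
);

INSERT INTO dst VALUES
  (1, 1, 'a', 'a', 'a')   -- unchanged in the new image
, (1, 2, 'b', 'b', 'b')   -- changed in the new image
, (2, 1, 'c', 'c', 'c');  -- absent from the new image

BEGIN;

CREATE TEMPORARY TABLE tmp(LIKE dst INCLUDING INDEXES) ON COMMIT DROP;

-- the new full image (instead of COPY)
INSERT INTO tmp VALUES
  (1, 1, 'a', 'a', 'a')
, (1, 2, 'b', 'B', 'b')
, (3, 1, 'd', 'd', 'd');

DELETE FROM dst D
USING dst X LEFT JOIN tmp Y USING(pk1, pk2)
WHERE (D.pk1, D.pk2) = (X.pk1, X.pk2) AND Y IS NOT DISTINCT FROM NULL;

UPDATE dst D
SET (f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM tmp T
WHERE (D.pk1, D.pk2) = (T.pk1, T.pk2)
  AND (D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3);

INSERT INTO dst
SELECT T.* FROM tmp T LEFT JOIN dst D USING(pk1, pk2)
WHERE D IS NOT DISTINCT FROM NULL;

COMMIT;

SELECT * FROM dst ORDER BY pk1, pk2;
-- rows left: (1,1,a,a,a), (1,2,b,B,b), (3,1,d,d,d)
```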

3.2. Post-processing the import

In the same KLADR, all changed records also have to go through post-processing: normalization, keyword extraction, reduction to the required structures. But how do you find out what exactly has changed without complicating the synchronization code? Ideally, without touching it at all?

If only your process has write access during synchronization, you can use a trigger that collects all the changes for us:

-- target tables
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);

-- change history tables
CREATE TABLE kladr$log(
  ro kladr, -- whole images of the old/new records are stored here
  rn kladr
);

CREATE TABLE kladr_house$log(
  ro kladr_house,
  rn kladr_house
);

-- common change-logging function
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
  dst varchar = TG_TABLE_NAME || '$log';
  stmt text = '';
BEGIN
  -- on UPDATE, log only if the record has actually changed
  IF TG_OP = 'UPDATE' THEN
    IF NEW IS NOT DISTINCT FROM OLD THEN
      RETURN NULL; -- the result of an AFTER row trigger is ignored anyway
    END IF;
  END IF;
  -- build the log record: the log table is assumed to be in the same schema as the source table,
  -- and quote_ident() keeps the generated names safe to use as identifiers
  stmt = 'INSERT INTO ' || quote_ident(TG_TABLE_SCHEMA) || '.' || quote_ident(dst) || '(ro,rn)VALUES(';
  CASE TG_OP
    WHEN 'INSERT' THEN
      EXECUTE stmt || 'NULL,$1)' USING NEW;
    WHEN 'UPDATE' THEN
      EXECUTE stmt || '$1,$2)' USING OLD, NEW;
    WHEN 'DELETE' THEN
      EXECUTE stmt || '$1,NULL)' USING OLD;
  END CASE;
  RETURN NULL; -- AFTER row triggers ignore the returned row
END;
$$ LANGUAGE plpgsql;

Now we can attach the triggers before starting the synchronization (or enable them with ALTER TABLE ... ENABLE TRIGGER ...):

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr_house
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

After that, we calmly extract all the changes we need from the log tables and run them through additional handlers.
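Reading such a log comes down to telling the three cases apart:
- no old image means INSERT
- no new image means DELETE
- both images mean UPDATE

To keep the sketch self-contained, it uses a hypothetical street table and does not install the trigger. Instead, it fills street$log by hand in the same shape that diff$log() writes.

```sql
-- hypothetical table and its log, shaped like kladr/kladr$log
CREATE TABLE street(id integer PRIMARY KEY, name text);
CREATE TABLE street$log(ro street, rn street);

-- log rows as diff$log() would write them
INSERT INTO street$log VALUES
  (NULL, ROW(1, 'Lenina')::street)                          -- INSERT
, (ROW(2, 'Mira')::street, ROW(2, 'Prospekt Mira')::street) -- UPDATE
, (ROW(3, 'Sadovaya')::street, NULL);                       -- DELETE

-- classify the changes before handing them to the post-processing
SELECT
  CASE
    WHEN ro IS NOT DISTINCT FROM NULL THEN 'INSERT'
    WHEN rn IS NOT DISTINCT FROM NULL THEN 'DELETE'
    ELSE 'UPDATE'
  END                   AS op
, (coalesce(rn, ro)).id AS id
, (ro).name             AS old_name
, (rn).name             AS new_name
FROM
  street$log
ORDER BY
  id;
```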

3.3. Importing linked sets

So far we have considered cases where the source and destination data structures coincide. But what if the export from an external system uses a format different from the storage structure in our database?

Let's take as an example the storage of clients and their invoices, the classic "many-to-one" case:

CREATE TABLE client(
  client_id
    serial
      PRIMARY KEY
, inn
    varchar
      UNIQUE
, name
    varchar
);

CREATE TABLE invoice(
  invoice_id
    serial
      PRIMARY KEY
, client_id
    integer
      REFERENCES client(client_id)
, number
    varchar
, dt
    date
, sum
    numeric(32,2)
);

But the export from the external source arrives in an "all in one" form:

CREATE TEMPORARY TABLE invoice_import(
  client_inn
    varchar
, client_name
    varchar
, invoice_number
    varchar
, invoice_dt
    date
, invoice_sum
    numeric(32,2)
);

Obviously, client data can be duplicated in this form, and the main record is the "invoice":

0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00

For this model, we'll simply insert our test data. But remember: COPY is more efficient!

INSERT INTO invoice_import
VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

First, let's single out the "dimensions" that our "facts" refer to. In our case, invoices refer to clients:

CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- plain SELECT DISTINCT is enough if the data is known to be consistent
  client_inn inn
, client_name "name"
FROM
  invoice_import;

To link invoices to client IDs correctly, we first need to look up or generate these identifiers. Let's add columns for them:

ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;

Let's use the table synchronization method described above with a small adjustment. We won't update or delete anything in the target table, because our client import is "append-only":

-- fill in the IDs of already existing records in the import table
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  client D
WHERE
  T.inn = D.inn; -- unique key

-- insert the missing records and fill in their IDs
WITH ins AS (
  INSERT INTO client(
    inn
  , name
  )
  SELECT
    inn
  , name
  FROM
    client_import
  WHERE
    client_id IS NULL -- the ID has not been filled in
  RETURNING *
)
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  ins D
WHERE
  T.inn = D.inn; -- unique key

-- fill in client IDs on the invoice records
UPDATE
  invoice_import T
SET
  client_id = D.client_id
FROM
  client_import D
WHERE
  T.client_inn = D.inn; -- business key

That's essentially it. invoice_import now has the link field client_id filled in, and we can use it to insert the invoices.
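The text stops right before that final insert. Below is a self-contained replay of this section on its own sample data, ending with an INSERT ... SELECT into invoice. That last statement is our completion of the walkthrough, not the author's code. To keep the script short, client_id is declared directly in invoice_import and client_import instead of being added with ALTER TABLE.

```sql
CREATE TABLE client(
  client_id serial PRIMARY KEY
, inn       varchar UNIQUE
, name      varchar
);

CREATE TABLE invoice(
  invoice_id serial PRIMARY KEY
, client_id  integer REFERENCES client(client_id)
, number     varchar
, dt         date
, sum        numeric(32,2)
);

CREATE TEMPORARY TABLE invoice_import(
  client_inn     varchar
, client_name    varchar
, invoice_number varchar
, invoice_dt     date
, invoice_sum    numeric(32,2)
, client_id      integer -- link field, filled in below
);

INSERT INTO invoice_import VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

-- the "dimension": distinct clients referenced by the invoices
CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
  client_inn      inn
, client_name     "name"
, NULL::integer   client_id
FROM
  invoice_import;

-- IDs of clients that already exist
UPDATE client_import T SET client_id = D.client_id
FROM client D WHERE T.inn = D.inn;

-- append the missing clients and remember their new IDs
WITH ins AS (
  INSERT INTO client(inn, name)
  SELECT inn, name FROM client_import WHERE client_id IS NULL
  RETURNING *
)
UPDATE client_import T SET client_id = D.client_id
FROM ins D WHERE T.inn = D.inn;

-- link the invoices to client IDs
UPDATE invoice_import T SET client_id = D.client_id
FROM client_import D WHERE T.client_inn = D.inn;

-- finally, the invoices themselves
INSERT INTO invoice(client_id, number, dt, sum)
SELECT client_id, invoice_number, invoice_dt, invoice_sum
FROM invoice_import;
```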

Source: www.habr.com
