Complex processing of large datasets (various …
A typical task of this kind usually sounds something like this: "Here …"
But when the volume of this "something" starts to be measured in hundreds of megabytes, and the service has to keep working with the database 24x7, a lot of side effects appear that will ruin your life.
To deal with them in PostgreSQL (and not only there), you can use a few optimizations that let you process everything faster and with lower resource consumption.
1. Where to load?
First, let's decide where we can load the data we want to "process".
1.1. Temporary tables (TEMPORARY TABLE)
In essence, temporary tables in PostgreSQL are the same as any other tables. So superstitions like "everything there is kept only in memory, and memory can run out" are unfounded. But there are several significant differences.
A separate "namespace" for each database connection
If two connections try to run CREATE TABLE x at the same time, one of them is guaranteed to get a non-uniqueness error for the database object.
But if both try to run CREATE TEMPORARY TABLE x, both will succeed, and each gets its own copy of the table. The two copies have nothing in common.
"Self-destruction" on disconnect
When the connection is closed, all temporary tables are dropped automatically, so running DROP TABLE x by hand makes no sense, except...
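Two concurrent connections are hard to show in one script, but the separate namespace is visible even inside a single session. A temporary table is created in the session's own pg_temp_N schema, which is searched before the regular schemas, so it even shadows a permanent table with the same name. This is a minimal sketch to run in psql. The table name demo_x is hypothetical, and the sketch assumes you may create tables in the public schema:

```sql
-- A permanent table and a temporary one with the same (hypothetical) name
CREATE TABLE demo_x(v text);
INSERT INTO demo_x VALUES ('permanent');

CREATE TEMPORARY TABLE demo_x(v text);
INSERT INTO demo_x VALUES ('temporary'); -- lands in the temp table: pg_temp is searched first

-- The unqualified name resolves to this session's own copy
SELECT v FROM demo_x;        -- temporary
-- The permanent table is still reachable through its schema
SELECT v FROM public.demo_x; -- permanent

-- Where does each copy live?
SELECT
  n.nspname
, c.relpersistence
FROM
  pg_class c
JOIN
  pg_namespace n
    ON n.oid = c.relnamespace
WHERE
  c.relname = 'demo_x'; -- one row in public (p), one in pg_temp_N (t)
```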
If you work through pgbouncer in transaction mode, the database still believes the connection is alive, and the temporary table still exists in it.
So an attempt to create it again, from a different client connection to pgbouncer, will cause an error. This can be avoided with CREATE TEMPORARY TABLE IF NOT EXISTS x.
Actually, it is better not to do that, because then you can "suddenly" get data left over from the "previous owner". A much better approach is to read the manual. It shows that when creating a table you can add ON COMMIT DROP, and the table is then dropped automatically when the transaction completes.
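This is a minimal sketch of the ON COMMIT DROP variant. The table name import_buf is hypothetical:

```sql
BEGIN;
-- Lives only until the end of this transaction
CREATE TEMPORARY TABLE import_buf(
  id  integer
, val text
) ON COMMIT DROP;

INSERT INTO import_buf VALUES (1, 'a'), (2, 'b');
SELECT count(*) FROM import_buf; -- 2
COMMIT; -- the table is dropped here

-- Nothing is left for the next transaction on this backend,
-- e.g. for another client served by pgbouncer
SELECT to_regclass('import_buf'); -- NULL
```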
No replication
Because they belong only to a specific connection, temporary tables are not replicated. This also removes the need to write the data twice (heap + WAL), so INSERT/UPDATE/DELETE on them is much faster.
But since a temporary table is still an "almost ordinary" table, it cannot be created on a replica either. That is the case at least for now, although the corresponding patch has been circulating for a long time.
1.2. UNLOGGED TABLE
But what should you do if, for example, you have some cumbersome ETL process that cannot be implemented within a single transaction, and you still have pgbouncer in transaction mode?..
Or the data flow is so large that the bandwidth of one database connection is not enough (read: one process per CPU)?..
Or some operations run asynchronously in different connections?..
There is only one option here: temporarily create a non-temporary table. Pun intended, yes. That is:
- create "your own" tables with maximally random names so as not to clash with anyone
- Extract: fill them with data from the external source
- Transform: convert the data and fill in the key linking fields
- Load: pour the ready data into the target tables
- drop "your" tables
And now, the fly in the ointment. Every write in PostgreSQL happens twice: first to the WAL, then to the table/index bodies. All of this is needed for durability and for correct data visibility between COMMITted and ROLLBACKed transactions.
But we don't need that! Our whole process either succeeded completely or failed. It doesn't matter how many intermediate transactions there are. We are not interested in "resuming the process from the middle", especially when it is unclear where that middle was.
For this purpose, the PostgreSQL developers introduced UNLOGGED tables back in version 9.1:
With this flag, the table is created as unlogged. Data written to unlogged tables does not go through the write-ahead log (see Chapter 29), which makes such tables considerably faster than ordinary ones. However, they are not crash-safe: after a server crash or an emergency shutdown, an unlogged table is automatically truncated. In addition, the contents of an unlogged table are not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.
In short, it will be much faster, but if the database server "crashes", it will be unpleasant. But how often does that happen? And does your ETL process know how to resume correctly "from the middle" after the database "comes back"?..
If not, and the case above is similar to yours, use UNLOGGED. But never enable this attribute on real tables whose data matters to you.
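This is a sketch of the extract/transform/load steps listed above, on an UNLOGGED staging table. It is an illustration under stated assumptions, not the author's process. The names target_client and etl_stage_7f3a9c are hypothetical. The "random" suffix is hard-coded for readability, whereas a real process would generate it. ON CONFLICT requires PostgreSQL 9.5+.

```sql
-- Hypothetical target table holding the "real" data
CREATE TABLE target_client(
  inn  varchar PRIMARY KEY
, name varchar
);

-- 1. "My own" staging table: random-looking name, not WAL-logged
CREATE UNLOGGED TABLE etl_stage_7f3a9c(
  inn  varchar
, name varchar
);
CREATE INDEX ON etl_stage_7f3a9c(inn);

-- The table and its index are both unlogged: relpersistence = 'u'
SELECT relname, relpersistence
FROM pg_class
WHERE relname LIKE 'etl_stage_7f3a9c%';

-- 2. Extract: fill it from the external source (COPY in real life)
INSERT INTO etl_stage_7f3a9c VALUES
  (' 0123456789 ', 'Вася')
, ('9876543210', 'Петя');

-- 3. Transform: normalize the key in place
UPDATE etl_stage_7f3a9c SET inn = trim(inn);

-- 4. Load: move the prepared rows into the target table
INSERT INTO target_client(inn, name)
SELECT inn, name FROM etl_stage_7f3a9c
ON CONFLICT (inn) DO NOTHING;

-- 5. Drop "my" table
DROP TABLE etl_stage_7f3a9c;
```

Because the staging table is a regular, non-temporary table, other connections and later pgbouncer transactions can see it. That visibility is exactly why this variant is used instead of a temporary table.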
1.3. ON COMMIT { DELETE ROWS | DROP }
This clause lets you specify, when creating a table, what should happen to it automatically when the transaction completes.
I already wrote about ON COMMIT DROP above: it generates a DROP TABLE. With ON COMMIT DELETE ROWS the situation is more interesting, because here a TRUNCATE TABLE is generated.
The entire infrastructure for storing a temporary table's metadata is exactly the same as for a regular table. So frequent creation and dropping of temporary tables leads to severe "bloat" of the system tables pg_class, pg_attribute, pg_attrdef, pg_depend,...
Now imagine a worker on a direct connection to the database that opens a new transaction every second and creates, fills, processes and drops a temporary table... A lot of garbage will accumulate in the system tables, causing extra slowdowns for every operation.
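To check whether this is happening on your server, one option is to look at the dead-row counters of the catalogs mentioned above. This sketch uses the standard pg_stat_sys_tables view. Its counters come from the statistics subsystem and may lag slightly behind reality:

```sql
SELECT
  relname
, n_live_tup
, n_dead_tup      -- garbage left by dropped temp tables shows up here
, last_autovacuum
, pg_size_pretty(pg_total_relation_size(relid)) total_size
FROM
  pg_stat_sys_tables
WHERE
  schemaname = 'pg_catalog' AND
  relname IN ('pg_class', 'pg_attribute', 'pg_attrdef', 'pg_depend')
ORDER BY
  n_dead_tup DESC;
```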
In general, don't do this! In this case it is much more effective to move CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS out of the transaction loop. Then at the start of each new transaction the tables already exist, which saves a CREATE call. They are also empty, thanks to the TRUNCATE performed at the end of the previous transaction, so we save that call as well.
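This sketch shows such a worker's loop written out as two iterations in psql. The table name work_buf is hypothetical:

```sql
-- Once per connection, outside the transaction loop
CREATE TEMPORARY TABLE work_buf(
  id  integer
, val text
) ON COMMIT DELETE ROWS;

-- Iteration 1
BEGIN;
INSERT INTO work_buf VALUES (1, 'a'), (2, 'b');
SELECT count(*) FROM work_buf; -- 2
COMMIT; -- rows are truncated, the table itself stays

-- Iteration 2: no CREATE, and the table starts out empty
BEGIN;
SELECT count(*) FROM work_buf; -- 0
INSERT INTO work_buf VALUES (3, 'c');
COMMIT;
```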
1.4. LIKE...INCLUDING...
I mentioned at the beginning that one of the typical uses of temporary tables is various kinds of import. In that case the developer wearily copy-pastes the list of the target table's fields into the declaration of the temporary one...
But laziness is the engine of progress! That is why creating a new table "from a template" can be much simpler:
CREATE TEMPORARY TABLE import_table(
LIKE target_table
);
You may later load quite a lot of data into this table, and searching it will then never be fast. But there is a traditional solution for that: indexes! And yes, a temporary table can have indexes too.
The required indexes often coincide with the indexes of the target table, so you can simply write LIKE target_table INCLUDING INDEXES.
If you also need the DEFAULT values (for example, to fill in primary key values), you can use LIKE target_table INCLUDING DEFAULTS. Or simply use LIKE target_table INCLUDING ALL, which copies defaults, indexes, constraints,...
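This is a short sketch of INCLUDING ALL on a hypothetical target_table that has a serial key, a default and an extra index. A copied nextval() default keeps pointing at the original table's sequence. As a result, rows inserted into the import table consume target_table's IDs:

```sql
-- Hypothetical target table: serial key, a default and an extra index
CREATE TABLE target_table(
  id      serial PRIMARY KEY
, code    varchar NOT NULL
, created date DEFAULT current_date
);
CREATE INDEX ON target_table(code);

-- Columns, defaults, constraints and indexes in one go
CREATE TEMPORARY TABLE import_table(
  LIKE target_table INCLUDING ALL
);

-- id and created are filled by the copied defaults
-- (id is drawn from target_table's own sequence)
INSERT INTO import_table(code) VALUES ('A-01') RETURNING *;

-- Both indexes were copied: the primary key and the one on code
SELECT indexrelid::regclass
FROM pg_index
WHERE indrelid = 'import_table'::regclass;
```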
But keep in mind one trade-off. If you create the import table with indexes right away, the data will take longer to load than if you first fill everything and only then build the indexes. As an example, look at how pg_dump does it: its output loads the data first and creates the indexes afterwards.
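The "load first, index later" order can be sketched like this. The table names and the generated data are hypothetical. Autovacuum does not process temporary tables, so the explicit ANALYZE at the end is what gives the planner statistics for the freshly loaded table:

```sql
-- Hypothetical target table
CREATE TABLE target_doc(
  id   integer PRIMARY KEY
, code varchar
);
CREATE INDEX ON target_doc(code);

-- Copy the column definitions and defaults, but no indexes yet
CREATE TEMPORARY TABLE import_doc(
  LIKE target_doc INCLUDING DEFAULTS
);

-- Bulk load into the index-free table (COPY in real life)
INSERT INTO import_doc
SELECT i, 'C-' || i
FROM generate_series(1, 100000) i;

-- Only now build the indexes, one pass each, and collect statistics
CREATE UNIQUE INDEX ON import_doc(id);
CREATE INDEX ON import_doc(code);
ANALYZE import_doc;
```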
In short, …
2. How to write?
Let me just say this: use a COPY stream instead of a "batch" INSERT, …
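This is a minimal sketch of the difference. It assumes the script is run through psql, because the inline data block terminated by \. is a psql feature. The table import_rows is hypothetical:

```sql
-- Hypothetical import table
CREATE TEMPORARY TABLE import_rows(
  code varchar
, qty  integer
);

-- Instead of INSERT INTO import_rows VALUES (...), (...), ...
-- stream the rows with COPY; the inline data ends with \.
COPY import_rows(code, qty) FROM STDIN WITH (FORMAT csv);
A-01,10
A-02,20
B-03,30
\.
```

To load straight from a prepared file on the client side, psql's \copy meta-command accepts the same options. For example: \copy import_rows FROM 'import.csv' WITH (FORMAT csv), where import.csv is a hypothetical file name.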
3. How to process?
So, let our import look like this:
- you have a table with client data in your database, 1M records
- every day the client sends you a new full "image"
- from experience you know that from time to time no more than 10K records change
A typical example of such a situation is KLADR, the Russian address classifier.
3.1. Full synchronization algorithm
For simplicity, let's assume you don't even need to restructure the data. You just need to bring the table to the desired state, that is:
- delete everything that no longer exists
- update everything that already existed and needs updating
- insert everything that did not exist yet
Why should the operations be done in exactly this order? Because this way the table size grows the least (remember MVCC!).
DELETE FROM dst
No, of course, you can get by with just two operations:
- delete (DELETE) everything
- insert everything from the new image
But then, thanks to MVCC, the table size doubles! Getting +1M record images in the table because of a 10K update is rather wasteful...
TRUNCATE dst
A more experienced developer knows that the whole table can be cleared quite cheaply:
- clear (TRUNCATE) the whole table
- insert everything from the new image
The method works, but consider what it means:
- we start a long transaction
- TRUNCATE takes an AccessExclusive lock
- we spend a long time inserting, and during all that time nobody else can even SELECT from the table
Something is not right...
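The lock is easy to observe from the same session. This sketch uses a hypothetical dst_demo table standing in for dst:

```sql
-- Hypothetical table standing in for dst
CREATE TABLE dst_demo(
  pk  integer PRIMARY KEY
, val text
);
INSERT INTO dst_demo VALUES (1, 'a'), (2, 'b');

BEGIN;
TRUNCATE dst_demo;

-- Which lock does this transaction now hold on the table?
SELECT mode, granted
FROM pg_locks
WHERE relation = 'dst_demo'::regclass
  AND pid = pg_backend_pid(); -- AccessExclusiveLock

-- A long reload here would block every other session,
-- including a plain SELECT * FROM dst_demo, until COMMIT/ROLLBACK
ROLLBACK; -- undo the demo: the rows come back
```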
ALTER TABLE ... RENAME TO ... / DROP TABLE ...
An alternative is to fill everything into a separate new table and then simply rename it into the place of the old one. There are a couple of nasty little details:
- there is still an AccessExclusive lock, although for a much shorter time
- all query plans/statistics for this table are reset, so you need to run ANALYZE
- all foreign keys (FK) pointing to the table break
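This sketch shows the swap on a hypothetical dst_swap table. Everything happens in one transaction. Other sessions are blocked only from the first RENAME until COMMIT, not during the whole reload into the new table:

```sql
-- Hypothetical table standing in for dst
CREATE TABLE dst_swap(
  pk  integer PRIMARY KEY
, val text
);
INSERT INTO dst_swap VALUES (1, 'old');

BEGIN;
-- Build the new version next to the old one (no lock on dst_swap yet)
CREATE TABLE dst_swap_new(LIKE dst_swap INCLUDING ALL);
INSERT INTO dst_swap_new VALUES (1, 'new'), (2, 'new'); -- the new "image"

-- The swap itself: AccessExclusive from here until COMMIT
ALTER TABLE dst_swap RENAME TO dst_swap_old;
ALTER TABLE dst_swap_new RENAME TO dst_swap;
DROP TABLE dst_swap_old; -- fails without CASCADE if other tables have FKs to it
COMMIT;

-- The new table starts without statistics
ANALYZE dst_swap;
```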
There was a WIP patch from Simon Riggs that proposed an ALTER operation to replace the table body at the file level, without touching statistics and FKs, but it did not gather a quorum.
DELETE, UPDATE, INSERT
So we settle on the non-blocking variant of three operations. Almost three... How do you do this most efficiently?
-- do everything within one transaction so that nobody sees "intermediate" states
BEGIN;
-- create a temporary table for the imported data
CREATE TEMPORARY TABLE tmp(
LIKE dst INCLUDING INDEXES -- same shape as dst, indexes included
) ON COMMIT DROP; -- not needed outside the transaction
-- quickly pour in the new image via COPY
COPY tmp FROM STDIN;
-- ...
-- .
-- delete rows missing from the new image
DELETE FROM
dst D
USING
dst X
LEFT JOIN
tmp Y
USING(pk1, pk2) -- primary key fields
WHERE
(D.pk1, D.pk2) = (X.pk1, X.pk2) AND
Y IS NOT DISTINCT FROM NULL; -- "anti-join"
-- update the remaining rows
UPDATE
dst D
SET
(f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
tmp T
WHERE
(D.pk1, D.pk2) = (T.pk1, T.pk2) AND
(D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- no point updating rows that already match
-- insert rows that are not there yet
INSERT INTO
dst
SELECT
T.*
FROM
tmp T
LEFT JOIN
dst D
USING(pk1, pk2)
WHERE
D IS NOT DISTINCT FROM NULL;
COMMIT;
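To see the three statements at work, here is a self-contained toy run. The names dst and tmp follow the code above, and the data is hypothetical. For brevity, a single payload column f1 replaces f1, f2, f3, and INSERT stands in for COPY so the script needs no stdin:

```sql
-- Toy target table
CREATE TABLE dst(
  pk1 integer
, pk2 integer
, f1  text
, PRIMARY KEY(pk1, pk2)
);
INSERT INTO dst VALUES (1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c');

BEGIN;
CREATE TEMPORARY TABLE tmp(
  LIKE dst INCLUDING INDEXES
) ON COMMIT DROP;
-- new image: (1,1) unchanged, (1,2) changed, (2,1) gone, (3,1) new
INSERT INTO tmp VALUES (1, 1, 'a'), (1, 2, 'B'), (3, 1, 'd');

-- delete rows missing from the image (anti-join)
DELETE FROM dst D
USING dst X LEFT JOIN tmp Y USING(pk1, pk2)
WHERE (D.pk1, D.pk2) = (X.pk1, X.pk2)
  AND Y IS NOT DISTINCT FROM NULL;

-- update only rows whose payload actually differs
UPDATE dst D
SET f1 = T.f1
FROM tmp T
WHERE (D.pk1, D.pk2) = (T.pk1, T.pk2)
  AND D.f1 IS DISTINCT FROM T.f1;

-- insert rows that are new
INSERT INTO dst
SELECT T.*
FROM tmp T LEFT JOIN dst D USING(pk1, pk2)
WHERE D IS NOT DISTINCT FROM NULL;
COMMIT;

SELECT * FROM dst ORDER BY pk1, pk2; -- (1,1,a) (1,2,B) (3,1,d)
```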
3.2. Import with post-processing
In the same KLADR, all changed records must additionally go through post-processing: normalization, keyword highlighting and reduction to the required structures. But how do you find out what exactly changed, without complicating the synchronization code, and ideally without touching it at all?
If only your process has write access during synchronization, you can use a trigger that collects all the changes for us:
-- target tables
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);
-- change history tables
CREATE TABLE kladr$log(
ro kladr, -- full images of the old/new records are stored here
rn kladr
);
CREATE TABLE kladr_house$log(
ro kladr_house,
rn kladr_house
);
-- shared change-logging function
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
dst varchar = TG_TABLE_NAME || '$log';
stmt text = '';
BEGIN
-- on UPDATE, check whether anything actually changed and needs logging
IF TG_OP = 'UPDATE' THEN
IF NEW IS NOT DISTINCT FROM OLD THEN
RETURN NEW;
END IF;
END IF;
-- write the log record
stmt = 'INSERT INTO ' || dst::text || '(ro,rn)VALUES(';
CASE TG_OP
WHEN 'INSERT' THEN
EXECUTE stmt || 'NULL,$1)' USING NEW;
WHEN 'UPDATE' THEN
EXECUTE stmt || '$1,$2)' USING OLD, NEW;
WHEN 'DELETE' THEN
EXECUTE stmt || '$1,NULL)' USING OLD;
END CASE;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Now we can attach the triggers before starting synchronization (or enable them with ALTER TABLE ... ENABLE TRIGGER ...):
CREATE TRIGGER log
AFTER INSERT OR UPDATE OR DELETE
ON kladr
FOR EACH ROW
EXECUTE PROCEDURE diff$log();
CREATE TRIGGER log
AFTER INSERT OR UPDATE OR DELETE
ON kladr_house
FOR EACH ROW
EXECUTE PROCEDURE diff$log();
After that, we calmly extract all the changes we need from the log tables and run them through additional handlers.
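This is a self-contained sketch of the whole cycle on a hypothetical street table. For brevity it uses a simplified, table-specific trigger function (the hypothetical street_log) instead of the generic diff$log(). It then classifies the logged rows: a NULL old image means an insert, and a NULL new image means a delete:

```sql
-- Hypothetical table and its change log, shaped like kladr / kladr$log
CREATE TABLE street(
  code varchar PRIMARY KEY
, name varchar
);
CREATE TABLE street$log(
  ro street -- old row image (NULL for INSERT)
, rn street -- new row image (NULL for DELETE)
);

-- Simplified, table-specific logger (not the generic diff$log())
CREATE FUNCTION street_log() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO street$log(ro, rn) VALUES (NULL, NEW);
  ELSIF TG_OP = 'UPDATE' THEN
    IF NEW IS DISTINCT FROM OLD THEN -- skip no-op updates
      INSERT INTO street$log(ro, rn) VALUES (OLD, NEW);
    END IF;
  ELSE -- DELETE
    INSERT INTO street$log(ro, rn) VALUES (OLD, NULL);
  END IF;
  RETURN NULL; -- the result of an AFTER ROW trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON street
  FOR EACH ROW
  EXECUTE PROCEDURE street_log();

-- A "synchronization" run touching the table
INSERT INTO street VALUES ('01', 'Lenina'), ('02', 'Mira');
UPDATE street SET name = 'Mira pr.' WHERE code = '02';
UPDATE street SET name = name WHERE code = '01'; -- no real change: not logged
DELETE FROM street WHERE code = '01';

-- Post-processing side: read the changes and classify them
SELECT
  CASE
    WHEN ro IS NULL THEN 'insert'
    WHEN rn IS NULL THEN 'delete'
    ELSE 'update'
  END op
, coalesce(rn, ro) row_image
FROM
  street$log;
```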
3.3. Importing linked datasets
Above we considered cases where the data structures of the source and the destination are the same. But what if the export from an external system has a format different from the storage structure in our database?
As an example, let's take storing clients and their invoices, the classic "many-to-one" case:
CREATE TABLE client(
  client_id serial PRIMARY KEY
, inn       varchar UNIQUE
, name      varchar
);
CREATE TABLE invoice(
  invoice_id serial PRIMARY KEY
, client_id  integer REFERENCES client(client_id)
, number     varchar
, dt         date
, sum        numeric(32,2)
);
But the export from the external source comes to us in an "all in one" form:
CREATE TEMPORARY TABLE invoice_import(
  client_inn     varchar
, client_name    varchar
, invoice_number varchar
, invoice_dt     date
, invoice_sum    numeric(32,2)
);
Obviously, client data can be repeated in this format, and the main record is the "invoice":
0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00
For the model, we'll just insert our test data, but remember: COPY is more efficient!
INSERT INTO invoice_import
VALUES
('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);
First, let's extract the "dimensions" that our "facts" refer to. In our case, invoices refer to clients:
CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- plain SELECT DISTINCT is enough if the data is known to be consistent
client_inn inn
, client_name "name"
FROM
invoice_import;
To link invoices with client IDs correctly, we first need to find or generate these identifiers. Let's add fields for them:
ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;
Let's use the table synchronization method described above with a small modification. We won't update or delete anything in the target table, because we import clients in "append-only" mode:
-- fill in the IDs of already existing records in the import table
UPDATE
client_import T
SET
client_id = D.client_id
FROM
client D
WHERE
T.inn = D.inn; -- unique key
-- insert the missing records and fill in their IDs
WITH ins AS (
INSERT INTO client(
inn
, name
)
SELECT
inn
, name
FROM
client_import
WHERE
client_id IS NULL -- no existing ID was found
RETURNING *
)
UPDATE
client_import T
SET
client_id = D.client_id
FROM
ins D
WHERE
T.inn = D.inn; -- unique key
-- fill in client IDs on the invoice records
UPDATE
invoice_import T
SET
client_id = D.client_id
FROM
client_import D
WHERE
T.client_inn = D.inn; -- application key
That's basically it: invoice_import now has the link field client_id filled in, and we will use it to insert the invoices.
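The final insert itself is not spelled out above, so here is a minimal sketch of it. To keep the block self-contained, it recreates the tables from this section. It also fills invoice_import.client_id directly from client instead of repeating the client_import steps; that shortcut is an assumption made only for the demo:

```sql
-- Tables from this section, restated compactly
CREATE TABLE client(
  client_id serial PRIMARY KEY
, inn       varchar UNIQUE
, name      varchar
);
CREATE TABLE invoice(
  invoice_id serial PRIMARY KEY
, client_id  integer REFERENCES client(client_id)
, number     varchar
, dt         date
, sum        numeric(32,2)
);
-- invoice_import with the client_id link field already added
CREATE TEMPORARY TABLE invoice_import(
  client_inn     varchar
, client_name    varchar
, invoice_number varchar
, invoice_dt     date
, invoice_sum    numeric(32,2)
, client_id      integer
);
INSERT INTO client(inn, name)
VALUES ('0123456789', 'Вася'), ('9876543210', 'Петя');
INSERT INTO invoice_import(client_inn, client_name, invoice_number, invoice_dt, invoice_sum)
VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

-- Demo shortcut for the client_import steps: take IDs straight from client
UPDATE invoice_import T
SET client_id = D.client_id
FROM client D
WHERE T.client_inn = D.inn;

-- The last step: insert the invoices using the filled-in link field
INSERT INTO invoice(client_id, number, dt, sum)
SELECT client_id, invoice_number, invoice_dt, invoice_sum
FROM invoice_import;
```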
Source: www.habr.com