Ngokucubungula okuyinkimbinkimbi kwamasethi amakhulu wedatha (okuhlukile
Umsebenzi ojwayelekile walolu hlobo ngokuvamile uzwakala kanje: "Lapho ke
Kodwa lapho ivolumu yale "nto" iqala ukulinganisa ngamakhulu ama-megabytes, futhi isevisi kufanele iqhubeke nokusebenza ne-database 24x7, kuvela imiphumela eminingi engemihle ezolimaza impilo yakho.
Ukubhekana nazo ku-PostgreSQL (hhayi kuyo kuphela), ungasebenzisa ukulungiselelwa okuthile okuzokuvumela ukuthi ucubungule yonke into ngokushesha nangokusetshenziswa okuncane kwezinsiza.
1. Kuzothunyelwa kuphi?
Okokuqala, ake sinqume ukuthi singayilayisha kuphi idatha esifuna “ukuyicubungula.”
1.1. Amathebula esikhashana (TEMPORARY TABLE)
Empeleni, kumathebula esikhashana e-PostgreSQL ayafana nanoma yimaphi amanye. Ngakho-ke, izinkolelo-ze ziyathanda “Yonke into ekhona igcinwa enkumbulweni kuphela, futhi ingaphela”. Kodwa futhi kukhona umehluko omkhulu obalulekile.
Eyakho "indawo yegama" ekuxhumekeni ngakunye kusizindalwazi
Uma ukuxhumana okubili zama ukuxhuma ngesikhathi esisodwa CREATE TABLE x
, khona-ke othile uzothola nakanjani iphutha elingeyona into ehlukile izinto zesizindalwazi.
Kodwa uma bobabili bezama ukwenza CREATE TEMPORARY TABLE x
, khona-ke bobabili bazokwenza ngokujwayelekile, futhi wonke umuntu uzothola ikhophi yakho amatafula. Futhi ngeke kube khona okufanayo phakathi kwabo.
"Ukuzilimaza" lapho unqamula
Uma uxhumano luvaliwe, wonke amatafula esikhashana asuswa ngokuzenzakalelayo, ngakho-ke ngesandla DROP TABLE x
akukho lutho ngaphandle...
Uma usebenza pgbouncer kumodi yokwenziwayo, khona-ke i-database iyaqhubeka nokukholelwa ukuthi lokhu kuxhumana kusasebenza, futhi kulo leli thebula lesikhashana lisekhona.
Ngakho-ke, ukuzama ukuyidala futhi, ukusuka kokunye ukuxhumana kuya ku-pgbouncer, kuzoholela ephutheni. Kodwa lokhu kungagwenywa ngokusebenzisa CREATE TEMPORARY TABLE IF NOT EXISTS x
.
Yiqiniso, kungcono ukungakwenzi lokhu noma kunjalo, ngoba khona-ke ungakwazi "ngokuzumayo" ukuthola lapho idatha esele "kumnikazi wangaphambili". Kunalokho, kungcono kakhulu ukufunda imanuwali futhi ubone ukuthi uma udala itafula kungenzeka ukwengeza ON COMMIT DROP
- okungukuthi, lapho ukuthengiselana kuqeda, itafula lizosuswa ngokuzenzakalelayo.
Ukungaphindaphindi
Ngenxa yokuthi aboxhumano oluthile kuphela, amathebula esikhashana awaphindwaphindwa. Kodwa lokhu kuqeda isidingo sokuqoshwa kabili kwedatha kunqwaba + WAL, ngakho-ke FAKA/BUYEKEZA/SUSA kuyo kushesha kakhulu.
Kodwa njengoba ithebula lesikhashana liseyithebula "elicishe lijwayelekile", alikwazi ukudalwa nakulomfanekiso. Okungenani okwamanje, nakuba isiqephu esihambisanayo sekuyisikhathi eside sijikeleza.
1.2. ITAFULA ENGAKUNGULIWE
Kodwa yini okufanele uyenze, isibonelo, uma unohlobo oluthile lwenqubo ye-ETL enzima engakwazi ukuqaliswa phakathi kokuthengiselana okukodwa, kodwa usenayo pgbouncer kumodi yokwenziwayo? ..
Noma ukugeleza kwedatha kukhulu kangangokuthi Awukho umkhawulokudonsa owanele ekuxhumekeni okukodwa kusuka kusizindalwazi (funda, inqubo eyodwa nge-CPU ngayinye)?..
Noma eminye imisebenzi iyaqhubeka ngokulinganayo ekuxhumaneni okuhlukene?..
Kunenketho eyodwa kuphela lapha - dala okwesikhashana ithebula elingelona elesikhashana. Pun, yebo. Leyo:
- ngidale amathebula "ami" anamagama angahleliwe kakhulu ukuze ngingahlangani nanoma ubani
- extract: uwagcwalise ngedatha evela emthonjeni wangaphandle
- Guqula: iguquliwe, igcwaliswe ngezinkambu eziyinhloko zokuxhumanisa
- Layisha: uthele idatha elungile kumathebula okuqondiwe
- amatafula "ami" asusiwe
Futhi manje - impukane emafutheni. Empeleni, konke kubhala ku-PostgreSQL kwenzeka kabili - COMMIT
'nathi futhi ROLLBACK
'ukuthenga okungenalutho.
Kodwa asikudingi lokhu! Sinayo yonke inqubo Kuphakathi kokuthi iphumelele ngokuphelele noma ayiphumelelanga.. Akunandaba ukuthi zingaki izinkokhelo eziphakathi okuzoba khona - asinantshisekelo "ekuqhubekiseni inqubo kusukela phakathi," ikakhulukazi uma kungacaci ukuthi ibikuphi.
Ukwenza lokhu, abathuthukisi be-PostgreSQL, emuva kunguqulo 9.1, bethule into enjengokuthi
Ngalesi sibonakaliso, ithebula lidalwe njengengabhalisiwe. Idatha ebhalelwe amathebula angabhalisiwe ayidluli kulogi yokubhala kusengaphambili (bona iSahluko 29), okwenza amathebula anjalo ukuthi sebenza ngokushesha kakhulu kunokujwayelekile. Nokho, awagonyiwe ekuhlulekeni; uma iseva yehluleka noma ukuvalwa kwesimo esiphuthumayo, ithebula elingabhalisiwe kuncishiswe ngokuzenzakalelayo. Ukwengeza, okuqukethwe kwethebula elingabhalisiwe akuphindaphindwa kumaseva ezigqila. Noma yiziphi izinkomba ezidalwe kuthebula elingabhalisiwe ziyasuswa ngokuzenzakalelayo.
Kafushane nje, izoshesha kakhulu, kodwa uma iseva yedatha "ewa", ngeke kube mnandi. Kodwa lokhu kwenzeka kangaki, futhi ingabe inqubo yakho ye-ETL iyakwazi ukulungisa lokhu ngendlela efanele “kusuka phakathi” ngemva “kokuvuselela” isizindalwazi?..
Uma kungenjalo, futhi icala elingenhla lifana nelakho, sebenzisa UNLOGGED
kodwa ungalokothi unganikezi lesi sici kumathebula wangempela, idatha oyithandayo kuwe.
1.3. OKUZINIKELA { SUSA IMIGWA | YEHLA}
Lokhu kwakhiwa kukuvumela ukuthi ucacise ukuziphatha okuzenzakalelayo lapho umsebenzi usuqediwe uma udala ithebula.
Mayelana ON COMMIT DROP
Sengike ngabhala ngenhla, iyakhiqiza DROP TABLE
, kodwa nge ON COMMIT DELETE ROWS
isimo sithakazelisa kakhulu - sikhiqizwa lapha TRUNCATE TABLE
.
Njengoba yonke ingqalasizinda yokugcina incazelo ye-meta yetafula lesikhashana ifana ncamashi naleyo yetafula elivamile, khona-ke Ukudala njalo nokususwa kwamatafula esikhashana kuholela "ekuvuvukeni" okunzima kwamatafula esistimu pg_class, pg_attribute, pg_attrdef, pg_depend,…
Manje cabanga ukuthi unomsebenzi ekuxhumekeni okuqondile ku-database, evula ukuthengiselana okusha njalo ngomzuzwana, idala, igcwalise, icubungule futhi isuse itafula lesikhashana ... Kuzoba nodoti oweqile oqoqwe kumatafula wesistimu, futhi lokhu kuzodala amabhuleki engeziwe ekusebenzeni ngakunye.
Ngokuvamile, ungakwenzi lokhu! Kulokhu kuphumelela kakhulu CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS
ikhiphe emjikelezweni wokwenziwayo - lapho-ke ekuqaleni komsebenzi omusha ngamunye amathebula asevele akhona izoba khona (gcina ucingo CREATE
), kepha izobe ingenalutho, sibonga u TRUNCATE
(siphinde salondoloza ucingo lwayo) uma siqedela okwenziwa ngaphambilini.
1.4. THANDA...KUHLANGANISA...
Ngishilo ekuqaleni ukuthi enye yezindlela ezijwayelekile zokusebenzisa amatafula esikhashana yizinhlobo ezahlukahlukene zokungenisa - futhi unjiniyela ngokukhathala ukopisha-unamathisele uhlu lwezinkambu zetafula eliqondiwe esimemezelweni sakhe sesikhashana...
Kodwa ubuvila buyinjini yenqubekelaphambili! Yingakho ke dala ithebula elisha "ngokusekelwe kusampula" kungaba lula kakhulu:
CREATE TEMPORARY TABLE import_table(
LIKE target_table
);
Njengoba ungakwazi ukukhiqiza idatha eningi kuleli thebula, ukusesha ngeke kusheshe. Kodwa kukhona ikhambi lendabuko kulokhu - izinkomba! Futhi, yebo, ithebula lesikhashana lingaba nezinkomba.
Njengoba, ngokuvamile, izinkomba ezidingekayo zihambisana nezinkomba zetafula eliqondiwe, ungavele ubhale LIKE target_table INCLUDING INDEXES
.
Uma nawe udinga DEFAULT
-values (ngokwesibonelo, ukugcwalisa amanani abalulekile), ungasebenzisa LIKE target_table INCLUDING DEFAULTS
. Noma kalula - LIKE target_table INCLUDING ALL
- amakhophi okuzenzakalelayo, izinkomba, imikhawulo,...
Kodwa lapha udinga ukuqonda ukuthi uma udale ngenisa ithebula ngokushesha ngezinkomba, khona-ke idatha izothatha isikhathi eside ukulayishwakunokuba uma ugcwalisa yonke into kuqala, bese ugoqa izinkomba - bheka ukuthi ukwenza kanjani lokhu njengesibonelo
Ngamafuphi,
2. Ibhalwa kanjani?
Ake ngithi nje - sebenzisa
-geleza esikhundleni sokuthi "iphakethe" INSERT
,
3. Icutshungulwa kanjani?
Ngakho-ke, ake sivumele isingeniso sethu sibukeke kanje:
- unetafula elinedatha yeklayenti egcinwe kusizindalwazi sakho 1M amarekhodi
- nsuku zonke iklayenti likuthumelela entsha "isithombe" esigcwele
- ngokuhlangenwe nakho uyakwazi lokho ngezikhathi ezithile akukho amarekhodi angaphezu kuka-10K ashintshiwe
Isibonelo esiyingqayizivele sesimo esinjalo
3.1. I-algorithm yokuvumelanisa egcwele
Ukwenza kube lula, ake sithi awudingi nokuhlela kabusha idatha - vele ulethe ithebula efomini oyifunayo, okungukuthi:
- susa konke okungasekho
- buyekeza yonke into ebivele ikhona futhi idinga ukubuyekezwa
- faka konke okungakenzeki
Kungani imisebenzi kufanele yenziwe ngale ndlela? Ngoba le yindlela usayizi wetafula ozokhula ngayo kancane (
SUSA KU-dst
Cha, vele ungaphumelela ngemisebenzi emibili nje:
- susa (
DELETE
) konke ngokujwayelekile - faka konke kusuka esithombeni esisha
Kodwa ngesikhathi esifanayo, sibonga i-MVCC, Ubukhulu betafula buzokhula ngokuphindwe kabili! Ukuthola +1M izithombe zamarekhodi kuthebula ngenxa yesibuyekezo esingu-10K akudingekile kakhulu...
TRUNCATE dst
Unjiniyela onolwazi olunzulu uyazi ukuthi yonke ithebhulethi ingahlanzwa ngentengo ephansi:
- kucace (
TRUNCATE
) itafula lonke - faka konke kusuka esithombeni esisha
Indlela iyasebenza,
Okuchaza:
- siyaqala okwenziwa isikhathi eside
TRUNCATE
kubeka Finyelela Okukhethekile-ukuvimba- senza ukufakwa isikhathi eside, futhi wonke umuntu ngalesi sikhathi angikwazi ngisho
SELECT
Kukhona okungahambi kahle...
ALTER TABLE... GAMA KABUSHA... / DRIP TABLE...
Okunye ukugcwalisa yonke into etafuleni elisha elihlukile, bese uvele uliqambe kabusha esikhundleni saleli elidala. Izinto ezimbalwa ezimbi ezimbi:
- namanje futhi Finyelela Okukhethekile, nakuba isikhathi esincane kakhulu
- zonke izinhlelo zemibuzo/izibalo zaleli thebula zisethwa kabusha,
kudingeka ugijime HLAZIYA - zonke izihluthulelo zakwamanye amazwe ziphukile (FK) etafuleni
Kube nesiqephu se-WIP esivela ku-Simon Riggs esiphakamise ukwenziwa ALTER
-umsebenzi wokushintsha indikimba yetafula ezingeni lefayela, ngaphandle kokuthinta izibalo kanye ne-FK, kodwa ayizange iqoqe ikhoramu.
SUSA, BUYEKEZA, FAKA
Ngakho-ke, sixazulula inketho yokungavimbeli yemisebenzi emithathu. Cishe ezintathu... Ungakwenza kanjani lokhu ngempumelelo kakhulu?
-- все делаем в рамках транзакции, чтобы никто не видел "промежуточных" состояний
BEGIN;
-- создаем временную таблицу с импортируемыми данными
CREATE TEMPORARY TABLE tmp(
LIKE dst INCLUDING INDEXES -- по образу и подобию, вместе с индексами
) ON COMMIT DROP; -- за рамками транзакции она нам не нужна
-- быстро-быстро вливаем новый образ через COPY
COPY tmp FROM STDIN;
-- ...
-- .
-- удаляем отсутствующие
DELETE FROM
dst D
USING
dst X
LEFT JOIN
tmp Y
USING(pk1, pk2) -- поля первичного ключа
WHERE
(D.pk1, D.pk2) = (X.pk1, X.pk2) AND
Y IS NOT DISTINCT FROM NULL; -- "антиджойн"
-- обновляем оставшиеся
UPDATE
dst D
SET
(f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
tmp T
WHERE
(D.pk1, D.pk2) = (T.pk1, T.pk2) AND
(D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- незачем обновлять совпадающие
-- вставляем отсутствующие
INSERT INTO
dst
SELECT
T.*
FROM
tmp T
LEFT JOIN
dst D
USING(pk1, pk2)
WHERE
D IS NOT DISTINCT FROM NULL;
COMMIT;
3.2. Ngenisa ngemuva kokucubungula
Ku-KLADR efanayo, wonke amarekhodi ashintshiwe kufanele aphinde aqhutshwe ngokucutshungulwa kwangemuva - ajwayeleke, amagama angukhiye agqanyiswe, futhi ehliselwe ezakhiweni ezidingekayo. Kodwa wazi kanjani - yini ngempela eshintshilengaphandle kokufaka inkimbinkimbi ikhodi yokuvumelanisa, ngaphandle kokuyithinta nhlobo?
Uma kuphela inqubo yakho inokufinyelela kokubhala ngesikhathi sokuvumelanisa, ungasebenzisa isicuphi esizosiqoqela zonke izinguquko:
-- целевые таблицы
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);
-- таблицы с историей изменений
CREATE TABLE kladr$log(
ro kladr, -- тут лежат целые образы записей старой/новой
rn kladr
);
CREATE TABLE kladr_house$log(
ro kladr_house,
rn kladr_house
);
-- общая функция логирования изменений
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
dst varchar = TG_TABLE_NAME || '$log';
stmt text = '';
BEGIN
-- проверяем необходимость логгирования при обновлении записи
IF TG_OP = 'UPDATE' THEN
IF NEW IS NOT DISTINCT FROM OLD THEN
RETURN NEW;
END IF;
END IF;
-- создаем запись лога
stmt = 'INSERT INTO ' || dst::text || '(ro,rn)VALUES(';
CASE TG_OP
WHEN 'INSERT' THEN
EXECUTE stmt || 'NULL,$1)' USING NEW;
WHEN 'UPDATE' THEN
EXECUTE stmt || '$1,$2)' USING OLD, NEW;
WHEN 'DELETE' THEN
EXECUTE stmt || '$1,NULL)' USING OLD;
END CASE;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Manje singasebenzisa izibangeli ngaphambi kokuqala ukuvumelanisa (noma sikwazi ukusebenzisa ALTER TABLE ... ENABLE TRIGGER ...
):
CREATE TRIGGER log
AFTER INSERT OR UPDATE OR DELETE
ON kladr
FOR EACH ROW
EXECUTE PROCEDURE diff$log();
CREATE TRIGGER log
AFTER INSERT OR UPDATE OR DELETE
ON kladr_house
FOR EACH ROW
EXECUTE PROCEDURE diff$log();
Bese sikhipha ngokuthula zonke izinguquko esizidingayo kumatafula elogi futhi sizisebenzise ngokusebenzisa izibambi ezengeziwe.
3.3. Ingenisa Amasethi Axhunyiwe
Ngenhla sicabangele izimo lapho izakhiwo zedatha yomthombo nendawo okuyiwa kuyo zifana. Kodwa kuthiwani uma ukulayishwa okuvela ohlelweni lwangaphandle kunefomethi ehlukile kunesakhiwo sesitoreji kusizindalwazi sethu?
Ake sithathe njengesibonelo ukugcinwa kwamaklayenti nama-akhawunti awo, inketho yakudala "yabaningi kuya kokukodwa":
CREATE TABLE client(
client_id
serial
PRIMARY KEY
, inn
varchar
UNIQUE
, name
varchar
);
CREATE TABLE invoice(
invoice_id
serial
PRIMARY KEY
, client_id
integer
REFERENCES client(client_id)
, number
varchar
, dt
date
, sum
numeric(32,2)
);
Kodwa ukulanda okuvela emthonjeni wangaphandle kuza kithi ngendlela yokuthi "konke kokukodwa":
CREATE TEMPORARY TABLE invoice_import(
client_inn
varchar
, client_name
varchar
, invoice_number
varchar
, invoice_dt
date
, invoice_sum
numeric(32,2)
);
Ngokusobala, idatha yekhasimende ingaphinda iphindwe kule nguqulo, futhi irekhodi eliyinhloko "i-akhawunti":
0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00
Kumodeli, sizomane sifake idatha yethu yokuhlola, kodwa khumbula - COPY
esebenza kahle kakhulu!
INSERT INTO invoice_import
VALUES
('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);
Okokuqala, ake sigqamise lokho “kusikeka” lapho “amaqiniso” ethu abhekisela khona. Esimweni sethu, ama-invoyisi abhekisa kumakhasimende:
CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- можно просто SELECT DISTINCT, если данные заведомо непротиворечивы
client_inn inn
, client_name "name"
FROM
invoice_import;
Ukuze sihlobanise kahle ama-akhawunti nama-ID ekhasimende, sidinga kuqala ukuthola noma ukukhiqiza lezi zihlonzi. Masingeze izinkambu ngaphansi kwazo:
ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;
Masisebenzise indlela yokuvumelanisa yethebula echazwe ngenhla nesichibiyelo esincane - ngeke sibuyekeze noma sisuse lutho kuthebula eliqondiwe, ngoba singenisa amakhasimende "i-append-kuphela":
-- проставляем в таблице импорта ID уже существующих записей
UPDATE
client_import T
SET
client_id = D.client_id
FROM
client D
WHERE
T.inn = D.inn; -- unique key
-- вставляем отсутствовавшие записи и проставляем их ID
WITH ins AS (
INSERT INTO client(
inn
, name
)
SELECT
inn
, name
FROM
client_import
WHERE
client_id IS NULL -- если ID не проставился
RETURNING *
)
UPDATE
client_import T
SET
client_id = D.client_id
FROM
ins D
WHERE
T.inn = D.inn; -- unique key
-- проставляем ID клиентов у записей счетов
UPDATE
invoice_import T
SET
client_id = D.client_id
FROM
client_import D
WHERE
T.client_inn = D.inn; -- прикладной ключ
Eqinisweni, konke kungaphakathi invoice_import
Manje sesigcwalise inkambu yokuxhumana client_id
, esizofaka ngayo i-invoyisi.
Source: www.habr.com