DBA: organizing synchronizations and imports properly

When processing large data sets in complex ways (various ETL processes: imports, conversions, and synchronizations with an external source), you often need to temporarily "remember" something bulky and then process it quickly.

A typical task of this kind sounds something like this: "The accounting department has just exported the latest incoming payments from the client-bank system; we need to upload them to the website quickly and link them to the invoices."

But once the volume of this "something" starts to be measured in hundreds of megabytes, while the service must keep working with the database 24x7, a lot of side effects appear that will ruin your life.
To cope with them in PostgreSQL (and not only there), you can use a few optimizations that let you process everything faster and with less resource consumption.

1. Where to load?

First, let's decide where we can load the data we want to "process".

1.1. Temporary tables (TEMPORARY TABLE)

In principle, for PostgreSQL temporary tables are the same as any other tables. So superstitions like "everything there is stored only in memory, and it can run out" are wrong. But there are several significant differences.

A separate "namespace" for each database connection

If two connections try to run CREATE TABLE x at the same time, one of them will certainly get an error saying that the database object already exists.

But if both try to run CREATE TEMPORARY TABLE x, both will succeed, and each will get its own copy of the table. The two copies will have nothing in common.

"Self-destruction" on disconnect

When the connection is closed, all temporary tables are dropped automatically, so there is no point in running DROP TABLE x by hand, except...

If you work through pgbouncer in transaction mode, the database keeps believing that the connection is still active, and the temporary table still exists in it.

So an attempt to create it again from a different pgbouncer connection will fail with an error. You can work around this with CREATE TEMPORARY TABLE IF NOT EXISTS x.

Still, it is better not to do that, because you may then "suddenly" find data there left over from the "previous owner". It is much better to read the manual and see that you can add ON COMMIT DROP when creating the table, so that the table is dropped automatically when the transaction ends.
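As a minimal sketch of the ON COMMIT DROP variant (the column list is hypothetical and only serves as an illustration), the table lives exactly as long as the transaction that created it:

```sql
BEGIN;

-- hypothetical staging table; it disappears at COMMIT (or ROLLBACK)
CREATE TEMPORARY TABLE x(
  id
    integer
, payload
    text
) ON COMMIT DROP;

INSERT INTO x VALUES (1, 'first'), (2, 'second');
SELECT count(*) FROM x;

COMMIT;
-- from this point on, x no longer exists in this session
```

Because nothing survives the transaction, the next client that receives the same server connection from pgbouncer cannot stumble upon leftovers.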

No replication

Because they belong to a single connection, temporary tables are not replicated. In return, there is no need to write the data twice, to the heap and to the WAL, so INSERT/UPDATE/DELETE on them is noticeably faster.

But since a temporary table is still an "almost ordinary" table, it cannot be created on a replica either. At least not yet, although a corresponding patch has been circulating for a long time.

1.2. Unlogged tables (UNLOGGED TABLE)

But what do you do if, for example, you have some cumbersome ETL process that cannot be implemented within a single transaction, and you still have pgbouncer in transaction mode?..

Or the data flow is so large that the bandwidth of a single database connection (read: a single process on one CPU) is not enough?..

Or some of the operations run asynchronously in different connections?..

There is only one option here: temporarily create a non-temporary table. A pun, yes. That is:

  • create "your own" tables with maximally random names so as not to collide with anyone
  • Extract: fill them with data from the external source
  • Transform: convert the data and fill in the key linking fields
  • Load: pour the prepared data into the target tables
  • drop "your own" tables

And now for the fly in the ointment. In fact, every write in PostgreSQL happens twice: first to the WAL, and then to the table and index files. All this is done to support ACID and correct data visibility between committed and rolled-back transactions.

But we don't need that! Our whole process either succeeds completely or it doesn't. It doesn't matter how many intermediate transactions it contains: we are not interested in "continuing the process from the middle", especially when it is unclear where that middle was.

For this purpose the PostgreSQL developers introduced UNLOGGED tables back in version 9.1. The documentation describes them as follows:

If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.

In short, it will be much faster, but if the database server "crashes", it will be unpleasant. But how often does that happen, and does your ETL process know how to resume correctly "from the middle" after the database is "revived"?..

If not, and the case above resembles yours, use UNLOGGED, but never enable this attribute on real tables whose data you care about.
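The "temporarily non-temporary" staging cycle could look roughly like this. It is only a sketch: the table name with its random-looking suffix and the target_table it copies from are assumptions made for the illustration.

```sql
-- hypothetical staging table; the suffix stands in for a randomly generated tag
CREATE UNLOGGED TABLE etl_stage_7f3a9c(
  LIKE target_table INCLUDING DEFAULTS
);

-- Extract / Transform happen here, possibly from several connections
-- ...

-- Load: move the prepared rows into the real (logged) table
INSERT INTO target_table
SELECT
  *
FROM
  etl_stage_7f3a9c;

-- clean up after ourselves
DROP TABLE etl_stage_7f3a9c;
```

Only the staging table is unlogged; the final INSERT into the target table goes through the WAL as usual.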

1.3. ON COMMIT { DELETE ROWS | DROP }

This clause lets you specify, when creating a table, what happens automatically when a transaction completes.

I already described ON COMMIT DROP above: it generates a DROP TABLE. With ON COMMIT DELETE ROWS the situation is more interesting: here a TRUNCATE TABLE is generated.

Since the whole infrastructure for storing the metadata of a temporary table is exactly the same as for an ordinary table, constantly creating and dropping temporary tables leads to severe "bloat" of the system tables pg_class, pg_attribute, pg_attrdef, pg_depend,…

Now imagine you have a worker on a direct database connection that opens a new transaction every second and creates, fills, processes, and drops a temporary table... Garbage will pile up in the system tables, and that means extra slowdown for every operation.

In general, don't do this! In this case it is much more efficient to move CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS out of the transaction loop. Then, at the start of each new transaction, the table already exists (saving the CREATE call) but is empty, thanks to the TRUNCATE (whose call we also saved) at the end of the previous transaction.
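A sketch of that arrangement for a worker on a direct connection (the columns of x are hypothetical): the table is created once per session, and every transaction starts with it empty.

```sql
-- once, right after connecting
CREATE TEMPORARY TABLE x(
  id
    integer
, payload
    text
) ON COMMIT DELETE ROWS;

-- then, on every iteration of the worker loop
BEGIN;
INSERT INTO x VALUES (1, 'batch row');
-- ... process the contents of x ...
COMMIT; -- x is emptied here, but its catalog entries stay untouched
```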

1.4. LIKE… INCLUDING…

I mentioned at the beginning that one of the typical use cases for temporary tables is imports of various kinds, where a developer wearily copy-pastes the list of the target table's fields into the declaration of the temporary one...

But laziness is the engine of progress! So creating a new table "by example" can be much simpler:

CREATE TEMPORARY TABLE import_table(
  LIKE target_table
);

Since you can later generate a lot of data in this table, searching it will not be fast at all. But there is a traditional remedy for this: indexes! And yes, a temporary table can have indexes too.

Since the required indexes often coincide with the indexes of the target table, you can simply write LIKE target_table INCLUDING INDEXES.

If you also need DEFAULT values (for example, to fill in primary key values), you can use LIKE target_table INCLUDING DEFAULTS. Or simply LIKE target_table INCLUDING ALL, which copies defaults, indexes, constraints,...

But here you need to understand that if you create the import table with indexes right away, the data will take longer to load than if you first load everything and only then build the indexes. For an example, look at how pg_dump does it.
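The "load first, index later" order might be sketched as follows. The key columns pk1 and pk2 are assumptions made for the example; substitute whatever your lookups actually use.

```sql
-- copy the structure and defaults, but deliberately not the indexes
CREATE TEMPORARY TABLE import_table(
  LIKE target_table INCLUDING DEFAULTS
);

-- ... bulk-load the data here ...

-- build the index once, over the already loaded data
CREATE INDEX ON import_table(pk1, pk2);

-- autovacuum does not process temporary tables, so collect statistics manually
ANALYZE import_table;
```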

In general, RTFM!

2. How to write?

Let me put it simply: use a COPY stream instead of a "batch" of INSERTs, and loading gets several times faster. You can even load directly from a pre-generated file.
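For illustration, two ways to feed a pre-generated file into the import table. The file paths, the CSV format, and the ';' delimiter are assumptions; adjust them to your export.

```sql
-- server-side: the file is read by the database server process,
-- so it must be readable there and the role needs the matching privilege
COPY import_table FROM '/path/to/import.csv' WITH (FORMAT csv, DELIMITER ';');

-- client-side alternative: a psql meta-command that streams a local file
-- through COPY ... FROM STDIN
\copy import_table FROM 'import.csv' WITH (FORMAT csv, DELIMITER ';')
```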

3. How to process?

So, let's say our starting conditions look something like this:

  • you have a table with client data stored in your database, with 1M records
  • every day the client sends you a new full "image"
  • from experience you know that no more than 10K records change from one time to the next

A classic example of such a situation is the KLADR database: there are a lot of addresses in total, but each weekly upload contains very few changes (renamed settlements, merged streets, newly built houses), even on the scale of the whole country.

3.1. Full synchronization algorithm

For simplicity, let's assume that you don't even need to restructure the data, just bring the table into the desired state, that is:

  • delete everything that no longer exists
  • update everything that already existed and needs updating
  • insert everything that did not exist yet

Why should the operations be done in exactly this order? Because this is how the table size grows the least (remember MVCC!).

DELETE FROM dst

No, of course you can get by with just two operations:

  • delete (DELETE) everything
  • insert everything from the new image

But then, thanks to MVCC, the size of the table will exactly double! Getting +1M record images in the table because of a 10K update is rather excessive redundancy…

TRUNCATE dst

A more experienced developer knows that the whole table can be cleared quite cheaply:

  • clear (TRUNCATE) the entire table
  • insert everything from the new image

The method is effective and sometimes quite applicable, but there is a problem... Inserting 1M records takes a long time, so we cannot afford to leave the table empty for all that time (as would happen without wrapping everything in a single transaction).

Which means:

  • we start a long-running transaction
  • TRUNCATE takes an AccessExclusive lock
  • we do the insert for a long time, and meanwhile nobody else can even SELECT

Not good...

ALTER TABLE… RENAME… / DROP TABLE…

Another option is to load everything into a separate new table and then simply rename it in place of the old one. A couple of nasty details:

  • this is still AccessExclusive, although for a much shorter time
  • all query plans and statistics for this table are reset, so you need to run ANALYZE
  • all foreign keys (FK) referencing the table break

There was a WIP patch from Simon Riggs that proposed an ALTER operation to replace the table body at the file level without touching statistics and FKs, but it did not gather a quorum.
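For reference, the rename swap could be sketched like this. The names dst_new and dst_old are hypothetical, and the sketch assumes that no other table has a foreign key pointing at dst.

```sql
BEGIN;

-- build the replacement next to the live table
CREATE TABLE dst_new(
  LIKE dst INCLUDING ALL
);

-- ... load the new image into dst_new ...

-- the swap itself is short, but takes an AccessExclusive lock on dst
ALTER TABLE dst RENAME TO dst_old;
ALTER TABLE dst_new RENAME TO dst;

-- fails if foreign keys still reference the old table
DROP TABLE dst_old;

COMMIT;

-- the new table has no statistics yet
ANALYZE dst;
```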

DELETE, UPDATE, INSERT

So, we settle on the non-blocking option of three operations. Almost three... How do we do this most efficiently?

-- do everything within one transaction so that nobody sees "intermediate" states
BEGIN;

-- create a temporary table for the imported data
CREATE TEMPORARY TABLE tmp(
  LIKE dst INCLUDING INDEXES -- modeled on dst, together with its indexes
) ON COMMIT DROP; -- we don't need it outside the transaction

-- quickly load the new image via COPY
COPY tmp FROM STDIN;
-- ...
-- .

-- delete the rows that are missing from the new image
DELETE FROM
  dst D
USING
  dst X
LEFT JOIN
  tmp Y
    USING(pk1, pk2) -- primary key fields
WHERE
  (D.pk1, D.pk2) = (X.pk1, X.pk2) AND
  Y IS NOT DISTINCT FROM NULL; -- "anti-join"

-- update the remaining rows
UPDATE
  dst D
SET
  (f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
  tmp T
WHERE
  (D.pk1, D.pk2) = (T.pk1, T.pk2) AND
  (D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- no need to update rows that already match

-- insert the rows that are missing from dst
INSERT INTO
  dst
SELECT
  T.*
FROM
  tmp T
LEFT JOIN
  dst D
    USING(pk1, pk2)
WHERE
  D IS NOT DISTINCT FROM NULL;

COMMIT;

3.2. Import post-processing

In the same KLADR, all changed records must additionally go through post-processing: normalization, keyword extraction, and conversion to the required structures. But how do you find out what exactly changed without complicating the synchronization code, ideally without touching it at all?

If only your process has write access during synchronization, you can use a trigger that will collect all the changes for us:

-- target tables
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);

-- tables with the change history
CREATE TABLE kladr$log(
  ro kladr, -- whole images of the old/new record are stored here
  rn kladr
);

CREATE TABLE kladr_house$log(
  ro kladr_house,
  rn kladr_house
);

-- common function for logging changes
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
  dst varchar = TG_TABLE_NAME || '$log';
  stmt text = '';
BEGIN
  -- check whether logging is needed when a record is updated
  IF TG_OP = 'UPDATE' THEN
    IF NEW IS NOT DISTINCT FROM OLD THEN
      RETURN NEW;
    END IF;
  END IF;
  -- create the log record
  stmt = 'INSERT INTO ' || dst::text || '(ro,rn)VALUES(';
  CASE TG_OP
    WHEN 'INSERT' THEN
      EXECUTE stmt || 'NULL,$1)' USING NEW;
    WHEN 'UPDATE' THEN
      EXECUTE stmt || '$1,$2)' USING OLD, NEW;
    WHEN 'DELETE' THEN
      EXECUTE stmt || '$1,NULL)' USING OLD;
  END CASE;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Now, before starting the synchronization, we can attach the triggers (or enable them via ALTER TABLE ... ENABLE TRIGGER ...):

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr_house
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

Then we calmly extract all the changes we need from the log tables and run them through additional handlers.
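Reading the collected changes back could look roughly like this. It is a sketch over the kladr$log table defined above; what the additional handlers do with the rows is left out.

```sql
-- how many records were inserted, updated, and deleted
SELECT
  CASE
    WHEN ro IS NOT DISTINCT FROM NULL THEN 'INSERT'
    WHEN rn IS NOT DISTINCT FROM NULL THEN 'DELETE'
    ELSE 'UPDATE'
  END op
, count(*)
FROM
  kladr$log
GROUP BY
  1;

-- new images of all inserted and updated records, expanded back into columns
SELECT
  (rn).*
FROM
  kladr$log
WHERE
  rn IS DISTINCT FROM NULL;

-- once everything is processed, empty the log before the next synchronization
TRUNCATE kladr$log;
```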

3.3. Importing linked sets

Above we considered cases where the data structures of the source and the destination are the same. But what if the upload from an external system has a format different from the storage structure in our database?

Let's take as an example the storage of clients and their invoices, the classic "many-to-one" case:

CREATE TABLE client(
  client_id
    serial
      PRIMARY KEY
, inn
    varchar
      UNIQUE
, name
    varchar
);

CREATE TABLE invoice(
  invoice_id
    serial
      PRIMARY KEY
, client_id
    integer
      REFERENCES client(client_id)
, number
    varchar
, dt
    date
, sum
    numeric(32,2)
);

But the upload from the external source arrives in an "all-in-one" form:

CREATE TEMPORARY TABLE invoice_import(
  client_inn
    varchar
, client_name
    varchar
, invoice_number
    varchar
, invoice_dt
    date
, invoice_sum
    numeric(32,2)
);

Obviously, client data can be duplicated in this form, and the main record is the "invoice":

0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00

For the model we will simply insert our test data, but remember that COPY is more efficient!

INSERT INTO invoice_import
VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

First, let's single out the "dimensions" that our "facts" refer to. In our case, invoices refer to clients:

CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- a plain SELECT DISTINCT is enough if the data is known to be consistent
  client_inn inn
, client_name "name"
FROM
  invoice_import;

To link invoices to client IDs correctly, we first need to find out or generate these identifiers. Let's add fields for them:

ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;

Let's use the table synchronization method described above with a small amendment: we will not update or delete anything in the target table, because our client import is "append-only":

-- fill in the IDs of already existing records in the import table
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  client D
WHERE
  T.inn = D.inn; -- unique key

-- insert the records that were missing and fill in their IDs
WITH ins AS (
  INSERT INTO client(
    inn
  , name
  )
  SELECT
    inn
  , name
  FROM
    client_import
  WHERE
    client_id IS NULL -- if the ID was not filled in
  RETURNING *
)
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  ins D
WHERE
  T.inn = D.inn; -- unique key

-- fill in the client IDs on the invoice records
UPDATE
  invoice_import T
SET
  client_id = D.client_id
FROM
  client_import D
WHERE
  T.client_inn = D.inn; -- application key

That's actually it: invoice_import now has the linking field client_id filled in, and we will use it to insert the invoices.
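The final step is not shown in the original; a minimal sketch of it, using the column names from the tables defined above, might be:

```sql
-- move the invoices into the target table, now linked to their clients
INSERT INTO invoice(
  client_id
, number
, dt
, sum
)
SELECT
  client_id
, invoice_number
, invoice_dt
, invoice_sum
FROM
  invoice_import;
```

This version appends every imported invoice. If the same invoices can arrive more than once, it would need the same "insert only what is missing" treatment as the clients.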

Source: www.habr.com
