DBA: organizing synchronizations and imports properly

When processing large data sets in complex ways (various ETL processes: imports, conversions, and synchronizations with an external source), you often need to temporarily "remember" something bulky and then quickly process it right away.

A typical task of this kind sounds something like this: "Accounting has just exported the latest incoming payments from the client-bank system; we need to quickly load them into the website and link them to invoices."

But once the size of that "something" starts to be measured in hundreds of megabytes, while the service has to keep working with the database 24x7, plenty of side effects appear that will ruin your life.

To cope with them in PostgreSQL (and not only there), you can use several optimizations that let you process everything faster and with fewer resources.

1. Where to load?

First, let's decide where we can load the data we want to "process".

1.1. Temporary tables (TEMPORARY TABLE)

In principle, for PostgreSQL temporary tables are the same as any other tables. So superstitions like "everything there is stored only in memory, and it can run out" are wrong. But there are also several significant differences.

A separate "namespace" for each database connection

If two connections try to run CREATE TABLE x at the same time, one of them is bound to get an error saying the database object already exists.

But if both try to run CREATE TEMPORARY TABLE x, both will succeed, and each will get its own instance of the table. The two will have nothing in common.
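The separate namespace can be observed even within one session. The following is a minimal sketch, assuming that the default search_path puts new tables into the public schema and that no relation named x exists there yet; the table x and its single column are purely illustrative.

```sql
-- An ordinary table: a second CREATE TABLE x in the same schema would fail
CREATE TABLE x(id integer);

-- Succeeds anyway: the temporary table lives in this session's own pg_temp_N schema
CREATE TEMPORARY TABLE x(id integer);

-- List the schemas that contain a relation named x and how each one is persisted
-- ('p' = permanent, 't' = temporary)
SELECT
  n.nspname
, c.relpersistence
FROM
  pg_class c
JOIN
  pg_namespace n
    ON n.oid = c.relnamespace
WHERE
  c.relname = 'x';

-- Clean up both tables explicitly by schema
DROP TABLE pg_temp.x;
DROP TABLE public.x;
```

While both exist, an unqualified reference to x resolves to the temporary table, because the session's temporary schema is searched first.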

"Self-destruction" on disconnect

When a connection is closed, all of its temporary tables are dropped automatically, so running DROP TABLE x "by hand" is pointless, except...

If you work through pgbouncer in transaction mode, the database still believes the connection is active, and the temporary table still exists in it.

So an attempt to create it again, from a different connection to pgbouncer, will fail with an error. You can work around this with CREATE TEMPORARY TABLE IF NOT EXISTS x.

It is still better not to do that, though, because you may then "suddenly" find data left over from the "previous owner". It is much better to read the manual and notice that you can add ON COMMIT DROP when creating the table, so that it is dropped automatically when the transaction ends.
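A minimal sketch of ON COMMIT DROP, assuming a hypothetical table import_x with made-up columns and that no other relation with this name is visible in the session:

```sql
BEGIN;

CREATE TEMPORARY TABLE import_x(
  id integer
, payload text
) ON COMMIT DROP;

INSERT INTO import_x VALUES (1, 'a'), (2, 'b');

-- The table is fully usable inside the transaction
SELECT count(*) FROM import_x;

COMMIT;

-- to_regclass() returns NULL when no relation with that name is visible,
-- so this checks that the table disappeared together with the transaction
SELECT to_regclass('import_x') IS NULL AS dropped;
```

Because the table never outlives the transaction, this pattern stays safe behind pgbouncer in transaction mode.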

Non-replication

Because they belong to one specific connection only, temporary tables are not replicated. On the upside, this removes the need to write data twice, to the heap and to WAL, so INSERT/UPDATE/DELETE on them is considerably faster.

But since a temporary table is still an "almost ordinary" table, it cannot be created on a replica either. At least not yet, although a patch for this has been circulating for a long time.

1.2. Unlogged tables (UNLOGGED TABLE)

But what if, for example, you have a cumbersome ETL process that cannot be implemented within a single transaction, and you still have pgbouncer in transaction mode?..

Or the data flow is so large that the bandwidth of a single database connection (read: a single CPU process) is not enough?..

Or some of the operations run asynchronously in different connections?..

There is only one option here: temporarily create a non-temporary table. Pun intended. That is:

  • create "your own" tables with names as random as possible, so as not to collide with anyone
  • Extract: fill them with data from the external source
  • Transform: convert the data and fill in the key linking fields
  • Load: pour the finished data into the target tables
  • drop "your own" tables

And now the fly in the ointment. In essence, every write in PostgreSQL happens twice: first to WAL, then to the table/index bodies. All of this is done to support ACID and correct data visibility between COMMIT'ted and ROLLBACK'ed transactions.

But we don't need that! Our whole process either succeeds completely or it doesn't. It doesn't matter how many intermediate transactions it contains; we are not interested in "resuming the process from the middle", especially when it's unclear where the middle was.

For this purpose, back in version 9.1, the PostgreSQL developers introduced UNLOGGED tables:

If specified, the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log (see Chapter 29), which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.

In short, it will be much faster, but if the database server "crashes", it will be unpleasant. But how often does that happen, and does your ETL process know how to correctly pick up "from the middle" after the database "comes back to life"?..

If not, and the case above looks like yours, use UNLOGGED, but never enable this attribute on real tables whose data you care about.
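The "temporarily non-temporary" staging table could be sketched roughly like this. The names etl_target and etl_stage_7f3a9c1e, the columns, and the sample rows are all hypothetical; in a real process the random suffix would be generated by the application.

```sql
-- Hypothetical target table that holds the valuable data (ordinary, WAL-logged)
CREATE TABLE etl_target(
  id integer PRIMARY KEY
, val text
);

-- "Our own" staging table with a random-looking name; it bypasses WAL
CREATE UNLOGGED TABLE etl_stage_7f3a9c1e(
  LIKE etl_target
);

-- relpersistence: 'p' = permanent (logged), 'u' = unlogged
SELECT
  relname
, relpersistence
FROM
  pg_class
WHERE
  relname IN ('etl_target', 'etl_stage_7f3a9c1e');

-- Extract: fill the staging table (COPY would be used for real volumes)
INSERT INTO etl_stage_7f3a9c1e VALUES (1, 'a'), (2, 'b');

-- Transform: any conversion that is needed
UPDATE etl_stage_7f3a9c1e SET val = upper(val);

-- Load: move the finished data into the target table
INSERT INTO etl_target SELECT * FROM etl_stage_7f3a9c1e;

-- Drop "our own" table
DROP TABLE etl_stage_7f3a9c1e;
```

Each step can run in its own transaction or even its own connection, which is exactly what a temporary table would not allow.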

1.3. ON COMMIT { DELETE ROWS | DROP }

This clause lets you specify, at table creation time, what should happen automatically when a transaction ends.

I have already covered ON COMMIT DROP above: it generates a DROP TABLE. ON COMMIT DELETE ROWS is more interesting: it generates a TRUNCATE TABLE.

Since the whole infrastructure for storing a temporary table's metadata is exactly the same as for an ordinary table, constantly creating and dropping temporary tables leads to heavy "bloat" of the system tables pg_class, pg_attribute, pg_attrdef, pg_depend, ...

Now imagine you have a worker on a direct connection to the database that opens a new transaction every second and creates, fills, processes, and drops a temporary table... Garbage will pile up in the system tables, and that means extra slowdown for every operation.

In short, don't do that! In this case it is far more efficient to move CREATE TEMPORARY TABLE x ... ON COMMIT DELETE ROWS out of the transaction loop. Then, by the start of each new transaction, the tables already exist (saving the CREATE call) but are empty, thanks to the TRUNCATE (whose explicit call we also saved) at the end of the previous transaction.
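A minimal sketch of moving the table creation out of the transaction loop, assuming a worker that holds a direct connection; the table x and its columns are illustrative only.

```sql
-- Once per connection, outside the transaction loop
CREATE TEMPORARY TABLE x(
  id integer
, payload text
) ON COMMIT DELETE ROWS;

-- Iteration 1
BEGIN;
INSERT INTO x VALUES (1, 'first batch');
-- ... process the batch ...
COMMIT; -- the rows are removed here, the table itself stays

-- Iteration 2: the table already exists and is empty again
BEGIN;
SELECT count(*) FROM x; -- 0
INSERT INTO x VALUES (2, 'second batch');
-- ... process the batch ...
COMMIT;
```

No rows are added to or removed from pg_class and the other catalogs on each iteration, so the system tables do not accumulate garbage.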

1.4. LIKE ... INCLUDING ...

I mentioned at the beginning that one of the typical use cases for temporary tables is all kinds of imports, where the developer wearily copy-pastes the field list of the target table into the declaration of the temporary one...

But laziness is the engine of progress! That's why creating a new table "from a template" can be much simpler:

CREATE TEMPORARY TABLE import_table(
  LIKE target_table
);

Since you may later generate a lot of data in this table, searches over it will be anything but fast. But there is a traditional remedy: indexes! And yes, a temporary table can have indexes too.

Since the indexes you need often coincide with those of the target table, you can simply write LIKE target_table INCLUDING INDEXES.

If you also need DEFAULT values (for example, to fill in primary key values), you can use LIKE target_table INCLUDING DEFAULTS. Or simply LIKE target_table INCLUDING ALL, which copies defaults, indexes, constraints, ...

But keep in mind that if you create the import table with indexes right away, loading the data will take longer than if you first load everything and only then build the indexes. See how pg_dump does it, for example.
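The "load first, index later" order might look like the sketch below. The definition of target_table, the generated rows, and the choice of indexed column are assumptions made only to keep the example runnable.

```sql
-- Hypothetical target table
CREATE TABLE target_table(
  id integer PRIMARY KEY
, val text
);

-- Copy the column definitions and defaults, but deliberately no indexes yet
CREATE TEMPORARY TABLE import_table(
  LIKE target_table INCLUDING DEFAULTS
);

-- Bulk load into the index-free table (COPY in real life)
INSERT INTO import_table
SELECT
  i
, 'row ' || i
FROM
  generate_series(1, 100000) i;

-- Only now build the index needed for later lookups and refresh the statistics
CREATE INDEX ON import_table(id);
ANALYZE import_table;
```

Building the index once over the loaded data is generally cheaper than maintaining it row by row during the load.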

In short, RTFM!

2. How to write?

I'll put it simply: use a COPY stream instead of a "batch" of INSERTs, and you get a speedup of several times. You can even load directly from a pre-generated file.
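A minimal sketch of a COPY stream as it would be typed into psql. The table payment_import, its columns, the semicolon delimiter, and the file name are assumptions chosen for illustration.

```sql
CREATE TEMPORARY TABLE payment_import(
  id integer
, val text
);

-- The server reads the rows from the client connection;
-- in psql they follow inline and end with a line containing only \.
COPY payment_import FROM STDIN WITH (FORMAT csv, DELIMITER ';');
1;first
2;second
\.

-- Loading from a pre-generated file on the client side is a psql meta-command
-- (hypothetical file name):
-- \copy payment_import FROM 'import.csv' WITH (FORMAT csv, DELIMITER ';')
```

COPY ... FROM 'file' without the backslash reads a file on the database server instead, which requires the corresponding server-side privileges.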

3. How to process?

So, let's say our starting conditions look roughly like this:

  • your database stores a table of client data with 1M records
  • every day the client sends you a new full "image"
  • from experience you know that no more than 10K records change from one image to the next

A classic example of such a situation is the KLADR database: there are a lot of addresses in total, but each weekly export contains very few changes (renamed settlements, merged streets, newly appearing houses), even on a nationwide scale.

3.1. Full synchronization algorithm

For simplicity, let's say you don't even need to restructure the data, just bring the table into the desired state, that is:

  • delete everything that no longer exists
  • update everything that already existed and needs updating
  • insert everything that did not exist yet

Why perform the operations in this order? Because this way the table size grows the least (remember MVCC!).

DELETE FROM dst

No, of course you can get by with just two operations:

  • delete (DELETE) absolutely everything
  • insert everything from the new image

But then, thanks to MVCC, the table size will roughly double! Getting +1M row versions in the table because of a 10K update is rather excessive redundancy...

TRUNCATE dst

A more experienced developer knows that the entire table can be wiped quite cheaply:

  • clear (TRUNCATE) the whole table
  • insert everything from the new image

The method is effective and sometimes quite applicable, but there is a catch... Loading 1M records will take a lo-o-ong time, so we cannot afford to leave the table empty for all that time (which is what happens if we don't wrap it in a single transaction).

Which means:

  • we start a long transaction
  • TRUNCATE takes an AccessExclusive lock
  • we spend a long time inserting, and meanwhile nobody else can even SELECT

Something's not going well...
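The lock taken by TRUNCATE can be inspected from within the transaction itself. This is a small sketch, assuming a hypothetical definition of dst with a two-column primary key and a single data field:

```sql
-- Hypothetical layout of the synchronized table
CREATE TABLE IF NOT EXISTS dst(
  pk1 integer
, pk2 integer
, f1 text
, PRIMARY KEY(pk1, pk2)
);

BEGIN;

TRUNCATE dst;

-- Locks held on dst by the current backend;
-- the list is expected to include AccessExclusiveLock
SELECT
  locktype
, mode
, granted
FROM
  pg_locks
WHERE
  relation = 'dst'::regclass AND
  pid = pg_backend_pid();

-- Undo the TRUNCATE; the lock is released when the transaction ends
ROLLBACK;
```

Until that ROLLBACK (or COMMIT), a SELECT on dst from any other session simply waits.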

ALTER TABLE ... RENAME ... / DROP TABLE ...

An alternative is to load everything into a separate new table and then simply rename it in place of the old one. A couple of nasty details:

  • it is still an AccessExclusive lock, although held for a much shorter time
  • all query plans/statistics for this table are reset, so you need to run ANALYZE
  • all foreign keys (FK) pointing to the table break

There was a WIP patch from Simon Riggs that proposed an ALTER operation to replace the table body at the file level, without touching statistics and FKs, but it did not gather a quorum.
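A rough sketch of the rename approach, under the assumption that nothing references dst by foreign key; the table layout, the names dst_new and dst_old, and the sample row are hypothetical.

```sql
-- Hypothetical layout of the synchronized table
CREATE TABLE IF NOT EXISTS dst(
  pk1 integer
, pk2 integer
, f1 text
, PRIMARY KEY(pk1, pk2)
);

-- Build and fill the new image outside the swap transaction, without blocking readers
CREATE TABLE dst_new(
  LIKE dst INCLUDING ALL
);
INSERT INTO dst_new VALUES (1, 1, 'new image');

-- The swap itself is short, but both renames take an AccessExclusive lock
BEGIN;
ALTER TABLE dst RENAME TO dst_old;
ALTER TABLE dst_new RENAME TO dst;
DROP TABLE dst_old; -- fails if foreign keys still reference the old table
COMMIT;

-- The new table has no statistics of its own yet
ANALYZE dst;
```

Note that the indexes keep the names generated for dst_new, which is one more detail to tidy up if naming matters.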

DELETE, UPDATE, INSERT

So, we settle on the non-blocking option of three operations. Almost three... How do we do this most efficiently?

-- do everything within one transaction so that nobody sees "intermediate" states
BEGIN;

-- create a temporary table for the imported data
CREATE TEMPORARY TABLE tmp(
  LIKE dst INCLUDING INDEXES -- modeled on dst, together with its indexes
) ON COMMIT DROP; -- we don't need it outside the transaction

-- quickly pour in the new image via COPY
COPY tmp FROM STDIN;
-- ...
-- \.

-- delete the rows that are no longer present
DELETE FROM
  dst D
USING
  dst X
LEFT JOIN
  tmp Y
    USING(pk1, pk2) -- primary key fields
WHERE
  (D.pk1, D.pk2) = (X.pk1, X.pk2) AND
  Y IS NOT DISTINCT FROM NULL; -- "anti-join"

-- update the remaining rows
UPDATE
  dst D
SET
  (f1, f2, f3) = (T.f1, T.f2, T.f3)
FROM
  tmp T
WHERE
  (D.pk1, D.pk2) = (T.pk1, T.pk2) AND
  (D.f1, D.f2, D.f3) IS DISTINCT FROM (T.f1, T.f2, T.f3); -- no point in updating identical rows

-- insert the missing rows
INSERT INTO
  dst
SELECT
  T.*
FROM
  tmp T
LEFT JOIN
  dst D
    USING(pk1, pk2)
WHERE
  D IS NOT DISTINCT FROM NULL;

COMMIT;
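To see the three steps at work, here is a small worked sketch on made-up data. The tables sync_dst and sync_tmp, the single data field f1, and all rows are assumptions; plain INSERT stands in for COPY to keep the example compact.

```sql
CREATE TABLE sync_dst(
  pk1 integer
, pk2 integer
, f1 text
, PRIMARY KEY(pk1, pk2)
);

INSERT INTO sync_dst VALUES (1, 1, 'keep'), (1, 2, 'old'), (1, 3, 'gone');

BEGIN;

CREATE TEMPORARY TABLE sync_tmp(
  LIKE sync_dst INCLUDING INDEXES
) ON COMMIT DROP;

-- the new full image
INSERT INTO sync_tmp VALUES (1, 1, 'keep'), (1, 2, 'new'), (2, 1, 'added');

-- 1. delete what is absent from the image: removes (1, 3)
DELETE FROM
  sync_dst D
USING
  sync_dst X
LEFT JOIN
  sync_tmp Y
    USING(pk1, pk2)
WHERE
  (D.pk1, D.pk2) = (X.pk1, X.pk2) AND
  Y IS NOT DISTINCT FROM NULL;

-- 2. update only what really differs: touches (1, 2), skips (1, 1)
UPDATE
  sync_dst D
SET
  f1 = T.f1
FROM
  sync_tmp T
WHERE
  (D.pk1, D.pk2) = (T.pk1, T.pk2) AND
  D.f1 IS DISTINCT FROM T.f1;

-- 3. insert what did not exist yet: adds (2, 1)
INSERT INTO
  sync_dst
SELECT
  T.*
FROM
  sync_tmp T
LEFT JOIN
  sync_dst D
    USING(pk1, pk2)
WHERE
  D IS NOT DISTINCT FROM NULL;

COMMIT;

-- final state: (1, 1, 'keep'), (1, 2, 'new'), (2, 1, 'added')
SELECT * FROM sync_dst ORDER BY pk1, pk2;
```

Only two of the three original rows are rewritten or removed, so the number of dead row versions stays proportional to the size of the change, not to the size of the table.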

3.2. Import post-processing

In that same KLADR, all changed records must additionally be run through post-processing: normalized, have keywords extracted, and be brought into the required structures. But how do you find out what exactly changed without complicating the synchronization code, ideally without touching it at all?

If only your process has write access during synchronization, you can use a trigger that collects all the changes for us:

-- target tables
CREATE TABLE kladr(...);
CREATE TABLE kladr_house(...);

-- tables with the change history
CREATE TABLE kladr$log(
  ro kladr, -- whole images of the old/new record are stored here
  rn kladr
);

CREATE TABLE kladr_house$log(
  ro kladr_house,
  rn kladr_house
);

-- common function for logging changes
CREATE OR REPLACE FUNCTION diff$log() RETURNS trigger AS $$
DECLARE
  dst varchar = TG_TABLE_NAME || '$log';
  stmt text = '';
BEGIN
  -- check whether an updated record needs to be logged at all
  IF TG_OP = 'UPDATE' THEN
    IF NEW IS NOT DISTINCT FROM OLD THEN
      RETURN NEW;
    END IF;
  END IF;
  -- create the log record
  stmt = 'INSERT INTO ' || dst::text || '(ro,rn)VALUES(';
  CASE TG_OP
    WHEN 'INSERT' THEN
      EXECUTE stmt || 'NULL,$1)' USING NEW;
    WHEN 'UPDATE' THEN
      EXECUTE stmt || '$1,$2)' USING OLD, NEW;
    WHEN 'DELETE' THEN
      EXECUTE stmt || '$1,NULL)' USING OLD;
  END CASE;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Now, before starting the synchronization, we can attach the triggers (or enable them via ALTER TABLE ... ENABLE TRIGGER ...):

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

CREATE TRIGGER log
  AFTER INSERT OR UPDATE OR DELETE
  ON kladr_house
    FOR EACH ROW
      EXECUTE PROCEDURE diff$log();

And then we calmly extract all the changes we need from the log tables and run them through additional handlers.
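Reading the collected changes back could look roughly like this. The sketch relies on the kladr and kladr$log tables and the log trigger from the listings above; the classification into operations and the clean-up steps are assumptions about how a handler might consume the log.

```sql
-- Classify each logged change by which of the two row images is missing
SELECT
  CASE
    WHEN ro IS NOT DISTINCT FROM NULL THEN 'INSERT'
    WHEN rn IS NOT DISTINCT FROM NULL THEN 'DELETE'
    ELSE 'UPDATE'
  END AS op
, ro -- old image of the record
, rn -- new image of the record
FROM
  kladr$log;

-- Once the handlers have processed everything, clear the log for the next run
TRUNCATE kladr$log;

-- Optionally pause logging until the next synchronization
ALTER TABLE kladr DISABLE TRIGGER log;
```

Individual fields of an image are available with the usual composite syntax, for example (rn).*.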

3.3. Importing linked sets

Above we considered cases where the source and destination data structures are the same. But what if the export from an external system has a format different from the storage structure in our database?

Let's take as an example the storage of clients and their invoices, the classic "many-to-one" case:

CREATE TABLE client(
  client_id
    serial
      PRIMARY KEY
, inn
    varchar
      UNIQUE
, name
    varchar
);

CREATE TABLE invoice(
  invoice_id
    serial
      PRIMARY KEY
, client_id
    integer
      REFERENCES client(client_id)
, number
    varchar
, dt
    date
, sum
    numeric(32,2)
);

But the export from the external source arrives as "all in one":

CREATE TEMPORARY TABLE invoice_import(
  client_inn
    varchar
, client_name
    varchar
, invoice_number
    varchar
, invoice_dt
    date
, invoice_sum
    numeric(32,2)
);

Obviously, client data can be duplicated in this form, and the main record is the "invoice":

0123456789;Вася;A-01;2020-03-16;1000.00
9876543210;Петя;A-02;2020-03-16;666.00
0123456789;Вася;B-03;2020-03-16;9999.00

For the model, we will simply insert our test data, but remember: COPY is more efficient!

INSERT INTO invoice_import
VALUES
  ('0123456789', 'Вася', 'A-01', '2020-03-16', 1000.00)
, ('9876543210', 'Петя', 'A-02', '2020-03-16', 666.00)
, ('0123456789', 'Вася', 'B-03', '2020-03-16', 9999.00);

First, let's single out the "dimensions" that our "facts" refer to. In our case, invoices refer to clients:

CREATE TEMPORARY TABLE client_import AS
SELECT DISTINCT ON(client_inn)
-- a plain SELECT DISTINCT is enough if the data is known to be consistent
  client_inn inn
, client_name "name"
FROM
  invoice_import;

To link invoices to client IDs correctly, we first need to look up or generate those identifiers. Let's add fields for them:

ALTER TABLE invoice_import ADD COLUMN client_id integer;
ALTER TABLE client_import ADD COLUMN client_id integer;

Let's use the table synchronization method described above with a small amendment: we will not update or delete anything in the target table, because our client import is "append-only":

-- set the IDs of already existing records in the import table
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  client D
WHERE
  T.inn = D.inn; -- unique key

-- insert the records that were missing and set their IDs
WITH ins AS (
  INSERT INTO client(
    inn
  , name
  )
  SELECT
    inn
  , name
  FROM
    client_import
  WHERE
    client_id IS NULL -- if no ID was assigned
  RETURNING *
)
UPDATE
  client_import T
SET
  client_id = D.client_id
FROM
  ins D
WHERE
  T.inn = D.inn; -- unique key

-- set the client IDs on the invoice records
UPDATE
  invoice_import T
SET
  client_id = D.client_id
FROM
  client_import D
WHERE
  T.client_inn = D.inn; -- application-level key

That's it, really: invoice_import now has the linking field client_id filled in, and we will insert the invoices with it.
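The final insert is not shown in the article; under the table definitions above it might look like the following sketch, which assumes that every imported invoice is new and should simply be appended.

```sql
-- Append the imported invoices, now linked to clients through client_id
INSERT INTO invoice(
  client_id
, number
, dt
, sum
)
SELECT
  client_id
, invoice_number
, invoice_dt
, invoice_sum
FROM
  invoice_import;
```

If invoices can arrive repeatedly, the same "look up existing, insert missing" approach used for clients would be needed here as well, keyed on whatever identifies an invoice in your data.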

source: www.habr.com
