Uma i-VACUUM ihluleka, sihlanza itafula mathupha

I-VACUUM angakwazi "ukuhlanza" etafuleni ku-PostgreSQL kuphela lokho akekho ongabona - okungukuthi, asikho nesisodwa isicelo esisebenzayo esiqale ngaphambi kokuthi lawa marekhodi ashintshwe.

Kodwa kuthiwani uma uhlobo olunjalo olungajabulisi (umthwalo wesikhathi eside we-OLAP kusizindalwazi se-OLTP) lusekhona? Kanjani itafula elihlanzekile elishintshayo uzungezwe imibuzo emide futhi unganyatheli erekeni?

Uma i-VACUUM ihluleka, sihlanza itafula mathupha

Ukubeka i-rake

Okokuqala, ake sithole ukuthi iyiphi inkinga esifuna ukuyixazulula nokuthi ingavela kanjani.

Ngokuvamile lesi simo siyenzeka etafuleni elincane uma kuqhathaniswa, kodwa lapho kwenzeka khona izinguquko eziningi. Ngokuvamile lokhu noma okuhlukile amamitha/ama-aggregate/izilinganiso, okuvamise ukwenziwa kuyo okuthi UPDATE, noma ulayini we-buffer ukucubungula uchungechunge oluqhubekayo lwemicimbi, amarekhodi akhona AYIFAKA/SUSA njalo.

Ake sizame ukukhiqiza kabusha inketho ngezilinganiso:

CREATE TABLE tbl(k text PRIMARY KEY, v integer);
CREATE INDEX ON tbl(v DESC); -- по этому индексу будем строить рейтинг

INSERT INTO
  tbl
SELECT
  chr(ascii('a'::text) + i) k
, 0 v
FROM
  generate_series(0, 25) i;

Futhi ngokuhambisana, kokunye ukuxhumana, isicelo eside, eside siqala, siqoqa izibalo eziyinkimbinkimbi, kodwa engathinti itafula lethu:

SELECT pg_sleep(10000);

Manje sibuyekeza inani lezinto zokubala izikhathi eziningi, eziningi. Ngobumsulwa bokuhlolwa, masenze lokhu ngokuthengiselana okuhlukene usebenzisa i-dblinkukuthi kuzokwenzeka kanjani eqinisweni:

DO $$
DECLARE
  i integer;
  tsb timestamp;
  tse timestamp;
  d double precision;
BEGIN
  PERFORM dblink_connect('dbname=' || current_database() || ' port=' || current_setting('port'));
  FOR i IN 1..10000 LOOP
    tsb = clock_timestamp();
    PERFORM dblink($e$UPDATE tbl SET v = v + 1 WHERE k = 'a';$e$);
    tse = clock_timestamp();
    IF i % 1000 = 0 THEN
      d = (extract('epoch' from tse) - extract('epoch' from tsb)) * 1000;
      RAISE NOTICE 'i = %, exectime = %', lpad(i::text, 5), lpad(d::text, 5);
    END IF;
  END LOOP;
  PERFORM dblink_disconnect();
END;
$$ LANGUAGE plpgsql;

NOTICE:  i =  1000, exectime = 0.524
NOTICE:  i =  2000, exectime = 0.739
NOTICE:  i =  3000, exectime = 1.188
NOTICE:  i =  4000, exectime = 2.508
NOTICE:  i =  5000, exectime = 1.791
NOTICE:  i =  6000, exectime = 2.658
NOTICE:  i =  7000, exectime = 2.318
NOTICE:  i =  8000, exectime = 2.572
NOTICE:  i =  9000, exectime = 2.929
NOTICE:  i = 10000, exectime = 3.808

Kwenzenjani? Kungani ngisho nange-UPDATE elula yerekhodi elilodwa isikhathi sokwenza sehliswe izikhathi eziyisi-7 - kusuka ku-0.524ms kuya ku-3.808ms? Futhi ukulinganisa kwethu kukhula kancane kancane.

Konke kuyiphutha le-MVCC.

Konke kumayelana Indlela ye-MVCC, okubangela ukuthi umbuzo ubheke kuzo zonke izinguqulo zangaphambilini zokufaka. Ngakho-ke masihlanze ithebula lethu ezinguqulweni “ezifile”:

VACUUM VERBOSE tbl;

INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 10026 nonremovable row versions in 45 out of 45 pages
DETAIL:  10000 dead row versions cannot be removed yet, oldest xmin: 597439602

O, akukho lutho lokuhlanza! Ukuhambisana Isicelo esisebenzayo siyasiphazamisa - ngemva kwakho konke, ngolunye usuku angase afune ukuphendukela kulezi zinguqulo (kuthiwani uma?), Futhi kufanele zitholakale kuye. Ngakho-ke ngisho ne-VACUUM FULL ngeke isisize.

“Ibhidliza” itafula

Kodwa siyazi ngokuqinisekile ukuthi lowo mbuzo awudingi ithebula lethu. Ngakho-ke, sisazozama ukubuyisela ukusebenza kwesistimu emikhawulweni eyanele ngokususa yonke into engadingekile etafuleni - okungenani “ngesandla”, njengoba i-VACUUM ivuma.

Ukuze sikwenze kucace kakhudlwana, ake sibheke isibonelo secala letafula lebhafa. Okusho ukuthi, kunokugeleza okukhulu kwe-INSERT/DELETE, futhi ngezinye izikhathi ithebula alinalutho ngokuphelele. Kodwa uma ingenalutho, kufanele londoloza okuqukethwe kwayo kwamanje.

#0: Ukuhlola isimo

Kuyacaca ukuthi ungazama ukwenza okuthile ngetafula ngisho nangemva kokusebenza ngakunye, kodwa lokhu akuwenzi umqondo omkhulu - i-overhead yokunakekela izoba nkulu ngokucacile kunomphumela wemibuzo eqondiwe.

Ake sakhe imibandela - "sekuyisikhathi sokuthatha isinyathelo" uma:

  • I-VACUUM yethulwa kudala kakhulu
    Silindele umthwalo osindayo, ngakho makube njalo Imizuzwana engu-60 kusukela [okuzenzakalelayo]VACUUM yokugcina.
  • usayizi wethebula elibonakalayo mkhulu kunethagethi
    Masiyichaze njengephindwe kabili inombolo yamakhasi (amabhulokhi angu-8KB) uma kuqhathaniswa nosayizi omncane - 1 blk ngenqwaba + 1 blk kunkomba ngayinye - ngetafula okungenzeka alinalutho. Uma silindele ukuthi inani elithile ledatha lizohlala lihlala kusigcinalwazi “ngokujwayelekile”, kunengqondo ukulungisa le fomula.

Isicelo sokuqinisekisa

SELECT
  relpages
, ((
    SELECT
      count(*)
    FROM
      pg_index
    WHERE
      indrelid = cl.oid
  ) + 1) << 13 size_norm -- тут правильнее делать * current_setting('block_size')::bigint, но кто меняет размер блока?..
, pg_total_relation_size(oid) size
, coalesce(extract('epoch' from (now() - greatest(
    pg_stat_get_last_vacuum_time(oid)
  , pg_stat_get_last_autovacuum_time(oid)
  ))), 1 << 30) vaclag
FROM
  pg_class cl
WHERE
  oid = $1::regclass -- tbl
LIMIT 1;

relpages | size_norm | size    | vaclag
-------------------------------------------
       0 |     24576 | 1105920 | 3392.484835

#1: Kuseyi-VACUUM

Ngeke sazi kusenesikhathi ukuthi ingabe umbuzo ofanayo uyasiphazamisa kakhulu - mangaki amarekhodi “aphelelwe yisikhathi” selokhu aqala. Ngakho-ke, lapho sinquma ukucubungula ithebula ngandlela-thile, kunoma yikuphi, kufanele siqale sikhiphe kulo I-VACUUM - ngokungafani ne-VACUUM FULL, ayiphazamisi izinqubo ezifanayo ezisebenza nedatha yokufunda-bhala.

Ngesikhathi esifanayo, ingakwazi ukuhlanza ngokushesha okuningi kwalokho esingathanda ukukususa. Yebo, futhi imibuzo elandelayo kuleli thebula izoya kithi ngokuthi "inqolobane eshisayo", okuzonciphisa ubude bawo - futhi, ngakho-ke, isikhathi esiphelele sokuvimbela abanye ngomsebenzi wethu wokunikeza isevisi.

#2: Ingabe ukhona ekhaya?

Ake sihlole ukuthi kukhona yini okuthile etafuleni:

TABLE tbl LIMIT 1;

Uma lingekho irekhodi elilodwa elisele, singalondoloza okuningi ekucubunguleni ngokwenza nje QINISA:

Isebenza ngendlela efanayo nomyalo we-DELETE ongenamibandela wetafula ngalinye, kodwa ishesha kakhulu njengoba empeleni ayiskeneli amathebula. Ngaphezu kwalokho, ikhulula ngokushesha isikhala sediski, ngakho-ke asikho isidingo sokwenza umsebenzi we-VACUUM kamuva.

Ukuthi udinga ukusetha kabusha isibali sokulandelana kwethebula (QALA KABUSHA IDENTITY) kukuwe ukuthi unqume.

#3: Wonke umuntu - shintshana!

Njengoba sisebenza endaweni enokuncintisana okukhulu, ngenkathi silapha sibheka ukuthi akukho okufakiwe etafuleni, kungenzeka ukuthi kukhona osekubhalile lapho. Akufanele silahlekelwe yilolu lwazi, pho? Kunjalo, sidinga ukwenza isiqiniseko sokuthi akekho ongabhala phansi ngokuqinisekile.

Ukuze senze lokhu sidinga ukunika amandla SERIALIZABLE-ukuhlukaniswa kokuthengiselana kwethu (yebo, lapha siqala ukuthengiselana) futhi sikhiye itafula "kancane":

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE;

Leli zinga lokuvimba linqunywa imisebenzi esifuna ukuyenza kuyo.

#4: Ukungqubuzana kwezintshisekelo

Sifika lapha futhi sifuna "ukukhiya" isibonakaliso - kuthiwani uma othile esebenza kuso ngaleso sikhathi, isibonelo, efunda kuso? "Sizolenga" silinde ukuthi lesi siqephu sikhishwe, futhi abanye abafuna ukufunda bazogijima kithi ...

Ukuvimbela lokhu ukuthi kungenzeki, "sizozidela" - uma asikwazanga ukuthola ukukhiya ngesikhathi esithile (esifushane esamukelekayo), khona-ke sizothola okuhlukile esisekelweni, kodwa okungenani ngeke siphazamise kakhulu abanye.

Ukuze wenze lokhu, setha i-variable yeseshini lock_timeout (ngezinguqulo 9.3+) noma/kanye statement_timeout. Into esemqoka okufanele uyikhumbule ukuthi inani le-statement_timeout lisebenza kuphela esitatimendeni esilandelayo. Okusho ukuthi, njengalokhu ku-gluing - ngeke isebenze:

SET statement_timeout = ...;LOCK TABLE ...;

Ukuze singadingi ukubhekana nokubuyisela inani “elidala” lokuguquguquka kamuva, sisebenzisa ifomu ISETHA ENDAWENI, ekhawulela ububanzi besilungiselelo kumsebenzi wamanje.

Sikhumbula ukuthi i-statement_timeout isebenza kuzo zonke izicelo ezilandelayo ukuze umsebenzi unganwebeki ufike kumanani angamukeleki uma kunedatha eningi kuthebula.

#5: Kopisha idatha

Uma ithebula lingenalutho ngokuphelele, idatha kuzodingeka iphinde ilondolozwe kusetshenziswa ithebula lesikhashana elisizayo:

CREATE TEMPORARY TABLE _tmp_swap ON COMMIT DROP AS TABLE tbl;

Isiginesha NGENXA YOKUZINIKELA kusho ukuthi ngesikhathi umsebenzi uphela, ithebula lesikhashana lizoyeka ukuba khona, futhi asikho isidingo sokulisusa mathupha kumongo wokuxhumana.

Njengoba sicabanga ukuthi ayikho idatha "ebukhoma" eningi, lokhu kusebenza kufanele kwenzeke ngokushesha okukhulu.

Awu, yilokho kuphela! Ungakhohlwa ngemva kokuqeda umsebenzi gijima HLAZIYA ukwenza izibalo zetafula zibe ngokwejwayelekile uma kunesidingo.

Ukuhlanganisa umbhalo wokugcina

Sisebenzisa le "pseudo-python":

# собираем статистику с таблицы
stat <-
  SELECT
    relpages
  , ((
      SELECT
        count(*)
      FROM
        pg_index
      WHERE
        indrelid = cl.oid
    ) + 1) << 13 size_norm
  , pg_total_relation_size(oid) size
  , coalesce(extract('epoch' from (now() - greatest(
      pg_stat_get_last_vacuum_time(oid)
    , pg_stat_get_last_autovacuum_time(oid)
    ))), 1 << 30) vaclag
  FROM
    pg_class cl
  WHERE
    oid = $1::regclass -- table_name
  LIMIT 1;

# таблица больше целевого размера и VACUUM был давно
if stat.size > 2 * stat.size_norm and stat.vaclag is None or stat.vaclag > 60:
  -> VACUUM %table;
  try:
    -> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    # пытаемся захватить монопольную блокировку с предельным временем ожидания 1s
    -> SET LOCAL statement_timeout = '1s'; SET LOCAL lock_timeout = '1s';
    -> LOCK TABLE %table IN ACCESS EXCLUSIVE MODE;
    # надо убедиться в пустоте таблицы внутри транзакции с блокировкой
    row <- TABLE %table LIMIT 1;
    # если в таблице нет ни одной "живой" записи - очищаем ее полностью, в противном случае - "перевставляем" все записи через временную таблицу
    if row is None:
      -> TRUNCATE TABLE %table RESTART IDENTITY;
    else:
      # создаем временную таблицу с данными таблицы-оригинала
      -> CREATE TEMPORARY TABLE _tmp_swap ON COMMIT DROP AS TABLE %table;
      # очищаем оригинал без сброса последовательности
      -> TRUNCATE TABLE %table;
      # вставляем все сохраненные во временной таблице данные обратно
      -> INSERT INTO %table TABLE _tmp_swap;
    -> COMMIT;
  except Exception as e:
    # если мы получили ошибку, но соединение все еще "живо" - словили таймаут
    if not isinstance(e, InterfaceError):
      -> ROLLBACK;

Ingabe kungenzeka ukuthi ungakopishi idatha okwesibili?Empeleni, kungenzeka uma i-oid yetafula ngokwayo ingaboshelwe kunoma yimiphi eminye imisebenzi evela ohlangothini lwe-BL noma i-FK ohlangothini lwe-DB:

CREATE TABLE _swap_%table(LIKE %table INCLUDING ALL);
INSERT INTO _swap_%table TABLE %table;
DROP TABLE %table;
ALTER TABLE _swap_%table RENAME TO %table;

Masiqalise iskripthi kuthebula lomthombo futhi sihlole amamethrikhi:

VACUUM tbl;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SET LOCAL statement_timeout = '1s'; SET LOCAL lock_timeout = '1s';
  LOCK TABLE tbl IN ACCESS EXCLUSIVE MODE;
  CREATE TEMPORARY TABLE _tmp_swap ON COMMIT DROP AS TABLE tbl;
  TRUNCATE TABLE tbl;
  INSERT INTO tbl TABLE _tmp_swap;
COMMIT;

relpages | size_norm | size   | vaclag
-------------------------------------------
       0 |     24576 |  49152 | 32.705771

Konke kwahamba kahle! Ithebula linciphe izikhathi ezingu-50 futhi zonke UPDATE zisebenza ngokushesha futhi.

Source: www.habr.com

Engeza amazwana