Ajiye dinari akan manyan kundin a PostgreSQL

Ci gaba da batun yin rikodin manyan rafukan bayanan da aka tashe ta labarin da ya gabata game da rarrabawa, a cikin wannan za mu dubi hanyoyin da za ku iya rage girman "jiki" na abin da aka adana a cikin PostgreSQL, da tasirin su akan aikin uwar garke.

Zamuyi magana akai Saitunan TOAST da daidaita bayanai. "A matsakaici," waɗannan hanyoyin ba za su adana albarkatu masu yawa ba, amma ba tare da canza lambar aikace-aikacen kwata-kwata ba.

Ajiye dinari akan manyan kundin a PostgreSQL
Koyaya, kwarewarmu ta zama mai fa'ida sosai a wannan batun, tunda adana kusan kowane saka idanu ta yanayinsa shine galibi append-kawai dangane da bayanan da aka rubuta. Kuma idan kuna mamakin yadda za ku koya wa ma'aunin bayanai don rubutawa zuwa faifai maimakon 200MB / s rabin adadin - don Allah a ƙarƙashin cat.

Ƙananan sirrin manyan bayanai

Ta hanyar bayanin aiki hidimarmu, a kai a kai su kan tashi zuwa gare shi daga matsuguni fakitin rubutu.

Kuma tun hadaddun VLSIwanda muke saka idanu akan ma'adanar bayanai shine samfuri mai nau'i-nau'i da yawa tare da tsarin bayanai masu rikitarwa, sannan tambayoyi don iyakar aiki juya sosai kamar haka “multi-volume” tare da hadadden dabaru na algorithmic. Don haka ƙarar kowane misali ɗaya na buƙatu ko sakamakon aiwatar da shirin a cikin log ɗin da ya zo mana ya zama “a matsakaita” babba sosai.

Mu kalli tsarin daya daga cikin allunan da muke rubuta bayanan “dannye” a ciki - wato, ga ainihin rubutun daga wurin shiga:

CREATE TABLE rawdata_orig(
  pack -- PK
    uuid NOT NULL
, recno -- PK
    smallint NOT NULL
, dt -- ключ секции
    date
, data -- самое главное
    text
, PRIMARY KEY(pack, recno)
);

Alamar al'ada (wanda aka riga an raba shi, ba shakka, don haka wannan samfurin sashe ne), inda mafi mahimmancin abu shine rubutu. Wani lokaci quite voluminous.

Ka tuna cewa girman "jiki" na rikodin ɗaya a cikin PG ba zai iya ɗaukar fiye da shafi ɗaya na bayanai ba, amma girman "ma'ana" abu ne na daban. Don rubuta ƙimar juzu'i (varchar/rubutu/bytea) zuwa filin, yi amfani Fasahar TOAST:

PostgreSQL yana amfani da ƙayyadadden girman shafi (yawanci 8 KB), kuma baya barin tuples su wuce shafuka da yawa. Saboda haka, ba shi yiwuwa a adana manyan ƙimar filin kai tsaye. Don shawo kan wannan ƙayyadaddun, manyan ƙimar filin suna matsawa da/ko rarrabuwa cikin layukan jiki da yawa. Wannan yana faruwa ba tare da lura da mai amfani ba kuma yana da ɗan tasiri akan yawancin lambar uwar garken. Ana kiran wannan hanyar da TOAST ...

A zahiri, ga kowane tebur mai filayen "mai yiwuwa babba", ta atomatik an ƙirƙiri tebur mai haɗe-haɗe tare da "yanka". kowane rikodin "manyan" a cikin sassan 2KB:

TOAST(
  chunk_id
    integer
, chunk_seq
    integer
, chunk_data
    bytea
, PRIMARY KEY(chunk_id, chunk_seq)
);

Wato, idan dole ne mu rubuta kirtani tare da ƙimar "babban". data, to ainihin rikodin zai faru ba kawai ga babban tebur da PK ba, har ma zuwa TOAST da PK.

Rage tasirin TOAST

Amma yawancin bayananmu har yanzu ba su kai haka ba, ya kamata ya dace da 8KB - Ta yaya zan iya ajiye kudi akan wannan?..

Wannan shi ne inda sifa ta zo ga taimakonmu STORAGE a ginshiƙin tebur:

  • MUTU damar duka biyu matsawa da raba ajiya. Wannan daidaitaccen zaɓi don yawancin nau'ikan bayanan yarda da TOAST. Da farko yana ƙoƙarin yin matsawa, sannan adana shi a waje da tebur idan har yanzu layin yana da girma sosai.
  • MAI yana ba da damar matsawa amma ba ajiya daban ba. (A zahiri, har yanzu za a yi keɓance ma'ajiyar don irin waɗannan ginshiƙan, amma kawai a matsayin makoma ta ƙarshe, lokacin da babu wata hanyar da za a rage kirtani ta yadda ya dace a shafi.)

A zahiri, wannan shine ainihin abin da muke buƙata don rubutu - matsa shi gwargwadon iyawa, kuma idan bai dace ba kwata-kwata, sanya shi a cikin TOAST. Ana iya yin wannan kai tsaye akan tashi, tare da umarni ɗaya:

ALTER TABLE rawdata_orig ALTER COLUMN data SET STORAGE MAIN;

Yadda za a kimanta tasirin

Tunda kwararar bayanai ke canzawa kowace rana, ba za mu iya kwatanta cikakken lambobi ba, amma a cikin sharuddan dangi karami rabo Mun rubuta shi a cikin TOAST - don haka mafi kyau. Amma akwai haɗari a nan - mafi girman girman "jiki" na kowane rikodin kowane mutum, "fadi" index ya zama, saboda dole ne mu rufe ƙarin shafuka na bayanai.

Sashe kafin canje-canje:

heap  = 37GB (39%)
TOAST = 54GB (57%)
PK    =  4GB ( 4%)

Sashe bayan canje-canje:

heap  = 37GB (67%)
TOAST = 16GB (29%)
PK    =  2GB ( 4%)

A gaskiya, mu ya fara rubutawa zuwa TOAST sau 2 ƙasa da sau da yawa, wanda ya sauke ba kawai faifai ba, har ma da CPU:

Ajiye dinari akan manyan kundin a PostgreSQL
Ajiye dinari akan manyan kundin a PostgreSQL
Zan lura cewa mu ma mun zama ƙanƙanta a cikin “karanta” faifan, ba kawai “rubutu” ba - tunda lokacin shigar da rikodin a cikin tebur, dole ne mu “karanta” ɓangaren bishiyar kowace maƙasudi don tantance ta. matsayi na gaba a cikin su.

Wanene zai iya rayuwa da kyau akan PostgreSQL 11

Bayan an sabunta zuwa PG11, mun yanke shawarar ci gaba da "tuning" TOAST kuma mun lura cewa farawa daga wannan sigar sigar toast_tuple_target:

Lambar sarrafa TOAST tana ƙonewa ne kawai lokacin da ƙimar layin da za a adana a cikin tebur ta fi TOAST_TUPLE_THRESHOLD bytes (yawanci 2 KB). Lambar TOAST za ta matsa da/ko fitar da ƙimar filin daga tebur har sai ƙimar jere ta zama ƙasa da TOAST_TUPLE_TARGET bytes (ƙimar mabambanta, kuma yawanci 2 KB) ko girman ba za a iya ragewa ba.

Mun yanke shawarar cewa yawancin bayanan da muke da su ko dai “ gajeru ne” ko kuma “doguwa sosai”, don haka mun yanke shawarar iyakance kanmu zuwa mafi ƙarancin ƙima.

ALTER TABLE rawplan_orig SET (toast_tuple_target = 128);

Bari mu ga yadda sabbin saitunan suka shafi ɗimbin faifai bayan an daidaita su:

Ajiye dinari akan manyan kundin a PostgreSQL
Ba sharri! Matsakaicin jerin gwano zuwa faifai ya ragu kusan sau 1.5, kuma faifan "aiki" shine kashi 20! Amma watakila wannan ko ta yaya ya shafi CPU?

Ajiye dinari akan manyan kundin a PostgreSQL
Akalla abin bai yi muni ba. Kodayake, yana da wahala a yanke hukunci idan ko da irin waɗannan kundin har yanzu ba za su iya haɓaka matsakaicin nauyin CPU mafi girma ba 5%.

Ta hanyar canza wuraren sharuɗɗan, jimlar ... canje-canje!

Kamar yadda ka sani, dinari yana adana ruble, kuma tare da kundin ajiyar mu yana kusa 10TB / wata ko da ɗan ingantawa zai iya ba da riba mai kyau. Saboda haka, mun kula da tsarin jiki na bayanan mu - yadda daidai filayen “stacked” a cikin rikodin kowane tebur.

Domin saboda daidaita bayanai wannan madaidaici ne yana rinjayar ƙarar da aka samu:

Yawancin gine-gine suna ba da daidaiton bayanai akan iyakokin kalmomin na'ura. Misali, akan tsarin x32 mai 86-bit, integers (nau'in lamba, 4 bytes) za a daidaita su akan iyakar kalma 4-byte, kamar yadda za a ninka madaidaicin lambobi masu iyo (madaidaicin madaidaicin maɗaukaki biyu, 8 bytes). Kuma akan tsarin 64-bit, ƙima biyu za a daidaita su zuwa iyakokin kalmomin 8-byte. Wannan wani dalili ne na rashin jituwa.

Saboda daidaitawa, girman layin tebur ya dogara da tsarin filayen. Yawancin lokaci wannan tasirin ba shi da kyau sosai, amma a wasu lokuta yana iya haifar da karuwa mai yawa a cikin girman. Misali, idan kun haxa char(1) da filayen lamba, yawanci za a yi asarar bytes 3 tsakanin su.

Bari mu fara da samfuran roba:

SELECT pg_column_size(ROW(
  '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
, '2019-01-01'::date
));
-- 48 байт

SELECT pg_column_size(ROW(
  '2019-01-01'::date
, '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
));
-- 46 байт

A ina aka sami ƙarin bytes biyu a shari'ar farko? Yana da sauki - 2-byte smallint daidaitacce akan iyaka 4-byte kafin filin na gaba, kuma lokacin da ya kasance na ƙarshe, babu wani abu kuma babu buƙatar daidaitawa.

A ka'idar, komai yana da kyau kuma zaku iya sake tsara filayen yadda kuke so. Bari mu bincika ainihin bayanan ta amfani da misalin ɗaya daga cikin tebur, sashin yau da kullun wanda ya ƙunshi 10-15GB.

Tsarin farko:

CREATE TABLE public.plan_20190220
(
-- Унаследована from table plan:  pack uuid NOT NULL,
-- Унаследована from table plan:  recno smallint NOT NULL,
-- Унаследована from table plan:  host uuid,
-- Унаследована from table plan:  ts timestamp with time zone,
-- Унаследована from table plan:  exectime numeric(32,3),
-- Унаследована from table plan:  duration numeric(32,3),
-- Унаследована from table plan:  bufint bigint,
-- Унаследована from table plan:  bufmem bigint,
-- Унаследована from table plan:  bufdsk bigint,
-- Унаследована from table plan:  apn uuid,
-- Унаследована from table plan:  ptr uuid,
-- Унаследована from table plan:  dt date,
  CONSTRAINT plan_20190220_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190220_dt_check CHECK (dt = '2019-02-20'::date)
)
INHERITS (public.plan)

Sashe bayan canza oda shafi - daidai filayen guda, tsari daban-daban:

CREATE TABLE public.plan_20190221
(
-- Унаследована from table plan:  dt date NOT NULL,
-- Унаследована from table plan:  ts timestamp with time zone,
-- Унаследована from table plan:  pack uuid NOT NULL,
-- Унаследована from table plan:  recno smallint NOT NULL,
-- Унаследована from table plan:  host uuid,
-- Унаследована from table plan:  apn uuid,
-- Унаследована from table plan:  ptr uuid,
-- Унаследована from table plan:  bufint bigint,
-- Унаследована from table plan:  bufmem bigint,
-- Унаследована from table plan:  bufdsk bigint,
-- Унаследована from table plan:  exectime numeric(32,3),
-- Унаследована from table plan:  duration numeric(32,3),
  CONSTRAINT plan_20190221_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190221_dt_check CHECK (dt = '2019-02-21'::date)
)
INHERITS (public.plan)

An ƙaddara jimlar jimlar sashe ta adadin "gaskiya" kuma ya dogara ne kawai akan matakai na waje, don haka bari mu raba girman girman (pg_relation_size) ta adadin bayanan da ke ciki - wato, muna samun matsakaicin girman ainihin rikodin rikodin:

Ajiye dinari akan manyan kundin a PostgreSQL
Rage ƙarar 6%., Babban!

Amma duk abin da, ba shakka, ba haka ba ne rosy - bayan duk, a cikin fihirisa ba za mu iya canza tsarin filayen ba, sabili da haka "gaba ɗaya" (pg_total_relation_size) ...

Ajiye dinari akan manyan kundin a PostgreSQL
...har yanzu a nan ma an samu kashi 1.5%ba tare da canza layin lamba ɗaya ba. I, iya!

Ajiye dinari akan manyan kundin a PostgreSQL

Na lura cewa zaɓin da ke sama don tsara filayen ba shine gaskiyar cewa shine mafi kyawu ba. Domin ba kwa son “yaga” wasu shingen filayen don kyawawan dalilai - alal misali, ma’aurata (pack, recno), wanda shine PK na wannan tebur.

Gabaɗaya, ƙayyadaddun tsarin “mafi ƙanƙanta” na filayen aiki ne mai sauƙin “ƙarfi” mai sauƙi. Don haka, zaku iya samun sakamako mafi kyau daga bayananku fiye da namu - gwada shi!

source: www.habr.com

Add a comment