Saving a penny on large volumes in PostgreSQL

Continuing the topic of writing large data streams, which we raised in the previous article about partitioning, this time we will look at ways to reduce the "physical" size of what is stored in PostgreSQL, and at their impact on server performance.

We will talk about TOAST settings and data alignment. "On average", these methods won't save a great deal of resources, but they require no changes to the application code at all.

However, our experience turned out to be quite productive in this respect, since the storage behind almost any monitoring system is, by its nature, mostly append-only in terms of the data being written. And if you wonder how to teach the database to write half as much to disk instead of 200MB/s, read on.

Little secrets of big data

Given what our service does, it regularly receives text packets from logs.

And since the VLSI complex whose databases we monitor is a multi-component product with complex data structures, the queries written to achieve maximum performance turn out to be real "multi-volume" pieces with complex algorithmic logic. So the size of each individual query instance, or of the resulting execution plan, in the log that reaches us is, "on average", quite large.

Let's look at the structure of one of the tables into which we write "raw" data, that is, the original text of a log record:

CREATE TABLE rawdata_orig(
  pack -- PK
    uuid NOT NULL
, recno -- PK
    smallint NOT NULL
, dt -- partition key
    date
, data -- the most important part
    text
, PRIMARY KEY(pack, recno)
);

A typical table (already partitioned, of course, so this is a partition template), where the most important part is the text. Sometimes quite a large one.

Recall that the "physical" size of a single record in PG cannot exceed one data page, but its "logical" size is a completely different matter. To write a large value (varchar/text/bytea) into a field, PostgreSQL uses the TOAST technique:

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is known as TOAST...

Accordingly, for each table with "potentially large" fields, a paired table is created automatically, in which each "large" value is "sliced" into segments of about 2KB:

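-- Simplified structure of the automatically created TOAST table
TOAST(
  chunk_id   -- ID of the toasted value
    oid
, chunk_seq  -- chunk number within that value
    integer
, chunk_data -- the chunk itself
    bytea
, PRIMARY KEY(chunk_id, chunk_seq)
);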

That is, if we have to write a row with a "large" data value, the actual writes happen not only to the main table and its PK, but also to the TOAST table and its PK.
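To see this double write with your own eyes, here is a minimal sketch on a throwaway table (the name toast_demo is hypothetical, and PostgreSQL 11 or later is assumed). For the demo only, the column is switched to STORAGE EXTERNAL, which disables compression, so a single 10 kB value is guaranteed to be moved out of line instead of being compressed in place:

```sql
-- Hypothetical demo table: any table with a text column gets a TOAST table.
CREATE TABLE toast_demo(
  id   integer PRIMARY KEY
, data text
);

-- EXTERNAL = out-of-line storage without compression (demo only),
-- so a large value is guaranteed to end up in TOAST.
ALTER TABLE toast_demo ALTER COLUMN data SET STORAGE EXTERNAL;

-- One row with a ~10 kB value, far above the ~2 kB toasting threshold.
INSERT INTO toast_demo VALUES (1, repeat('x', 10000));

-- The TOAST table paired with toast_demo, and how much it has grown.
SELECT
  c.reltoastrelid::regclass         AS toast_table
, pg_relation_size(c.reltoastrelid) AS toast_bytes
FROM pg_class c
WHERE c.oid = 'toast_demo'::regclass;
```

The toast_table column shows a name of the form pg_toast.pg_toast_NNNN, and toast_bytes is non-zero: the row in toast_demo itself holds only a small pointer to the chunks.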

Reducing the impact of TOAST

But most of our records are not that large after all; they should fit into 8KB. How can we save on that?..

This is where the STORAGE attribute of a table column comes to the rescue:

  • EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression will be attempted first, then out-of-line storage if the row is still too big.
  • MAIN allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.)

In fact, this is exactly what we need for the text: compress it as much as possible, and only if it still doesn't fit at all, move it to TOAST. This can be done right "on the fly", with a single command:

ALTER TABLE rawdata_orig ALTER COLUMN data SET STORAGE MAIN;
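Why "on the fly" is safe: according to the ALTER TABLE documentation, SET STORAGE doesn't itself change anything in the table, it only sets the strategy used for future writes, so existing rows stay as they are and only new ones follow MAIN. A minimal sketch of checking the current strategy through the pg_attribute catalog (storage_demo is a hypothetical stand-in for rawdata_orig):

```sql
-- Hypothetical stand-in for rawdata_orig.
CREATE TABLE storage_demo(
  pack uuid NOT NULL
, data text
);

-- Before: text columns default to EXTENDED, shown as 'x'.
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'storage_demo'::regclass AND attname = 'data';

ALTER TABLE storage_demo ALTER COLUMN data SET STORAGE MAIN;

-- After: 'm' = MAIN (compress first, move out of line only as a last resort).
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'storage_demo'::regclass AND attname = 'data';
```

The other codes in attstorage are 'p' (PLAIN, used by fixed-width types such as integer) and 'e' (EXTERNAL).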

How to evaluate the effect

Since the data stream changes every day, we can't compare absolute numbers, but in relative terms, the smaller the share we write to TOAST, the better. There is a danger here, though: the larger the "physical" size of each individual record, the "wider" the index becomes, because it has to cover more data pages.

Partition before the change:

heap  = 37GB (39%)
TOAST = 54GB (57%)
PK    =  4GB ( 4%)

Partition after the change:

heap  = 37GB (67%)
TOAST = 16GB (29%)
PK    =  2GB ( 4%)
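A breakdown like the one above can be gathered with standard size functions. The sketch below is one possible way to do it, not necessarily how the figures above were collected; rawdata_demo is a hypothetical table filled with synthetic rows, so substitute a real partition name. Note that pg_relation_size() of the TOAST table excludes the TOAST table's own index, and the "pk" column sums all indexes, which here is just the PK:

```sql
-- Hypothetical partition-like table with some synthetic rows.
CREATE TABLE rawdata_demo(
  pack  uuid     NOT NULL
, recno smallint NOT NULL
, dt    date
, data  text
, PRIMARY KEY(pack, recno)
);

INSERT INTO rawdata_demo
SELECT md5(g::text)::uuid, 0, DATE '2019-02-20', repeat('x', 100)
FROM generate_series(1, 10000) g;

-- Heap / TOAST / index sizes and their shares of the total.
WITH s AS (
  SELECT
    pg_relation_size(c.oid)           AS heap
  , pg_relation_size(c.reltoastrelid) AS toast
  , pg_indexes_size(c.oid)            AS pk
  FROM pg_class c
  WHERE c.oid = 'rawdata_demo'::regclass
)
SELECT
  pg_size_pretty(heap)  AS heap,  round(100.0 * heap  / (heap + toast + pk)) AS heap_pct
, pg_size_pretty(toast) AS toast, round(100.0 * toast / (heap + toast + pk)) AS toast_pct
, pg_size_pretty(pk)    AS pk,    round(100.0 * pk    / (heap + toast + pk)) AS pk_pct
FROM s;
```

As a cross-check of the published numbers: before the change the total is 37 + 54 + 4 = 95GB, so TOAST is 54/95 ≈ 57%; after it, 37 + 16 + 2 = 55GB, and the heap's share grows to 37/55 ≈ 67%.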

In effect, we started writing to TOAST half as often, which offloaded not only the disk but also the CPU:

Note that we also started "reading" the disk less, not just "writing" less, because when a record is inserted into a table, part of each index's tree also has to be "read" to determine the record's future position in it.

Who lives well on PostgreSQL 11

After upgrading to PG11, we decided to continue "tuning" TOAST and noticed that, starting with this version, the toast_tuple_target parameter became available for tuning:

The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had.

We concluded that our data is usually either "very short" or "very long" right away, so we settled on the minimum possible value:

ALTER TABLE rawplan_orig SET (toast_tuple_target = 128);
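The setting is stored as a per-table storage parameter, so it can be checked in pg_class.reloptions. A minimal sketch on a hypothetical table (tuple_target_demo); 128 bytes is the lowest value PostgreSQL accepts for toast_tuple_target:

```sql
-- Hypothetical demo table.
CREATE TABLE tuple_target_demo(
  id   integer
, data text
);

ALTER TABLE tuple_target_demo SET (toast_tuple_target = 128);

-- Storage parameters are kept as "name=value" strings.
SELECT relname, reloptions
FROM pg_class
WHERE oid = 'tuple_target_demo'::regclass;
-- reloptions: {toast_tuple_target=128}
```

As the ALTER TABLE documentation notes, changing a storage parameter does not modify the table contents immediately, so only rows written afterwards are affected. ALTER TABLE tuple_target_demo RESET (toast_tuple_target) returns to the default.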

Let's see how the new setting affected disk load after the reconfiguration:

Not bad! The average disk queue dropped by about 1.5 times, and disk "utilization" by 20 percent! But maybe this affected the CPU somehow?

At least it definitely didn't get worse. Although it is hard to judge when even such volumes still can't push the average CPU load above 5%.

Changing the order of the summands changes... the sum!

As the saying goes, take care of the kopecks and the rubles will take care of themselves, and at our storage volumes of about 10TB/month even a small optimization can pay off well. So we turned our attention to the physical structure of our data: how exactly the fields are "packed" inside the record of each table.

Because, due to data alignment, this directly affects the resulting size:

Many architectures align data on machine word boundaries. For example, on a 32-bit x86 system, integers (integer type, 4 bytes) are aligned on a 4-byte word boundary, as are double-precision floating-point numbers (double precision, 8 bytes). On a 64-bit system, double values are aligned on 8-byte word boundaries. This is another reason for incompatibility.

Because of alignment, the size of a table row depends on the order of its fields. Usually this effect is not very noticeable, but in some cases it can lead to a significant increase in size. For example, if you interleave char(1) and integer fields, 3 bytes will typically be wasted between them.

Let's start with synthetic models:

SELECT pg_column_size(ROW(
  '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
, '2019-01-01'::date
));
-- 48 bytes

SELECT pg_column_size(ROW(
  '2019-01-01'::date
, '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
));
-- 46 bytes

Where do the couple of extra bytes in the first case come from? It's simple: the 2-byte smallint is followed by 2 bytes of padding so that the next field, the 4-byte date, starts on a 4-byte boundary. When smallint is the last field, there is nothing after it to align.
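The alignment requirement of every type is recorded in the pg_type catalog, so the two results above can be explained from it. A short sketch (typalign codes: c = 1 byte, i.e. no alignment; s = 2; i = 4; d = 8 bytes):

```sql
-- Length and alignment of the three types used in the synthetic rows.
SELECT typname, typlen, typalign
FROM pg_type
WHERE typname IN ('uuid', 'int2', 'date')
ORDER BY typname;
--  date |  4 | i
--  int2 |  2 | s
--  uuid | 16 | c
```

Assuming the usual 24-byte row header, the arithmetic works out as follows. First order: uuid takes bytes 0-15, smallint 16-17, then 2 padding bytes so the date starts at offset 20 and ends at 24, giving 24 + 24 = 48. Second order: date takes 0-3, uuid needs no alignment and takes 4-19, smallint takes 20-21, giving 24 + 22 = 46.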

In theory, everything is fine, and you can rearrange the fields however you like. Let's check it on real data, using one of the tables whose daily partition takes 10-15GB.

The original structure:

CREATE TABLE public.plan_20190220
(
-- Inherited from table plan:  pack uuid NOT NULL,
-- Inherited from table plan:  recno smallint NOT NULL,
-- Inherited from table plan:  host uuid,
-- Inherited from table plan:  ts timestamp with time zone,
-- Inherited from table plan:  exectime numeric(32,3),
-- Inherited from table plan:  duration numeric(32,3),
-- Inherited from table plan:  bufint bigint,
-- Inherited from table plan:  bufmem bigint,
-- Inherited from table plan:  bufdsk bigint,
-- Inherited from table plan:  apn uuid,
-- Inherited from table plan:  ptr uuid,
-- Inherited from table plan:  dt date,
  CONSTRAINT plan_20190220_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190220_dt_check CHECK (dt = '2019-02-20'::date)
)
INHERITS (public.plan)

The partition after changing the column order: exactly the same fields, just in a different order:

CREATE TABLE public.plan_20190221
(
-- Inherited from table plan:  dt date NOT NULL,
-- Inherited from table plan:  ts timestamp with time zone,
-- Inherited from table plan:  pack uuid NOT NULL,
-- Inherited from table plan:  recno smallint NOT NULL,
-- Inherited from table plan:  host uuid,
-- Inherited from table plan:  apn uuid,
-- Inherited from table plan:  ptr uuid,
-- Inherited from table plan:  bufint bigint,
-- Inherited from table plan:  bufmem bigint,
-- Inherited from table plan:  bufdsk bigint,
-- Inherited from table plan:  exectime numeric(32,3),
-- Inherited from table plan:  duration numeric(32,3),
  CONSTRAINT plan_20190221_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190221_dt_check CHECK (dt = '2019-02-21'::date)
)
INHERITS (public.plan)

The total size of a partition is determined by the number of "facts" and depends only on external processes, so let's divide the heap size (pg_relation_size) by the number of records in it, which gives the average size of an actual stored record:
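The same measurement can be reproduced in miniature. The sketch below uses two hypothetical tables (order_padded and order_packed) with identical columns in different orders and identical synthetic rows, then divides the heap size by the row count, as described above. In the padded order, each smallint is followed by 6 bytes of padding before the next 8-byte bigint; in the packed order, the bigints come first and the two smallints sit side by side:

```sql
-- Hypothetical tables: same columns, two different orders.
CREATE TABLE order_padded(
  a smallint, b bigint, c smallint, d bigint
);
CREATE TABLE order_packed(
  b bigint, d bigint, a smallint, c smallint
);

-- Identical synthetic data in both.
INSERT INTO order_padded(a, b, c, d)
SELECT 1, g, 2, g FROM generate_series(1, 100000) g;
INSERT INTO order_packed(a, b, c, d)
SELECT 1, g, 2, g FROM generate_series(1, 100000) g;

-- Average physical bytes per row: heap size / number of rows.
SELECT 'padded' AS layout, pg_relation_size('order_padded') / count(*) AS bytes_per_row
FROM order_padded
UNION ALL
SELECT 'packed', pg_relation_size('order_packed') / count(*)
FROM order_packed;
```

The exact per-row figures depend on the build (page size, MAXALIGN), but the packed layout should come out noticeably smaller.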

Minus 6% of the volume. Excellent!

But, of course, not everything is so rosy: we can't change the order of the fields in the indexes, and therefore "overall" (pg_total_relation_size)...

...we still saved 1.5% here too, without changing a single line of code. Yes, yes!


Note that the field arrangement above is not necessarily the optimal one. Some groups of fields we didn't want to "break up" for aesthetic reasons, for example the pair (pack, recno), which is the PK of this table.

In general, finding the "minimal" field arrangement is a fairly simple "brute-force" task. So you may well get even better results on your data than we did. Give it a try!
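As a cheaper starting point than a full brute-force search, a common greedy heuristic is to put fixed-width columns first, ordered from the strictest alignment to the loosest, and variable-length columns last. The sketch below only suggests such an order for a hypothetical table (order_demo); it is not the authors' method, it ignores NULLs and actual value sizes, and, as noted above, you may still prefer to keep logically related columns together:

```sql
-- Hypothetical table with a deliberately poor column order.
CREATE TABLE order_demo(
  a smallint
, b bigint
, c text
, d integer
, e uuid
);

-- Greedy suggestion: fixed-width columns first, strictest alignment first
-- (d = 8, i = 4, s = 2, c = 1 byte), variable-length (typlen = -1) last.
SELECT a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'order_demo'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY
  t.typlen < 0
, CASE t.typalign WHEN 'd' THEN 1 WHEN 'i' THEN 2 WHEN 's' THEN 3 ELSE 4 END
, t.typlen DESC
, a.attnum;
-- suggested order: b, d, a, e, c
```

Variable-length types such as numeric and text land at the end, which matches where the numeric columns ended up in the reordered partition above.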

Source: www.habr.com
