Save a pretty penny on large volumes in PostgreSQL

Continuing the topic of recording large data streams raised in the previous article about partitioning, this time we will look at ways to reduce the "physical" size of what is stored in PostgreSQL, and at their effect on server performance.

We will talk about TOAST settings and data alignment. "On average" these techniques will not save a huge amount of resources, but they require no changes to the application code at all.

However, our experience turned out to be very productive here, since the storage behind almost any monitoring system is by nature mostly append-only in terms of the data it records. And if you are wondering how to teach a database to write to disk half as much instead of 200MB/s, read on.

Little secrets of big data

By the nature of our service, text packets regularly arrive from logs.

And since the VLSI complex whose databases we monitor is a multi-component product with complex data structures, queries tuned for maximum performance turn out to be quite "multi-volume", with complex algorithmic logic. As a result, each individual query instance or resulting execution plan in the log that reaches us is "on average" fairly large.

Let's look at the structure of one of the tables into which we write "raw" data - that is, the original text from a log entry:

CREATE TABLE rawdata_orig(
  pack -- PK
    uuid NOT NULL
, recno -- PK
    smallint NOT NULL
, dt -- partition key
    date
, data -- the most important part
    text
, PRIMARY KEY(pack, recno)
);

A typical table (already partitioned, of course, so this is a partition template), where the most important thing is the text. Sometimes a rather voluminous one.

Recall that the "physical" size of a single record in PG cannot exceed one data page, but the "logical" size is a completely different matter. To write a voluminous value (varchar/text/bytea) into a field, the TOAST technique is used:

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is known as TOAST...

In effect, for every table with "potentially large" fields, a paired table is created automatically that "slices" each "large" record into 2KB segments:

TOAST(
  chunk_id -- OID identifying one TOASTed value
    oid
, chunk_seq -- position of the chunk within the value
    integer
, chunk_data -- the chunk contents
    bytea
, PRIMARY KEY(chunk_id, chunk_seq)
);

That is, if we have to write a row with a "large" value in data, the actual write goes not only to the main table and its PK, but also to TOAST and its PK.
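To see which TOAST relation actually backs a table, the system catalog can be queried. This is a minimal sketch, assuming the rawdata_orig table from above exists and is visible in the current search path; it relies on the standard pg_class.reltoastrelid column that links a table to its TOAST relation.

```sql
-- Find the TOAST table paired with rawdata_orig and its current on-disk size.
SELECT c.relname                               AS main_table
     , t.relname                               AS toast_table
     , pg_size_pretty(pg_relation_size(t.oid)) AS toast_size
  FROM pg_class c
  JOIN pg_class t
    ON t.oid = c.reltoastrelid   -- reltoastrelid is 0 when the table has no TOAST relation
 WHERE c.oid = 'rawdata_orig'::regclass;
```

If the query returns no rows, the table has no TOAST relation at all, which happens when none of its columns can be stored out of line.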

Reducing the influence of TOAST

But most of our records are not that large after all and should fit into 8KB - how can we save on this?..

This is where the STORAGE attribute of a table column comes to our aid:

  • EXTENDED allows both compression and out-of-line storage. This is the default for most TOAST-able data types. Compression is attempted first, then out-of-line storage if the row is still too big.
  • MAIN allows compression but not out-of-line storage. (Actually, out-of-line storage will still be performed for such columns, but only as a last resort, when there is no other way to make the row small enough to fit on a page.)

In fact, this is exactly what we need for text - compress it as much as possible, and only if it does not fit at all, move it out to TOAST. This can be done right on the fly, with a single command:

ALTER TABLE rawdata_orig ALTER COLUMN data SET STORAGE MAIN;
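To confirm that the storage strategy was actually switched, the per-column setting can be read back from pg_attribute. A small sketch, again assuming the rawdata_orig table from above; attstorage is a one-letter code: p = PLAIN, e = EXTERNAL, m = MAIN, x = EXTENDED.

```sql
-- Show the storage strategy of every live column of rawdata_orig.
SELECT attname
     , attstorage            -- expect 'm' for the data column after SET STORAGE MAIN
  FROM pg_attribute
 WHERE attrelid = 'rawdata_orig'::regclass
   AND attnum > 0            -- skip system columns
   AND NOT attisdropped      -- skip dropped columns
 ORDER BY attnum;
```

Keep in mind that SET STORAGE only sets the strategy for future writes; rows already stored are not rewritten by the command itself.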

How to evaluate the effect

Since the data stream changes every day, we cannot compare absolute numbers, but in relative terms, the smaller the share we write to TOAST, the better. There is a danger here, though - the larger the "physical" volume of each individual record, the "wider" the index becomes, because it has to cover more data pages.

Partition before the changes:

heap  = 37GB (39%)
TOAST = 54GB (57%)
PK    =  4GB ( 4%)

Partition after the changes:

heap  = 37GB (67%)
TOAST = 16GB (29%)
PK    =  2GB ( 4%)
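A breakdown like the one above can be collected with the built-in size functions. This is only a sketch of one way to do it, not necessarily how the figures in this article were obtained; it assumes the rawdata_orig table (substitute the name of a concrete partition) and counts the TOAST relation together with its own index.

```sql
-- Split a table's footprint into heap, TOAST and index parts.
SELECT pg_size_pretty(pg_relation_size(c.oid))       AS heap     -- main fork of the table itself
     , pg_size_pretty(pg_total_relation_size(t.oid)) AS toast    -- TOAST table plus its index
     , pg_size_pretty(pg_indexes_size(c.oid))        AS indexes  -- PK and any other indexes
  FROM pg_class c
  LEFT JOIN pg_class t
    ON t.oid = c.reltoastrelid   -- NULL columns if there is no TOAST relation
 WHERE c.oid = 'rawdata_orig'::regclass;
```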

In effect, we started writing to TOAST half as often, which took load off not only the disk but also the CPU.

I will note that we also started "reading" less from disk, not just "writing" - because when inserting a record into a table we also have to "read" part of the tree of each index to determine the record's future position in it.

Who lives well on PostgreSQL 11

After upgrading to PG11, we decided to continue "tuning" TOAST and noticed that, starting with this version, the toast_tuple_target parameter became available for tuning:

The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (an adjustable value, also normally 2 kB) or no more gains can be had.

We decided that the data we usually have is either "very short" or "very long", so we limited ourselves to the minimum possible value:

ALTER TABLE rawplan_orig SET (toast_tuple_target = 128);
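Storage parameters set this way are kept in pg_class.reloptions, so the change can be verified and, if needed, rolled back. A minimal sketch, assuming the rawplan_orig table named in the command above exists:

```sql
-- Confirm that the storage parameter was recorded for the table.
SELECT relname
     , reloptions            -- expect an entry like toast_tuple_target=128
  FROM pg_class
 WHERE oid = 'rawplan_orig'::regclass;

-- To return to the default target later:
-- ALTER TABLE rawplan_orig RESET (toast_tuple_target);
```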

Let's see how the new settings affected the disk load after the reconfiguration:

Not bad! The average disk queue shrank by about 1.5 times, and disk "busy" time dropped by about 20 percent! But maybe this somehow affected the CPU?

At least it did not get any worse. Although it is hard to judge when even volumes like these cannot push the average CPU load above 5%.

Rearranging the terms changes... the sum!

As you know, a penny saves a ruble, and with our storage volumes of about 10TB/month even a small optimization can give a good gain. So we turned our attention to the physical structure of our data - how exactly the fields are "stacked" inside each record of our tables.

Because of data alignment, this directly affects the resulting volume:

Many architectures provide for alignment of data on machine word boundaries. For example, on a 32-bit x86 system, integers (integer type, 4 bytes) will be aligned on a 4-byte word boundary, as will double-precision floating point numbers (8 bytes). And on a 64-bit system, double values will be aligned on 8-byte word boundaries. This is another reason for incompatibility.

Due to alignment, the size of a table row depends on the order of its fields. Usually this effect is not very noticeable, but in some cases it can lead to a significant increase in size. For example, if you mix char(1) and integer fields, there will typically be 3 bytes wasted between them.

Let's start with synthetic models:

SELECT pg_column_size(ROW(
  '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
, '2019-01-01'::date
));
-- 48 bytes

SELECT pg_column_size(ROW(
  '2019-01-01'::date
, '0000-0000-0000-0000-0000-0000-0000-0000'::uuid
, 0::smallint
));
-- 46 bytes

Where did the couple of extra bytes come from in the first case? Simple - the 2-byte smallint is padded so that the next field (the date) starts on a 4-byte boundary, and when it comes last, nothing follows it and no padding is needed.
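The length and alignment requirement of each type involved can be looked up in the pg_type catalog. A small sketch for the three types used in the synthetic example; typalign is a one-letter code: c = no alignment, s = 2-byte, i = 4-byte, d = 8-byte (on most platforms), and a typlen of -1 would mean a variable-length type.

```sql
-- Length and alignment of the types from the synthetic example.
SELECT typname
     , typlen     -- size in bytes for fixed-length types
     , typalign   -- required alignment code
  FROM pg_type
 WHERE typname IN ('uuid', 'int2', 'date')   -- int2 is the catalog name of smallint
 ORDER BY typname;
```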

In theory, everything is fine and you can rearrange the fields however you like. Let's check this on real data, using one of our tables whose daily partition takes 10-15GB.

Original structure:

CREATE TABLE public.plan_20190220
(
-- Inherited from table plan:  pack uuid NOT NULL,
-- Inherited from table plan:  recno smallint NOT NULL,
-- Inherited from table plan:  host uuid,
-- Inherited from table plan:  ts timestamp with time zone,
-- Inherited from table plan:  exectime numeric(32,3),
-- Inherited from table plan:  duration numeric(32,3),
-- Inherited from table plan:  bufint bigint,
-- Inherited from table plan:  bufmem bigint,
-- Inherited from table plan:  bufdsk bigint,
-- Inherited from table plan:  apn uuid,
-- Inherited from table plan:  ptr uuid,
-- Inherited from table plan:  dt date,
  CONSTRAINT plan_20190220_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190220_dt_check CHECK (dt = '2019-02-20'::date)
)
INHERITS (public.plan)

The partition after changing the column order - exactly the same fields, just in a different order:

CREATE TABLE public.plan_20190221
(
-- Inherited from table plan:  dt date NOT NULL,
-- Inherited from table plan:  ts timestamp with time zone,
-- Inherited from table plan:  pack uuid NOT NULL,
-- Inherited from table plan:  recno smallint NOT NULL,
-- Inherited from table plan:  host uuid,
-- Inherited from table plan:  apn uuid,
-- Inherited from table plan:  ptr uuid,
-- Inherited from table plan:  bufint bigint,
-- Inherited from table plan:  bufmem bigint,
-- Inherited from table plan:  bufdsk bigint,
-- Inherited from table plan:  exectime numeric(32,3),
-- Inherited from table plan:  duration numeric(32,3),
  CONSTRAINT plan_20190221_pkey PRIMARY KEY (pack, recno),
  CONSTRAINT chck_ptr CHECK (ptr IS NOT NULL),
  CONSTRAINT plan_20190221_dt_check CHECK (dt = '2019-02-21'::date)
)
INHERITS (public.plan)

The total volume of a partition is determined by the number of "facts" and depends only on external processes, so let's divide the heap size (pg_relation_size) by the number of records in it - that is, we get the average size of an actual stored record:

6% of the volume saved. Excellent!

But things are, of course, not quite so rosy - after all, we cannot change the order of fields in the indexes, and therefore "overall" (pg_total_relation_size) ...

...we still saved 1.5% here, without changing a single line of code. Yes, yes!
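Both per-record figures can be computed with one query per partition. A minimal sketch, assuming the plan_20190221 partition shown above; running it against each partition gives the numbers to compare. Note that count(*) scans the whole partition, so on 10-15GB it is not free.

```sql
-- Average stored bytes per record: heap only, and heap + TOAST + indexes.
SELECT pg_relation_size('public.plan_20190221')
         / NULLIF(count(*), 0) AS heap_bytes_per_row    -- NULLIF guards against an empty partition
     , pg_total_relation_size('public.plan_20190221')
         / NULLIF(count(*), 0) AS total_bytes_per_row
  FROM public.plan_20190221;
```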


I will note that the field arrangement shown above is not necessarily the most optimal one. You may not want to "tear apart" some groups of fields for aesthetic reasons - for example, the pair (pack, recno), which is the PK of this table.

In general, finding the "minimal" arrangement of fields is a fairly simple "brute force" task. So you may get even better results on your data than we did - try it!
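As a starting point before any brute-force search, a common heuristic is to list fixed-length columns from the strictest alignment to the loosest and put variable-length columns last. The query below is a sketch of that heuristic only, not the method used for the partitions in this article, and it does not guarantee the minimal layout; it assumes the parent table public.plan from the listings above.

```sql
-- Suggest a column order: fixed-length columns by descending alignment, varlena last.
SELECT a.attname
     , t.typname
     , t.typlen
     , t.typalign
  FROM pg_attribute a
  JOIN pg_type t
    ON t.oid = a.atttypid
 WHERE a.attrelid = 'public.plan'::regclass
   AND a.attnum > 0
   AND NOT a.attisdropped
 ORDER BY (t.typlen < 0)                      -- fixed-length first, variable-length last
        , CASE t.typalign WHEN 'd' THEN 8
                          WHEN 'i' THEN 4
                          WHEN 's' THEN 2
                          ELSE 1 END DESC     -- strictest alignment first
        , a.attnum;                           -- keep the existing order as a tie-breaker
```

Candidate orders can then be compared with pg_column_size(ROW(...)) on representative values, as in the synthetic example earlier.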

Source: www.habr.com
