Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

Kuv xav qhia rau koj txog kuv thawj qhov kev paub ua tiav ntawm kev rov ua dua Postgres database kom ua haujlwm tag nrho. Kuv tau paub nrog Postgres DBMS ib nrab xyoo dhau los; ua ntej kuv tsis muaj kev paub txog kev tswj hwm database txhua.

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

Kuv ua hauj lwm ua ib tug semi-DevOps engineer nyob rau hauv lub tuam txhab IT loj. Peb lub tuam txhab tsim software rau cov kev pabcuam siab, thiab kuv muaj lub luag haujlwm rau kev ua haujlwm, kev saib xyuas thiab kev xa tawm. Kuv tau txais ib txoj haujlwm txheem: hloov kho daim ntawv thov ntawm ib lub server. Daim ntawv thov raug sau rau hauv Django, thaum lub sijhawm hloov pauv hloov pauv tau ua tiav (hloov hauv cov qauv database), thiab ua ntej cov txheej txheem no peb nqa tag nrho cov ntaub ntawv pov tseg los ntawm cov txheej txheem pg_dump program, tsuas yog hauv rooj plaub.

Ib qho yuam kev uas tsis xav txog tshwm sim thaum noj pob tseg (Postgres version 9.5):

pg_dump: Oumping the contents of table “ws_log_smevlog” failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid page in block 4123007 of relatton base/16490/21396989
pg_dump: The command was: COPY public.ws_log_smevlog [...]
pg_dunp: [parallel archtver] a worker process dled unexpectedly

kev ua yuam kev "invalid page nyob rau hauv block" hais txog cov teeb meem ntawm cov ntaub ntawv system theem, uas yog phem heev. Ntawm ntau lub rooj sab laj nws tau hais kom ua VACUUM tag nrho nrog kev xaiv zero_damaged_pages daws qhov teeb meem no. Zoo, cia peb sim ...

Npaj rau kev rov qab los

XIM! Nco ntsoov nqa Postgres thaub qab ua ntej kev sim los kho koj cov ntaub ntawv. Yog tias koj muaj lub tshuab virtual, nres lub database thiab thaij duab. Yog tias nws tsis tuaj yeem nqa ib qho snapshot, nres lub database thiab luam cov ntsiab lus ntawm Postgres directory (suav nrog wal files) mus rau qhov chaw nyab xeeb. Qhov tseem ceeb hauv peb txoj kev lag luam yog tsis ua kom tej yam tsis zoo. Nyeem nws.

Txij li thaum lub database feem ntau ua hauj lwm rau kuv, kuv txwv kuv tus kheej mus rau ib tug tsis tu ncua database dump, tab sis tsis suav nrog lub rooj nrog puas cov ntaub ntawv (kev xaiv -T, --exclude-table=TABLE hauv pg_dump).

Tus neeg rau zaub mov yog lub cev, nws tsis tuaj yeem nqa snapshot. Cov thaub qab tau raug tshem tawm, cia peb txav mus.

Cov ntaub ntawv system check

Ua ntej sim rov qab kho cov ntaub ntawv, peb yuav tsum xyuas kom meej tias txhua yam nyob rau hauv kev txiav txim nrog cov ntaub ntawv nws tus kheej. Thiab nyob rau hauv cov ntaub ntawv ntawm kev ua yuam kev, kho lawv, vim hais tias txwv tsis pub koj tsuas ua tau tej yam phem.

Hauv kuv qhov xwm txheej, cov ntaub ntawv kaw lus nrog cov ntaub ntawv khaws cia tau teeb tsa hauv "/srv" thiab hom yog ext4.

Nres lub database: systemctl nres [email tiv thaiv] thiab xyuas tias cov ntaub ntawv kaw lus tsis siv los ntawm leej twg thiab tuaj yeem tshem tawm siv cov lus txib lsof ua:
lsof +D /srv

Kuv kuj yuav tsum nres lub redis database, vim nws kuj siv "/srv". Tom ntej no kuv unmounted / srv (yuav).

Cov ntaub ntawv kaw lus raug kuaj xyuas siv cov khoom siv hluav taws xob e2fck ua nrog hloov -f (quab yuam xyuas txawm tias filesystem raug cim kom huv):

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

Tom ntej no, siv lub tshuab hluav taws xob puv 2fs (sudo dumpe2fs /dev/mapper/gu2—sys-srv | grep kuaj) koj tuaj yeem txheeb xyuas tias daim tshev tau ua tiav:

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

e2fck ua hais tias tsis muaj teeb meem pom nyob rau ntawm ext4 cov ntaub ntawv kaw lus, uas txhais tau hais tias koj tuaj yeem txuas ntxiv sim rov qab cov ntaub ntawv, lossis rov qab mus rau lub tshuab nqus tsev puv (ntawm chav kawm, koj yuav tsum tau mount cov ntaub ntawv system rov qab thiab pib lub database).

Yog tias koj muaj lub cev server, nco ntsoov xyuas cov xwm txheej ntawm cov disks (ntawm smartctl -a /dev/XXX) lossis RAID maub los kom paub tseeb tias qhov teeb meem tsis nyob rau theem kho vajtse. Hauv kuv qhov xwm txheej, RAID tau dhau los ua "hardware", yog li kuv hais kom tus thawj tswj hwm hauv zos saib xyuas qhov xwm txheej ntawm RAID (tus neeg rau zaub mov nyob ntau pua mais deb ntawm kuv). Nws hais tias tsis muaj qhov yuam kev, uas txhais tau hais tias peb tuaj yeem pib kho dua tshiab.

Sim 1: zero_damaged_pages

Peb txuas mus rau lub database ntawm psql nrog ib tug account uas muaj superuser txoj cai. Peb xav tau tus superuser, vim ... kev xaiv zero_damaged_pages tsuas yog nws hloov tau. Hauv kuv qhov xwm txheej nws yog postgres:

psql -h 127.0.0.1 -U postgres -s [database_name]

Xaiv zero_damaged_pages xav tau nyob rau hauv thiaj li yuav tsis quav ntsej nyeem yuam kev (los ntawm lub website postgrespro):

Thaum PostgreSQL pom cov nplooj ntawv tsis raug, nws feem ntau qhia txog qhov yuam kev thiab tshem tawm qhov kev hloov pauv tam sim no. Yog tias zero_damaged_pages tau qhib, lub kaw lus hloov qhov teeb meem ceeb toom, zeros tawm nplooj ntawv puas hauv lub cim xeeb, thiab txuas ntxiv ua haujlwm. Qhov kev coj cwj pwm no rhuav tshem cov ntaub ntawv, uas yog tag nrho cov kab hauv nplooj ntawv puas.

Peb pab txoj kev xaiv thiab sim ua kom tiav lub tshuab nqus tsev ntawm cov ntxhuav:

VACUUM FULL VERBOSE

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)
Hmoov tsis zoo, hmoov tsis zoo.

Peb ntsib qhov yuam kev zoo sib xws:

INFO: vacuuming "“public.ws_log_smevlog”
WARNING: invalid page in block 4123007 of relation base/16400/21396989; zeroing out page
ERROR: unexpected chunk number 573 (expected 565) for toast value 21648541 in pg_toast_106070

pg_taj - lub tshuab khaws cia "cov ntaub ntawv ntev" hauv Poetgres yog tias nws tsis haum rau ib nplooj ntawv (8kb los ntawm lub neej ntawd).

Kev sim 2: reindex

Thawj cov lus qhia los ntawm Google tsis pab. Tom qab ob peb feeb ntawm kev tshawb nrhiav, kuv pom cov lus qhia thib ob - ua reindex lub rooj puas. Kuv pom cov lus qhia no hauv ntau qhov chaw, tab sis nws tsis txhawb kev ntseeg siab. Wb reindex:

reindex table ws_log_smevlog

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

reindex ua tiav yam tsis muaj teeb meem.

Txawm li cas los xij, qhov no tsis pab, VACUUM FULL tsoo nrog ib qho kev ua yuam kev zoo sib xws. Txij li thaum kuv tau siv los ua qhov ua tsis tiav, kuv pib saib ntxiv rau cov lus qhia hauv Is Taws Nem thiab tuaj hla qhov nthuav dav tsab xov xwm.

Kev sim 3: SELECT, LIMIT, OFFSET

Cov kab lus saum toj no tau pom zoo saib ntawm kab lus los ntawm kab thiab tshem tawm cov ntaub ntawv teeb meem. Ua ntej peb yuav tsum saib tag nrho cov kab:

for ((i=0; i<"Number_of_rows_in_nodes"; i++ )); do psql -U "Username" "Database Name" -c "SELECT * FROM nodes LIMIT 1 offset $i" >/dev/null || echo $i; done

Hauv kuv rooj plaub, lub rooj muaj 1 628 991 kab! Nws yog ib qho tsim nyog yuav tsum tau saib xyuas zoo kev faib cov ntaub ntawv, tab sis qhov no yog lub ntsiab lus rau kev sib tham sib cais. Nws yog hnub Saturday, kuv tau khiav cov lus txib hauv tmux thiab mus pw:

for ((i=0; i<1628991; i++ )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog LIMIT 1 offset $i" >/dev/null || echo $i; done

Thaum sawv ntxov kuv txiav txim siab los xyuas seb yam twg mus li cas. Ua rau kuv xav tsis thoob, kuv pom tias tom qab 20 teev, tsuas yog 2% ntawm cov ntaub ntawv tau raug luam tawm! Kuv tsis xav tos 50 hnub. Lwm qhov ua tsis tiav.

Tab sis kuv tsis tso tseg. Kuv xav tsis thoob vim li cas lub scan siv sij hawm ntev heev. Los ntawm cov ntaub ntawv (dua ntawm postgrespro) Kuv pom tias:

OFFSET qhia kom hla tus naj npawb ntawm kab ua ntej pib tso tawm kab.
Yog tias ob qho tib si OFFSET thiab LIMIT tau teev tseg, lub kaw lus thawj zaug hla OFFSET kab thiab tom qab ntawd pib suav cov kab rau LIMIT txwv.

Thaum siv LIMIT, nws tseem ceeb heev uas yuav tsum tau siv ORDER BY clause kom cov kab tshwm sim tau xa rov qab rau hauv ib qho kev txiav txim tshwj xeeb. Txwv tsis pub, cov subsets ntawm cov kab yuav raug xa rov qab.

Obviously, cov lus txib saum toj no tsis raug: ua ntej, tsis muaj xaj los ntawm, qhov tshwm sim tuaj yeem ua yuam kev. Qhov thib ob, Postgres thawj zaug yuav tsum luam theej duab thiab hla OFFSET kab, thiab nrog nce OFFSET productivity yuav poob txawm ntxiv.

Sim 4: muab pov tseg hauv daim ntawv ntawv

Tom qab ntawd ib lub tswv yim zoo li ci ntsa iab tuaj rau hauv kuv lub siab: nqa ib daim ntawv pov tseg hauv daim ntawv thiab tshuaj xyuas kab ntawv sau kawg.

Tab sis ua ntej, cia peb saib cov qauv ntawm lub rooj. ws_log_smevlog:

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

Nyob rau hauv peb rooj plaub peb muaj ib kem "Id", uas muaj cov cim tshwj xeeb (counter) ntawm kab. Txoj kev npaj tau zoo li no:

  1. Peb pib muab pov tseg hauv daim ntawv (hauv daim ntawv ntawm sql commands)
  2. Nyob rau qee lub sijhawm, cov pob tseg yuav cuam tshuam vim qhov yuam kev, tab sis cov ntawv nyeem tseem yuav raug cawm hauv disk.
  3. Peb saib qhov kawg ntawm cov ntawv nyeem, yog li peb pom tus cim (id) ntawm kab kawg uas tau muab tshem tawm ua tiav

Kuv pib noj cov pob tseg hauv daim ntawv:

pg_dump -U my_user -d my_database -F p -t ws_log_smevlog -f ./my_dump.dump

Lub pob tseg, raws li xav tau, tau cuam tshuam nrog tib qhov yuam kev:

pg_dump: Error message from server: ERROR: invalid page in block 4123007 of relatton base/16490/21396989

Ntxiv mus Tail Kuv ntsia qhov kawg ntawm qhov pov tseg (tail -5 ./my_dump.dump) pom tias cov pob tseg tau cuam tshuam ntawm kab nrog id 186 525. "Yog li qhov teeb meem nyob hauv kab nrog id 186 526, nws tau tawg, thiab yuav tsum tau muab tshem tawm!" – Kuv xav. Tab sis, ua lus nug rau lub database:
«xaiv * los ntawm ws_log_smevlog qhov twg id=186529"Nws tau pom tias txhua yam ua tau zoo nrog cov kab no ... Kab nrog cov ntsuas 186 - 530 kuj ua haujlwm yam tsis muaj teeb meem. Lwm "lub tswv yim ci ntsa iab" ua tsis tiav. Tom qab ntawd kuv nkag siab tias yog vim li cas qhov no tshwm sim: thaum rho tawm thiab hloov cov ntaub ntawv los ntawm lub rooj, lawv tsis raug tshem tawm ntawm lub cev, tab sis raug cim tias "tuag tuples", tom qab ntawd los autovacuum thiab kos cov kab no li deleted thiab tso cai rau cov kab no rov qab siv dua. Txhawm rau nkag siab, yog tias cov ntaub ntawv hauv lub rooj hloov pauv thiab autovacuum tau qhib, ces nws yuav tsis khaws cia ua ntu zus.

Sim 5: XAIV, LOS NTAWM, IB TUG =

Kev poob qis ua rau peb muaj zog dua. Koj yuav tsum tsis txhob tso tseg, koj yuav tsum mus rau qhov kawg thiab ntseeg koj tus kheej thiab koj lub peev xwm. Yog li kuv txiav txim siab sim lwm txoj kev xaiv: tsuas yog saib tag nrho cov ntaub ntawv hauv cov ntaub ntawv ib los ntawm ib qho. Paub txog cov qauv ntawm kuv lub rooj (saib saum toj no), peb muaj ib qho id teb uas yog qhov tshwj xeeb (tus yuam sij tseem ceeb). Peb muaj 1 kab hauv lub rooj thiab id yog nyob rau hauv kev txiav txim, uas txhais tau hais tias peb yuav cia li mus dhau lawv ib tug los ntawm ib tug:

for ((i=1; i<1628991; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

Yog tias leej twg tsis nkag siab, cov lus txib ua haujlwm raws li hauv qab no: nws scans kab lus los ntawm kab thiab xa stdout mus rau / dev / thov, tab sis yog tias cov lus txib SELECT ua tsis tiav, ces cov ntawv nyeem yuam kev tau luam tawm (stderr raug xa mus rau lub console) thiab ib kab uas muaj qhov yuam kev tau luam tawm (ua tsaug rau ||, uas txhais tau hais tias qhov kev xaiv muaj teeb meem (tus lej rov qab ntawm cov lus txib. tsis yog 0)).

Kuv muaj hmoo, kuv muaj indexes tsim rau ntawm daim teb id:

Kuv thawj qhov kev paub rov qab los ntawm Postgres database tom qab ua tsis tiav (nplooj ntawv tsis raug hauv thaiv 4123007 ntawm relatton puag / 16490)

Qhov no txhais tau hais tias nrhiav ib txoj kab nrog tus id xav tau yuav tsum tsis txhob siv sijhawm ntau. Hauv kev xav nws yuav tsum ua haujlwm. Zoo, cia peb khiav cov lus txib hauv Tmux thiab cia peb mus pw.

Txog thaum sawv ntxov kuv pom tias kwv yees li 90 qhov nkag tau pom, uas tsuas yog ntau dua 000%. Ib qho txiaj ntsig zoo thaum piv nrog rau txoj kev dhau los (5%)! Tab sis kuv tsis xav tos 2 hnub ...

Kev sim 6: SELECT, NTAWM, NTAWM NO >= thiab id

Cov neeg siv khoom muaj lub server zoo heev rau lub database: dual-processor Intel Xeon E5-2697 v2, muaj ntau txog 48 xov hauv peb qhov chaw! Kev thauj khoom ntawm tus neeg rau zaub mov yog qhov nruab nrab; peb tuaj yeem rub tawm txog 20 xov yam tsis muaj teeb meem. Kuj tseem muaj RAM txaus: ntau npaum li 384 gigabytes!

Yog li ntawd, cov lus txib yuav tsum tau ua parallelized:

for ((i=1; i<1628991; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

Ntawm no nws muaj peev xwm sau ib tsab ntawv zoo nkauj thiab zoo nkauj, tab sis kuv xaiv txoj kev sib tw ceev tshaj plaws: manually faib qhov ntau ntawm 0-1628991 rau hauv ntu ntawm 100 cov ntaub ntawv thiab khiav cais 000 cov lus txib ntawm daim ntawv:

for ((i=N; i<M; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

Tab sis tsis yog tag nrho. Hauv txoj kev xav, kev txuas mus rau ib lub database kuj yuav siv sij hawm qee lub sij hawm thiab cov peev txheej. Txuas 1 tsis yog ntse heev, koj yuav pom zoo. Yog li ntawd, cia peb muab 628 kab es tsis txhob ntawm ib qho ntawm ib qho kev sib txuas. Raws li qhov tshwm sim, pab pawg tau hloov mus rau hauv qhov no:

for ((i=N; i<M; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done

Qhib 16 qhov rais hauv qhov kev sib tham tmux thiab khiav cov lus txib:

1) for ((i=0; i<100000; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
2) for ((i=100000; i<200000; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
…
15) for ((i=1400000; i<1500000; i=$((i+1000)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
16) for ((i=1500000; i<1628991; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done

Ib hnub tom qab kuv tau txais thawj zaug! Namely (tus nqi XXX thiab ZZZ tsis tau khaws cia):

ERROR:  missing chunk number 0 for toast value 37837571 in pg_toast_106070
829000
ERROR:  missing chunk number 0 for toast value XXX in pg_toast_106070
829000
ERROR:  missing chunk number 0 for toast value ZZZ in pg_toast_106070
146000

Qhov no txhais tau hais tias peb kab muaj qhov yuam kev. Cov ids ntawm thawj thiab thib ob cov ntaub ntawv teeb meem yog nyob nruab nrab ntawm 829 thiab 000, ids ntawm peb yog nyob nruab nrab ntawm 830 thiab 000. Tom ntej no, peb tsuas yog yuav tsum nrhiav qhov tseeb id tus nqi ntawm cov ntaub ntawv teeb meem. Txhawm rau ua qhov no, peb saib hauv peb qhov ntau yam nrog cov ntaub ntawv teeb meem nrog qib 146 thiab txheeb xyuas tus ID:

for ((i=829000; i<830000; i=$((i+1)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done
829417
ERROR:  unexpected chunk number 2 (expected 0) for toast value 37837843 in pg_toast_106070
829449
for ((i=146000; i<147000; i=$((i+1)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done
829417
ERROR:  unexpected chunk number ZZZ (expected 0) for toast value XXX in pg_toast_106070
146911

Zoo siab xaus

Peb pom cov kab teeb meem. Peb mus rau hauv lub database ntawm psql thiab sim rho tawm lawv:

my_database=# delete from ws_log_smevlog where id=829417;
DELETE 1
my_database=# delete from ws_log_smevlog where id=829449;
DELETE 1
my_database=# delete from ws_log_smevlog where id=146911;
DELETE 1

Ua rau kuv xav tsis thoob, cov kev nkag tau raug tshem tawm yam tsis muaj teeb meem txawm tias tsis muaj kev xaiv zero_damaged_pages.

Ces kuv txuas mus rau lub database, puas VACUUM FULL (Kuv xav tias nws tsis tsim nyog ua qhov no), thiab thaum kawg kuv ua tiav tshem tawm cov thaub qab siv pg_dub. Cov pob tseg tau raug coj los tsis muaj qhov yuam kev! Qhov teeb meem raug daws raws li txoj kev ruam. Kev xyiv fab paub tsis muaj kev cia siab, tom qab ntau qhov kev ua tsis tiav peb tau nrhiav kev daws teeb meem!

Kev lees paub thiab lus xaus

Qhov no yog li cas kuv thawj qhov kev paub ntawm kev rov qab kho qhov tseeb Postgres database tau tawm. Kuv yuav nco ntsoov qhov kev paub no tau ntev.

Thiab thaum kawg, kuv xav hais ua tsaug rau PostgresPro rau kev txhais cov ntaub ntawv ua lus Lavxias thiab rau cov chav kawm online dawb kiag li, uas tau pab ntau heev thaum lub sij hawm tsom xam ntawm qhov teeb meem.

Tau qhov twg los: www.hab.com

Ntxiv ib saib