My first experience recovering a Postgres database after a failure (invalid page in block 4123007 of relation base/16490)

I would like to share my first successful experience of restoring a Postgres database to full working order. I got to know the Postgres DBMS half a year ago; before that I had no experience in database administration at all.


I work as a kind of semi-DevOps engineer at a large IT company. Our company develops software for high-load services, and I am responsible for operation, maintenance and deployment. I was given a routine task: update an application on one server. The application is written in Django; during the update, migrations are run (changes to the database structure), and before that we take a full database dump with the standard pg_dump program, just in case.

An unexpected error occurred while taking the dump (Postgres version 9.5):

pg_dump: Dumping the contents of table "ws_log_smevlog" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid page in block 4123007 of relation base/16490/21396989
pg_dump: The command was: COPY public.ws_log_smevlog [...]
pg_dump: [parallel archiver] a worker process died unexpectedly

The "invalid page in block" error points to problems at the file system level, which is very bad. Various forums suggested running VACUUM FULL with the zero_damaged_pages option to solve the problem. Well, let's try...

Preparing for recovery

WARNING! Be sure to take a Postgres backup before any attempt to repair your database. If you have a virtual machine, stop the database and take a snapshot. If a snapshot is not possible, stop the database and copy the contents of the Postgres directory (including the WAL files) to a safe place. The main thing in our business is not to make things worse.

Since the database was mostly working for me, I limited myself to a regular database dump, but excluded the table with the damaged data (option -T, --exclude-table=TABLE in pg_dump).
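A minimal sketch of such a dump is shown here. The function name, the custom archive format (-F c) and the output path are assumptions for illustration; the article only states that -T / --exclude-table was used.

```shell
#!/usr/bin/env bash
# Sketch: dump the whole database except one damaged table.
# backup_without_table is a hypothetical helper; user, database and
# file names are placeholders, not the author's real values.
backup_without_table() {
  local user=$1 db=$2 skip_table=$3 outfile=$4
  # -F c : custom-format archive (an assumption; restorable with pg_restore)
  # -T   : exclude the named table (short form of --exclude-table)
  pg_dump -U "$user" -d "$db" -F c -T "$skip_table" -f "$outfile"
}
```

It could be called as backup_without_table my_user my_database ws_log_smevlog ./before_repair.dump, keeping the resulting file away from the volume that is about to be checked.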

The server was physical, so taking a snapshot was not possible. The backup has been taken; let's move on.

File system check

Before trying to repair the database, we need to make sure that everything is in order with the file system itself. And if there are errors, fix them, because otherwise you can only make things worse.

In my case the file system holding the database was mounted at "/srv" and its type was ext4.

Stop the database with systemctl stop [email protected], then check with lsof that nobody is using the file system and that it can be unmounted:
lsof +D /srv

I had to stop the redis database as well, because it also uses "/srv". Then I unmounted /srv (umount).

The file system was checked with the e2fsck utility and the -f switch (force checking even if the file system is marked clean).

Then, with the dumpe2fs utility (sudo dumpe2fs /dev/mapper/gu2--sys-srv | grep checked), you can confirm that the check was actually performed.

e2fsck reported that no problems were found at the ext4 file system level, which means you can go on trying to repair the database, or rather go back to vacuum full (of course, the file system has to be mounted again and the database started).
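The sequence described in this section can be collected into one script. This is only a sketch under stated assumptions: PG_UNIT is a hypothetical placeholder for the PostgreSQL systemd unit, the device and mount point follow the article, and the dry-run switch is an addition so the steps can be rehearsed without touching anything. Other services that use the mount point (redis in the author's case) still have to be stopped by hand.

```shell
#!/usr/bin/env bash
# Sketch of the offline ext4 check. PG_UNIT is a hypothetical unit name;
# DEVICE and MOUNTPOINT follow the article.
PG_UNIT=${PG_UNIT:-postgresql.service}
DEVICE=${DEVICE:-/dev/mapper/gu2--sys-srv}
MOUNTPOINT=${MOUNTPOINT:-/srv}

# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

check_filesystem() {
  run systemctl stop "$PG_UNIT" || return 1
  run umount "$MOUNTPOINT" || return 1    # fails while files are still open
  run e2fsck -f "$DEVICE"                 # -f: check even if marked clean
  local rc=$?
  # e2fsck exit codes: 0 = no errors, 1 or 2 = errors corrected,
  # 4 and above = uncorrected errors or an operational problem.
  (( rc < 4 )) || return "$rc"
  run mount "$DEVICE" "$MOUNTPOINT" || return 1
  run systemctl start "$PG_UNIT"
}
```

Running DRY_RUN=1 check_filesystem first prints the five commands in order; the real run needs root privileges.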

If you have a physical server, be sure to check the state of the disks (with smartctl -a /dev/XXX) or of the RAID controller, to make sure the problem is not at the hardware level. In my case the RAID turned out to be a hardware one, so I asked the local administrator to check its status (the server was several hundred kilometers away from me). He said there were no errors, which meant we could start the recovery.

Attempt 1: zero_damaged_pages

We connect to the database via psql with an account that has superuser rights. A superuser is required because only a superuser can change the zero_damaged_pages option. In my case that is postgres:

psql -h 127.0.0.1 -U postgres -s [database_name]

The zero_damaged_pages option makes it possible to ignore read errors (from the postgrespro website):

When PostgreSQL detects a corrupted page header, it normally reports an error and aborts the current transaction. If zero_damaged_pages is enabled, the system instead issues a warning, zeroes out the damaged page in memory, and continues processing. This behavior destroys data, namely all the rows on the damaged page.
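As a sketch of how the option is typically used, the helper below prints the SQL for one session: it turns the setting on, shows it, and runs the vacuum. The helper name is hypothetical and the table name is taken from the error message; this should only be tried on a database that has already been backed up, since the rows on a zeroed page are gone.

```shell
#!/usr/bin/env bash
# Sketch: print the SQL for a session that tolerates damaged page headers.
# zero_pages_sql is a hypothetical helper name.
zero_pages_sql() {
  local table=$1
  cat <<SQL
SET zero_damaged_pages = on;
SHOW zero_damaged_pages;
VACUUM FULL VERBOSE ${table};
SQL
}
```

The statements have to run in one session, for example zero_pages_sql ws_log_smevlog | psql -h 127.0.0.1 -U postgres my_database, because SET only affects the session that issues it.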

We enable the option and try to run a full vacuum of the tables:

VACUUM FULL VERBOSE

Unfortunately, no luck.

We ran into a similar error:

INFO: vacuuming "public.ws_log_smevlog"
WARNING: invalid page in block 4123007 of relation base/16490/21396989; zeroing out page
ERROR: unexpected chunk number 573 (expected 565) for toast value 21648541 in pg_toast_106070

pg_toast is the mechanism Postgres uses to store "long data" that does not fit into a single page (8 kB by default).
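To see which table a TOAST relation such as pg_toast_106070 belongs to, the system catalog can be queried. The sketch below prints such a query; the helper name is hypothetical, while pg_class.reltoastrelid is the catalog column that links a table to its TOAST table.

```shell
#!/usr/bin/env bash
# Sketch: print a catalog query that maps a TOAST table to its owner.
# toast_owner_sql is a hypothetical helper name.
toast_owner_sql() {
  local toast_table=$1   # for example pg_toast_106070
  cat <<SQL
SELECT c.oid::regclass AS owning_table
FROM pg_class AS c
WHERE c.reltoastrelid = 'pg_toast.${toast_table}'::regclass;
SQL
}
```

For example, toast_owner_sql pg_toast_106070 | psql -U my_user -d my_database would be expected to name the table whose long values live in that TOAST relation.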

Attempt 2: reindex

The first piece of advice from Google did not help. After a few minutes of searching I found a second one: reindex the damaged table. I saw this advice in many places, but it did not inspire confidence. Let's reindex:

reindex table ws_log_smevlog

reindex finished without problems.

However, this did not help: VACUUM FULL failed with the same kind of error. Since I was getting used to failures, I went on looking for advice on the Internet and came across a rather interesting article.

Attempt 3: SELECT, LIMIT, OFFSET

The article mentioned above suggested going through the table row by row and removing the problematic data. First we had to look at every row:

for ((i=0; i<"Number_of_rows_in_nodes"; i++ )); do psql -U "Username" "Database Name" -c "SELECT * FROM nodes LIMIT 1 offset $i" >/dev/null || echo $i; done

In my case the table contained 1,628,991 rows! I should have taken proper care of data partitioning, but that is a topic for a separate discussion. It was Saturday; I started this command in tmux and went to bed:

for ((i=0; i<1628991; i++ )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog LIMIT 1 offset $i" >/dev/null || echo $i; done

In the morning I decided to check how things were going. To my surprise, I found that after 20 hours only 2% of the data had been scanned! I did not want to wait 50 days. Another complete failure.

But I did not give up. I wondered why the scan was taking so long. From the documentation (again on postgrespro) I learned:

OFFSET says to skip that many rows before beginning to return rows.
If both OFFSET and LIMIT appear, the system first skips the OFFSET rows and then starts counting rows for the LIMIT.

When using LIMIT, it is important to use an ORDER BY clause so that the result rows are returned in a specific order. Otherwise unpredictable subsets of the rows will be returned.

Obviously, the command above was wrong. First, there was no ORDER BY, so the result could be incorrect. Second, Postgres has to scan and discard the OFFSET rows first, so performance drops further and further as OFFSET grows.
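A rough back-of-the-envelope calculation shows why the loop was so slow. Assuming each query is served by a sequential scan (an assumption, since the article does not show the query plan), the query with OFFSET i reads i skipped rows plus one returned row, so fetching N rows one at a time touches N*(N+1)/2 rows in total.

```shell
#!/usr/bin/env bash
# rows_scanned N: total rows read when rows 0..N-1 are fetched one by one
# with "LIMIT 1 OFFSET i", assuming every query scans from the first row.
rows_scanned() {
  local n=$1
  echo $(( n * (n + 1) / 2 ))
}
```

For the table in question, rows_scanned 1628991 gives a number on the order of 10^12 row visits, compared with about 1.6 million rows actually present, which is consistent with the scan slowing down the further it gets.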

Attempt 4: take a dump in text form

Then a seemingly brilliant idea came to mind: take a dump in text form and look at the last row that was written.

But first, let's look at the structure of the table ws_log_smevlog.

In our case there is a column "id" that holds the unique identifier (counter) of the row. The plan was as follows:

  1. Start taking a dump in text form (as SQL commands).
  2. At some point the dump will be aborted because of the error, but the text file will still be saved on disk.
  3. Look at the end of the text file and find the identifier (id) of the last row that was dumped successfully.
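Step 3 of the plan can be sketched as a small helper. It assumes, hypothetically, that id is the first column of the table and that the dump broke off inside the tab-separated data block; neither detail is confirmed by the article.

```shell
#!/usr/bin/env bash
# last_dumped_id FILE: print the first tab-separated field of the last
# non-empty line among the final five lines of FILE.
# Assumption: id is the first column of the dumped table.
last_dumped_id() {
  local file=$1
  tail -n 5 "$file" | awk -F '\t' 'NF { id = $1 } END { if (id != "") print id }'
}
```

Called as last_dumped_id ./my_dump.dump, it prints a candidate id to inspect; the last line of an aborted dump may be incomplete, so the value is a hint rather than proof.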

I started taking a dump in text form:

pg_dump -U my_user -d my_database -F p -t ws_log_smevlog -f ./my_dump.dump

As expected, the dump was aborted with the same error:

pg_dump: Error message from server: ERROR: invalid page in block 4123007 of relation base/16490/21396989

Next, I used tail to look at the end of the dump (tail -5 ./my_dump.dump) and found that the dump had stopped at the row with id 186,525. "So the problem is in the row with id 186,526; it is broken and has to be deleted!" I thought. But a query to the database, select * from ws_log_smevlog where id=186529, showed that everything was fine with that row. The neighboring rows around id 186,530 also read without problems. One more "brilliant idea" had failed. Later I understood why: when data in a table is deleted or changed, it is not physically removed but marked as "dead tuples"; then autovacuum comes along, marks those rows as deleted and allows their space to be reused. In other words, if the data in a table changes and autovacuum is enabled, rows are not stored sequentially.

Attempt 5: SELECT, FROM, WHERE id=

Failures make us stronger. You should never give up; you have to go all the way and believe in yourself and your abilities. So I decided to try one more option: simply read every record in the table one at a time. Knowing the structure of my table (see above), we have an id field that is unique (the primary key). There are 1,628,991 rows in the table and the ids are sequential, which means we can simply go through them one by one:

for ((i=1; i<1628991; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

In case it is not clear, the command works like this: it reads the table row by row and sends stdout to /dev/null. If a SELECT fails, the error text is printed (stderr still goes to the console), followed by a line with the id that caused the error. That second line comes from ||, which runs echo only when the select had a problem (the command's return code is not 0).
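The same pattern can be isolated in a tiny helper, which makes the behavior easy to try with harmless commands before pointing it at the database. The name probe is hypothetical.

```shell
#!/usr/bin/env bash
# probe ID CMD...: run CMD, discard its stdout, leave stderr untouched,
# and print ID only when CMD exits with a non-zero status.
probe() {
  local id=$1
  shift
  "$@" >/dev/null || echo "$id"
}
```

For instance, probe 1 true prints nothing, while probe 2 false prints 2; replacing the test command with the psql call from the loop above reproduces one iteration of that loop.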

I was lucky: there were indexes on the id field.

This means that finding a row with a given id should not take long. In theory it should work. Well, let's start the command in tmux and go to bed.

In the morning I found that about 90,000 records had been checked, a little over 5%. An excellent result compared with the previous method (2%)! But I did not want to wait 20 days...

Attempt 6: SELECT, FROM, WHERE id >= and id <

The customer had an excellent server dedicated to the database: a dual-processor Intel Xeon E5-2697 v2, so we had as many as 48 threads at our disposal! The load on the server was moderate; we could take about 20 threads without any problems. There was plenty of RAM too: as much as 384 gigabytes!

So the command had to be parallelized:

for ((i=1; i<1628991; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

One could write a beautiful, elegant script here, but I chose the quickest way to parallelize: split the range 0-1628991 by hand into intervals of 100,000 records and run 16 separate commands of the form:

for ((i=N; i<M; i=$((i+1)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done

But that is not all. In theory, connecting to a database also takes some time and system resources. Opening 1,628,991 connections was not very smart, you will agree. So let's fetch 1000 rows per connection instead of one. As a result, the command turned into this:

for ((i=N; i<M; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done

Open 16 windows in a tmux session and run the commands:

1) for ((i=0; i<100000; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
2) for ((i=100000; i<200000; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
…
15) for ((i=1400000; i<1500000; i=$((i+1000)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
16) for ((i=1500000; i<1628991; i=$((i+1000)) )); do psql -U my_user -d my_database  -c "SELECT * FROM ws_log_smevlog where id>=$i and id<$((i+1000))" >/dev/null || echo $i; done
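The manual split above could also be scripted. The following is only a sketch of that idea, not the author's method: it divides the id range evenly among a number of background jobs instead of tmux windows, and the function names are hypothetical. It assumes the ids do not exceed the total passed in; if rows were ever deleted, the highest id (SELECT max(id)) would be the safer bound.

```shell
#!/usr/bin/env bash
# split_ranges TOTAL WORKERS: print "start end" pairs, one per worker.
split_ranges() {
  local total=$1 workers=$2
  local chunk=$(( (total + workers - 1) / workers ))   # ceiling division
  local start end
  for (( start = 0; start < total; start += chunk )); do
    end=$(( start + chunk ))
    (( end > total )) && end=$total
    echo "$start $end"
  done
}

# scan_range START END STEP: print the first id of every slice of at most
# STEP ids whose SELECT fails. my_user/my_database are placeholders.
scan_range() {
  local start=$1 end=$2 step=$3 i hi
  for (( i = start; i < end; i += step )); do
    hi=$(( i + step ))
    (( hi > end )) && hi=$end        # do not read into the next worker's range
    psql -U my_user -d my_database \
      -c "SELECT * FROM ws_log_smevlog WHERE id>=$i AND id<$hi" \
      >/dev/null || echo "$i"
  done
}

# scan_parallel TOTAL WORKERS STEP: one background job per range.
scan_parallel() {
  local total=$1 workers=$2 step=$3 start end
  while read -r start end; do
    scan_range "$start" "$end" "$step" &
  done < <(split_ranges "$total" "$workers")
  wait
}
```

A call such as scan_parallel 1628991 16 1000 starts 16 jobs and returns when all of them have finished; the printed numbers are the starts of the slices that need a closer look.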

A day later I had the first results! Namely (the values XXX and ZZZ were not preserved):

ERROR:  missing chunk number 0 for toast value 37837571 in pg_toast_106070
829000
ERROR:  missing chunk number 0 for toast value XXX in pg_toast_106070
829000
ERROR:  missing chunk number 0 for toast value ZZZ in pg_toast_106070
146000

This means that three rows contain an error. The ids of the first and second problem records lie between 829,000 and 830,000, and the id of the third lies between 146,000 and 147,000. Next we just had to find the exact id values of the problem records. To do that, we go through each range containing problem records with a step of 1 and identify the id:

for ((i=829000; i<830000; i=$((i+1)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done
829417
ERROR:  unexpected chunk number 2 (expected 0) for toast value 37837843 in pg_toast_106070
829449
for ((i=146000; i<147000; i=$((i+1)) )); do psql -U my_user -d my_database -c "SELECT * FROM ws_log_smevlog where id=$i" >/dev/null || echo $i; done
829417
ERROR:  unexpected chunk number ZZZ (expected 0) for toast value XXX in pg_toast_106070
146911
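Scanning a suspicious range id by id works, but the same result can usually be reached with fewer queries by halving the range. The sketch below is an alternative to the loops above, not what the author ran; the function names are hypothetical. Note that any failure of psql, including a lost connection, is treated as an unreadable row.

```shell
#!/usr/bin/env bash
# range_ok LO HI: succeeds when every row with LO <= id < HI can be read.
# my_user and my_database are placeholder names.
range_ok() {
  psql -U my_user -d my_database \
    -c "SELECT * FROM ws_log_smevlog WHERE id>=$1 AND id<$2" >/dev/null 2>&1
}

# find_bad_ids LO HI: print every id in [LO, HI) whose row cannot be read,
# halving the range until a failing slice holds a single id.
find_bad_ids() {
  local lo=$1 hi=$2 mid
  range_ok "$lo" "$hi" && return 0
  if (( hi - lo == 1 )); then
    echo "$lo"
    return 0
  fi
  mid=$(( (lo + hi) / 2 ))
  find_bad_ids "$lo" "$mid"
  find_bad_ids "$mid" "$hi"
}
```

For a range such as find_bad_ids 829000 830000 with two damaged rows, this needs a few dozen queries rather than a thousand.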

Happy ending

We found the problem rows. We go into the database via psql and try to delete them:

my_database=# delete from ws_log_smevlog where id=829417;
DELETE 1
my_database=# delete from ws_log_smevlog where id=829449;
DELETE 1
my_database=# delete from ws_log_smevlog where id=146911;
DELETE 1

To my surprise, the records were deleted without any problems, even without the zero_damaged_pages option.

Then I connected to the database, ran VACUUM FULL (I think this was not really necessary), and finally took a backup with pg_dump. The dump completed without a single error! The problem was solved in this rather clumsy way. My joy knew no bounds: after so many failures we had managed to find a solution!
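The cleanup can be expressed as a single statement. The helper below is a hypothetical sketch that only prints the SQL, so it can be reviewed before it is piped into psql; the ids are the ones found above.

```shell
#!/usr/bin/env bash
# delete_rows_sql TABLE ID...: print one DELETE covering the given ids.
delete_rows_sql() {
  local table=$1
  shift
  local IFS=,
  echo "DELETE FROM ${table} WHERE id IN ($*);"
}
```

For example, delete_rows_sql ws_log_smevlog 829417 829449 146911 | psql -U my_user -d my_database removes the three rows, after which the pg_dump command from the beginning of the article can be repeated to confirm that the table now reads cleanly.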

Acknowledgments and conclusion

That is how my first experience of recovering a real Postgres database went. I will remember it for a long time.

Finally, I would like to thank PostgresPro for translating the documentation into Russian and for their completely free online courses, which helped a great deal while I was analyzing the problem.

Source: www.habr.com
