WAL-G: backups and recovery for the PostgreSQL DBMS

It has long been known that making backups as SQL dumps (using pg_dump or pg_dumpall) is not a good idea. To back up a PostgreSQL DBMS it is better to use the pg_basebackup command, which takes a binary copy of the cluster's data files together with the WAL needed to make that copy consistent. But once you start studying the whole process of creating a copy and restoring from it, you realise that you will have to write at least a couple of home-grown scripts for this to work without causing you pain from every direction. WAL-G was created to ease that suffering.

WAL-G is a tool written in Go for backing up and restoring PostgreSQL databases (and, more recently, MySQL/MariaDB, MongoDB and FoundationDB). It supports Amazon S3 storage (and analogues such as Yandex Object Storage), as well as Google Cloud Storage, Azure Storage, Swift Object Storage and the plain file system. The whole setup comes down to a few simple steps, but because the articles about it are scattered across the Internet, there is no complete how-to guide that covers every step from start to finish (there are several posts on Habr, but they miss many points).

This article was written mainly to systematise my own knowledge. I am not a DBA and may express myself in layman's terms in places, so any corrections are welcome!

Separately, I note that everything below is relevant to and tested on PostgreSQL 12.3 on Ubuntu 18.04; all commands must be run as a privileged user.

Installation

At the time of writing, the stable version of WAL-G is v0.2.15 (March 2020). That is the one we will use (but if you want to build it yourself from the master branch, the GitHub repository has all the instructions for that). To download and install it, run:

#!/bin/bash

curl -L "https://github.com/wal-g/wal-g/releases/download/v0.2.15/wal-g.linux-amd64.tar.gz" -o "wal-g.linux-amd64.tar.gz"
tar -xzf wal-g.linux-amd64.tar.gz
mv wal-g /usr/local/bin/
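
Before moving on, it can help to confirm that the binary really is on the PATH and executable. The following is only a minimal sketch; the helper name require_binary is my own invention and not part of WAL-G.

```bash
#!/bin/bash

# Hypothetical helper (not part of WAL-G): succeeds only if the given
# command can be found on PATH.
require_binary() {
    local name="$1"
    if ! command -v "$name" > /dev/null 2>&1
    then
        echo "$name not found on PATH" >&2
        return 1
    fi
    echo "$name found at $(command -v "$name")"
}

# Usage after the installation step:
# require_binary wal-g || exit 1
```

If the check fails, the most likely causes are a failed download or a PATH that does not include /usr/local/bin.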

After that, configure WAL-G first, and then PostgreSQL itself.

Configuring WAL-G

As an example, backups will be stored in Amazon S3 (because it is closer to my servers and very cheap to use). To work with it you need an S3 bucket and access keys.

All previous articles about WAL-G configured it through environment variables, but as of this release the settings can live in a .walg.json file in the home directory of the postgres user. To create it, run the following bash script:

#!/bin/bash

cat > /var/lib/postgresql/.walg.json << EOF
{
    "WALG_S3_PREFIX": "s3://your_bucket/path",
    "AWS_ACCESS_KEY_ID": "key_id",
    "AWS_SECRET_ACCESS_KEY": "secret_key",
    "WALG_COMPRESSION_METHOD": "brotli",
    "WALG_DELTA_MAX_STEPS": "5",
    "PGDATA": "/var/lib/postgresql/12/main",
    "PGHOST": "/var/run/postgresql/.s.PGSQL.5432"
}
EOF
# be sure to change the owner of the file:
chown postgres: /var/lib/postgresql/.walg.json

Let me briefly explain all the parameters:

  • WALG_S3_PREFIX - the path in your S3 bucket where backups will be uploaded (it can be the root or a folder);
  • AWS_ACCESS_KEY_ID - the access key ID for S3 (when restoring on a test server, these keys must carry a read-only policy! This is described in more detail in the section on restoring);
  • AWS_SECRET_ACCESS_KEY - the secret key for the S3 storage;
  • WALG_COMPRESSION_METHOD - the compression method; Brotli is the better choice (it is the golden mean between final size and compression/decompression speed);
  • WALG_DELTA_MAX_STEPS - the number of "deltas" before a full backup is created (deltas save time and the amount of data uploaded, but can slow the restore process down somewhat, so large values are not recommended);
  • PGDATA - the path to the directory with your database's data (you can find it by running the pg_lsclusters command);
  • PGHOST - the connection to the database; for a local backup it is best done through a unix socket, as in this example.

Other parameters can be found in the documentation: https://github.com/wal-g/wal-g/blob/v0.2.15/PostgreSQL.md#configuration.
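
A typo in a key name is easy to miss in a hand-written JSON file, so a quick sanity check of the config may be worthwhile. This is a rough sketch under my own assumptions: the helper check_walg_config is hypothetical, it only greps for the key names used in the example config, and it validates neither the JSON syntax nor the values.

```bash
#!/bin/bash

# Hypothetical helper: checks that a WAL-G JSON config mentions the
# connection and storage keys from the example config.
# It only looks for the key names, nothing more.
check_walg_config() {
    local file="$1" key missing=0
    for key in WALG_S3_PREFIX AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PGDATA PGHOST
    do
        if ! grep -q "\"${key}\"" "$file"
        then
            echo "missing key: ${key}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_walg_config /var/lib/postgresql/.walg.json
# The file holds the S3 secret key, so restricting it to its owner is prudent:
# chmod 600 /var/lib/postgresql/.walg.json
```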

Configuring PostgreSQL

For the archiver built into the database to upload WAL logs to the cloud and restore from them (when needed), you have to set several parameters in the configuration file /etc/postgresql/12/main/postgresql.conf. First, though, make sure that none of the settings below is already set to some other value, so that the DBMS does not crash when the configuration is reloaded. You can add the parameters with:

#!/bin/bash

echo "wal_level=replica" >> /etc/postgresql/12/main/postgresql.conf
echo "archive_mode=on" >> /etc/postgresql/12/main/postgresql.conf
echo "archive_command='/usr/local/bin/wal-g wal-push \"%p\" >> /var/log/postgresql/archive_command.log 2>&1' " >> /etc/postgresql/12/main/postgresql.conf
echo "archive_timeout=60" >> /etc/postgresql/12/main/postgresql.conf
echo "restore_command='/usr/local/bin/wal-g wal-fetch \"%f\" \"%p\" >> /var/log/postgresql/restore_command.log 2>&1' " >> /etc/postgresql/12/main/postgresql.conf

# reload the config by sending the SIGHUP signal to all database processes
# (wal_level and archive_mode only take effect after a full restart, e.g. service postgresql restart)
killall -s HUP postgres

Description of the parameters being set:

  • wal_level - how much information to write to the WAL; "replica" writes enough for WAL archiving and replication;
  • archive_mode - enables uploading of WAL logs using the command from the archive_command parameter;
  • archive_command - the command for archiving a completed WAL log;
  • archive_timeout - a log is archived only once it is complete, but if your server changes or adds little data, it makes sense to set a limit here in seconds, after which the archiving command is invoked forcibly (I write to the database intensively every second, so I decided not to set this parameter in production);
  • restore_command - the command for restoring a WAL log from the backup; it is used when the "full backup" (base backup) does not contain the latest changes in the database.

You can read more about all these parameters in the Russian translation of the official documentation: https://postgrespro.ru/docs/postgresql/12/runtime-config-wal.
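
The advice to make sure none of these parameters is already set can be checked mechanically before appending anything. Below is a minimal sketch; conf_has_setting is a hypothetical helper of mine, and it only inspects the one file it is given (it does not follow include directives).

```bash
#!/bin/bash

# Hypothetical helper: returns 0 if the parameter is already set by an
# uncommented line in the given configuration file, 1 otherwise.
# Both "name = value" and "name value" forms are recognised.
conf_has_setting() {
    local file="$1" name="$2"
    grep -Eq "^[[:space:]]*${name}[[:space:]]*[=[:space:]]" "$file"
}

# Usage: report parameters that are already present.
# CONF=/etc/postgresql/12/main/postgresql.conf
# for NAME in wal_level archive_mode archive_command archive_timeout restore_command
# do
#     conf_has_setting "$CONF" "$NAME" && echo "already set: $NAME"
# done
```

Commented-out defaults such as "#wal_level = replica" are deliberately ignored, since they have no effect.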

Setting up the backup schedule

Whatever one may say, the simplest way to run it is cron. That is what we will configure to create the backups. Let's start with the command for creating a full backup: in wal-g this is the backup-push launch argument. But first it is better to run this command manually as the postgres user to make sure everything is fine (and there are no access errors):

#!/bin/bash

su - postgres -c '/usr/local/bin/wal-g backup-push /var/lib/postgresql/12/main'

The launch argument specifies the path to the data directory; as a reminder, you can find it by running pg_lsclusters.

If everything went through without errors and the data was uploaded to the S3 storage, you can then set up the periodic run in crontab:

#!/bin/bash

echo "15 4 * * *    /usr/local/bin/wal-g backup-push /var/lib/postgresql/12/main >> /var/log/postgresql/walg_backup.log 2>&1" >> /var/spool/cron/crontabs/postgres
# set the owner and the correct permissions on the file
chown postgres: /var/spool/cron/crontabs/postgres
chmod 600 /var/spool/cron/crontabs/postgres

In this example, the backup process starts every day at 4:15 in the morning.
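
Because the entry is appended with echo, re-running the setup script would schedule the same job twice. One way around that, sketched here with a hypothetical helper of my own called add_cron_line, is to append only when an identical line is not already present.

```bash
#!/bin/bash

# Hypothetical helper: appends a line to a crontab file only if an
# identical line is not already there.
add_cron_line() {
    local file="$1" line="$2"
    touch "$file"
    # -x matches whole lines, -F treats the entry as a fixed string
    # (so the "*" fields are not interpreted as a pattern).
    if ! grep -qxF -- "$line" "$file"
    then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage:
# add_cron_line /var/spool/cron/crontabs/postgres \
#     "15 4 * * *    /usr/local/bin/wal-g backup-push /var/lib/postgresql/12/main >> /var/log/postgresql/walg_backup.log 2>&1"
```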

Deleting old backups

Most likely you do not need to keep absolutely every backup since the Mesozoic era, so it is useful to periodically "clean up" your storage (both "full backups" and WAL logs). We will do this through a cron job as well:

#!/bin/bash

echo "30 6 * * *    /usr/local/bin/wal-g delete before FIND_FULL \$(date -d '-10 days' '+\%FT\%TZ') --confirm >> /var/log/postgresql/walg_delete.log 2>&1" >> /var/spool/cron/crontabs/postgres
# set the owner and the correct permissions on the file once more (although this usually does not need to be repeated)
chown postgres: /var/spool/cron/crontabs/postgres
chmod 600 /var/spool/cron/crontabs/postgres

Cron will run this job every day at 6:30 in the morning, deleting everything (full backups, deltas and WALs) except the copies for the last 10 days, while keeping at least one backup made before the specified date, so that any point in time after that date is covered by PITR.
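
Two details of this job are easy to get wrong: the cutoff date has to be computed each time the job runs, not once when the crontab line is written, and a crontab entry treats an unescaped "%" as a newline. The sketch below illustrates both; the function names are hypothetical, GNU date is assumed (for -d), and the -u flag is my own addition so that the trailing "Z" really denotes UTC.

```bash
#!/bin/bash

# Prints the UTC timestamp N days ago in the format used by the
# "delete before FIND_FULL" job. Requires GNU date.
retention_cutoff() {
    local days="$1"
    date -u -d "-${days} days" '+%FT%TZ'
}

# Escapes "%" so the text can be embedded in a crontab entry.
cron_escape() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}

# Usage:
# retention_cutoff 10
# cron_escape "date -d '-10 days' '+%FT%TZ'"
```

Running retention_cutoff by hand is also a convenient way to see exactly which timestamp the cleanup job would pass to wal-g.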

Restoring from a backup

It is no secret that the key to a healthy database is periodically restoring it and verifying the integrity of the data inside. I will explain how to restore with WAL-G in this section, and we will talk about the checks afterwards.

It is worth noting separately that to restore in a test environment (anything that is not production) you must use a read-only account in S3, so as not to overwrite backups by accident. In the case of WAL-G, the S3 user needs the following permissions in the Group Policy (Effect: Allow): s3:GetObject, s3:ListBucket, s3:GetBucketLocation. And, of course, do not forget to set archive_mode=off in the postgresql.conf configuration file beforehand, so that your test database does not try to back itself up on the quiet.
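
For illustration, a read-only policy of this kind could be written out as a standard IAM policy document. This is a sketch only: the helper write_readonly_policy and the output file name are my own assumptions, the actions are the standard S3 read permissions (s3:GetObject, s3:ListBucket, s3:GetBucketLocation), and S3-compatible providers may expect a slightly different format.

```bash
#!/bin/bash

# Hypothetical helper: writes a read-only policy document for one bucket
# in the standard IAM JSON policy format.
write_readonly_policy() {
    local bucket="$1" outfile="$2"
    cat > "$outfile" << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [
                "arn:aws:s3:::${bucket}",
                "arn:aws:s3:::${bucket}/*"
            ]
        }
    ]
}
EOF
}

# Usage:
# write_readonly_policy your_bucket /tmp/walg-readonly-policy.json
```

The bucket-level ARN covers listing and location, while the "/*" ARN covers reading the objects themselves.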

Restoration is done with a flick of the wrist by deleting all PostgreSQL data (including users), so please be extremely careful when you run the commands below.

#!/bin/bash

# if there is a connection pooler (for example, pgbouncer), stop it first so that it does not spew errors into the log
service pgbouncer stop
# if there is a daemon that restarts crashed processes (for example, monit), stop its monitoring of the database (mine is called pgsql12)
monit stop pgsql12
# or stop monitoring completely
service monit stop
# stop the database itself
service postgresql stop
# delete all data of the current database (!!!); better make a copy first if there is free disk space
rm -rf /var/lib/postgresql/12/main
# download the backup and unpack it
su - postgres -c '/usr/local/bin/wal-g backup-fetch /var/lib/postgresql/12/main LATEST'
# put the special recovery signal file next to the database (see https://postgrespro.ru/docs/postgresql/12/runtime-config-wal#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY ); it must be created by the postgres user
su - postgres -c 'touch /var/lib/postgresql/12/main/recovery.signal'
# start the database so that it begins the recovery process
service postgresql start
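
If there is enough free disk space, a gentler alternative to the rm -rf step is to rename the old data directory so that it can be put back should the restore fail. The helper below, stash_datadir, is a hypothetical sketch of mine and not something WAL-G provides; note that the old copy keeps occupying disk space until you delete it.

```bash
#!/bin/bash

# Hypothetical helper: renames the data directory with a timestamp suffix
# instead of deleting it, and prints the new name.
stash_datadir() {
    local dir="${1%/}"
    local backup
    backup="${dir}.bak-$(date '+%Y%m%d%H%M%S')"
    mv -- "$dir" "$backup" && echo "$backup"
}

# Usage (with PostgreSQL already stopped), in place of the rm -rf step:
# stash_datadir /var/lib/postgresql/12/main
```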

For those who want to verify the recovery process, a small piece of bash magic is provided below, so that if something goes wrong with the recovery, the script fails with a non-zero exit code. In this example, 120 checks are made at 5-second intervals (10 minutes in total for the recovery) to find out whether the signal file has been removed (which means that the recovery was successful):

#!/bin/bash

CHECK_RECOVERY_SIGNAL_ITER=0
while [ ${CHECK_RECOVERY_SIGNAL_ITER} -lt 120 ]
do
    if [ ! -f "/var/lib/postgresql/12/main/recovery.signal" ]
    then
        echo "recovery.signal removed"
        break
    fi
    sleep 5
    # increment the counter, otherwise the loop would never end
    ((CHECK_RECOVERY_SIGNAL_ITER+=1))
done

# if the file still exists after all the checks, fail with an error
if [ -f "/var/lib/postgresql/12/main/recovery.signal" ]
then
    echo "recovery.signal still exists!"
    exit 17
fi
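
The same polling logic can be wrapped in a reusable function, which makes the number of attempts and the interval easy to change. This is just a sketch; wait_for_removal is a name I made up for the illustration.

```bash
#!/bin/bash

# Hypothetical helper: polls until the file disappears.
# Returns 0 as soon as the file is gone, 1 if it still exists after
# the given number of attempts.
wait_for_removal() {
    local file="$1" attempts="$2" interval="$3" i
    for ((i = 0; i < attempts; i++))
    do
        [ ! -e "$file" ] && return 0
        sleep "$interval"
    done
    [ ! -e "$file" ]
}

# Usage: 120 checks, 5 seconds apart, exit code 17 on failure.
# wait_for_removal /var/lib/postgresql/12/main/recovery.signal 120 5 || exit 17
```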

After a successful recovery, do not forget to start all the processes again (pgbouncer, monit and so on).

Checking the data after recovery

It is essential to check the integrity of the data after a restore, so that you never end up with a broken or crooked backup. It is better to do this for every archive created, but where and how depends only on your imagination (you can spin up separate servers by the hour or run the check in CI). At the very least, though, you need to check the data and the indexes in the database.

To check the data it is enough to run it through a dump, but it is better if checksums (data checksums) were enabled when the database was created:

#!/bin/bash

if ! su - postgres -c 'pg_dumpall > /dev/null'
then
    echo 'pg_dumpall failed'
    exit 125
fi

To check the indexes there is the amcheck module; let's take the SQL query for it from the WAL-G tests and build a little logic around it:

#!/bin/bash

# put the SQL query for the check into a file in the temporary directory
cat > /tmp/amcheck.sql << EOF
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree'
AND c.relpersistence != 't'
AND i.indisready AND i.indisvalid;
EOF
chown postgres: /tmp/amcheck.sql

# add a script that runs the checks on every available database in the cluster
# (note that the variables and command substitutions are escaped)
cat > /tmp/run_amcheck.sh << EOF
for DBNAME in \$(su - postgres -c 'psql -q -A -t -c "SELECT datname FROM pg_database WHERE datistemplate = false;" ')
do
    echo "Database: \${DBNAME}"
    su - postgres -c "psql -f /tmp/amcheck.sql -v 'ON_ERROR_STOP=1' \${DBNAME}" && EXIT_STATUS=\$? || EXIT_STATUS=\$?
    if [ "\${EXIT_STATUS}" -ne 0 ]
    then
        echo "amcheck failed on DB: \${DBNAME}"
        exit 125
    fi
done
EOF
chmod +x /tmp/run_amcheck.sh

# run the script
/tmp/run_amcheck.sh > /tmp/amcheck.log

# to verify that everything went well, you can check the exit code or grep for the error
if grep 'amcheck failed' "/tmp/amcheck.log"
then
    echo 'amcheck failed: '
    cat /tmp/amcheck.log
    exit 125
fi
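
The generated script has to contain its variables literally rather than values substituted while the file is being written. Besides backslash-escaping every "$", quoting the heredoc delimiter achieves the same thing. The sketch below shows the difference; the function and file names are made up for the illustration.

```bash
#!/bin/bash

# Writes two tiny scripts to show how heredoc quoting affects expansion.
write_demo_scripts() {
    local dir="$1"
    local NAME="expanded-at-write-time"

    # Unquoted delimiter: ${NAME} is replaced while the file is written.
    cat > "${dir}/unquoted.sh" << EOF
echo "${NAME}"
EOF

    # Quoted delimiter: the text is copied verbatim, so ${NAME} is only
    # evaluated later, when the generated script itself runs.
    cat > "${dir}/quoted.sh" << 'EOF'
echo "${NAME}"
EOF
}

# Usage:
# write_demo_scripts /tmp && cat /tmp/unquoted.sh /tmp/quoted.sh
```

With a quoted delimiter nothing inside the heredoc needs escaping, which is easier to read when the generated script is long.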

Summing up

I would like to thank Andrey Borodin for his help in preparing this publication, and to thank him especially for his contribution to the development of WAL-G!

This concludes the note. I hope I have managed to convey how easy the setup is and how much potential this tool has for use in your company. I had heard a lot about WAL-G, but never had enough time to sit down and figure it out. After I implemented it on my own servers, this article came out of me.

Separately, it is worth noting that WAL-G can work with other DBMSs as well; as mentioned at the start, these include MySQL/MariaDB, MongoDB and FoundationDB.

Source: www.hab.com
