Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Nyob rau hauv tsab xov xwm kuv yuav qhia rau koj seb peb mus txog qhov teeb meem ntawm PostgreSQL txhaum kev zam txim, vim li cas nws thiaj li tseem ceeb rau peb thiab dab tsi tshwm sim thaum kawg.

Peb muaj cov kev pabcuam thauj khoom hnyav: 2,5 lab cov neeg siv thoob ntiaj teb, 50K + cov neeg siv nquag txhua hnub. Cov servers nyob hauv Amazone hauv ib cheeb tsam ntawm Ireland: 100+ cov servers sib txawv tau ua haujlwm tas li, uas yuav luag 50 nrog cov ntaub ntawv.

Tag nrho cov backend yog ib qho loj monolithic stateful Java daim ntawv thov uas ua kom lub websocket txuas nrog cov neeg siv khoom. Thaum ntau tus neeg siv ua haujlwm ntawm tib lub rooj tsavxwm tib lub sijhawm, lawv txhua tus pom cov kev hloov pauv hauv lub sijhawm tiag tiag, vim tias peb sau txhua qhov kev hloov pauv rau hauv cov ntaub ntawv. Peb muaj txog 10K thov ib ob rau peb cov ntaub ntawv. Thaum lub ncov load hauv Redis, peb sau 80-100K thov ib ob.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Vim li cas peb hloov ntawm Redis mus rau PostgreSQL

Thaum pib, peb qhov kev pabcuam tau ua haujlwm nrog Redis, lub khw muag khoom tseem ceeb uas khaws tag nrho cov ntaub ntawv hauv lub server lub RAM.

Qhov zoo ntawm Redis:

  1. Siab teb ceev, vim txhua yam yog khaws cia rau hauv lub cim xeeb;
  2. Yooj yim ntawm thaub qab thiab rov ua dua.

Cons ntawm Redis rau peb:

  1. Tsis muaj kev hloov pauv tiag tiag. Peb sim simulate lawv nyob rau theem ntawm peb daim ntawv thov. Hmoov tsis zoo, qhov no tsis tas yuav ua haujlwm zoo thiab yuav tsum tau sau cov lej nyuaj heev.
  2. Tus nqi ntawm cov ntaub ntawv yog txwv los ntawm tus nqi ntawm lub cim xeeb. Raws li tus nqi ntawm cov ntaub ntawv nce, lub cim xeeb yuav loj tuaj, thiab, thaum kawg, peb yuav khiav mus rau cov yam ntxwv ntawm cov piv txwv uas tau xaiv, uas nyob rau hauv AWS yuav tsum nres peb cov kev pab cuam los hloov hom piv txwv.
  3. Nws yog ib qho tsim nyog yuav tsum ua kom muaj qib qis latency, vim tias. peb muaj ntau qhov kev thov. Qhov zoo tshaj plaws ncua sij hawm rau peb yog 17-20 ms. Nyob rau theem ntawm 30-40 ms, peb tau txais cov lus teb ntev rau kev thov los ntawm peb daim ntawv thov thiab degradation ntawm cov kev pabcuam. Hmoov tsis zoo, qhov no tau tshwm sim rau peb lub Cuaj Hlis 2018, thaum ib qho piv txwv nrog Redis rau qee qhov laj thawj tau txais latency 2 zaug ntau dua li niaj zaus. Txhawm rau daws qhov teeb meem, peb nres qhov kev pabcuam nruab nrab hnub rau kev saib xyuas tsis tau teem sijhawm thiab hloov qhov teeb meem Redis piv txwv.
  4. Nws yog ib qho yooj yim kom tau txais cov ntaub ntawv tsis sib xws txawm tias muaj qhov yuam kev me me hauv cov lej thiab tom qab ntawd siv sijhawm ntau los sau cov lej los kho cov ntaub ntawv no.

Peb coj mus rau hauv tus account lub cons thiab pom tau hais tias peb yuav tsum tau txav mus rau ib yam dab tsi yooj yim dua, nrog ib txwm muas thiab tsawg dependence rau latency. Ua kev tshawb fawb, tshuaj xyuas ntau yam kev xaiv thiab xaiv PostgreSQL.

Peb tau tsiv mus rau cov ntaub ntawv tshiab rau 1,5 xyoo dhau los thiab tau tsiv ib feem me me ntawm cov ntaub ntawv, yog li tam sim no peb tab tom ua haujlwm ib txhij nrog Redis thiab PostgreSQL. Cov ntaub ntawv ntau ntxiv txog cov theem ntawm kev txav thiab hloov cov ntaub ntawv ntawm cov ntaub ntawv tau sau rau hauv kuv cov npoj yaig tsab xov xwm.

Thaum peb xub pib tsiv, peb daim ntawv thov ua haujlwm ncaj qha nrog cov ntaub ntawv thiab nkag mus rau tus tswv Redis thiab PostgreSQL. Pawg PostgreSQL muaj tus tswv thiab ib qho kev hloov pauv nrog asynchronous replication. Qhov no yog li cas cov txheej txheem database zoo li:
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Siv PgBouncer

Thaum peb tsiv mus, cov khoom tseem tab tom tsim: tus naj npawb ntawm cov neeg siv thiab cov servers uas ua haujlwm nrog PostgreSQL nce, thiab peb pib tsis muaj kev sib txuas. PostgreSQL tsim cov txheej txheem sib cais rau txhua qhov kev sib txuas thiab siv cov peev txheej. Koj tuaj yeem nce tus naj npawb ntawm kev sib txuas mus rau ib qho chaw, txwv tsis pub nws muaj lub sijhawm kom tau txais cov ntaub ntawv zoo tshaj plaws. Qhov kev xaiv zoo tshaj plaws hauv qhov xwm txheej zoo li no yuav yog xaiv tus thawj tswj kev sib txuas uas yuav sawv ntsug ntawm lub hauv paus.

Peb muaj ob txoj kev xaiv rau tus thawj tswj kev sib txuas: Pgpool thiab PgBouncer. Tab sis thawj tus tsis txhawb kev hloov pauv ntawm kev ua haujlwm nrog cov ntaub ntawv, yog li peb xaiv PgBouncer.

Peb tau teeb tsa cov tswv yim hauv qab no ntawm kev ua haujlwm: peb daim ntawv thov nkag mus rau ib qho PgBouncer, tom qab uas yog PostgreSQL masters, thiab tom qab txhua tus tswv yog ib qho piv txwv nrog asynchronous replication.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Tib lub sijhawm, peb tsis tuaj yeem khaws tag nrho cov ntaub ntawv hauv PostgreSQL thiab kev ceev ntawm kev ua haujlwm nrog cov ntaub ntawv tseem ceeb rau peb, yog li peb pib sharding PostgreSQL ntawm daim ntawv thov. Cov txheej txheem tau piav qhia saum toj no yog qhov yooj yim rau qhov no: thaum ntxiv PostgreSQL shard tshiab, nws txaus los hloov kho PgBouncer teeb tsa thiab daim ntawv thov tuaj yeem ua haujlwm tam sim nrog tus tshiab shard.

PgBouncer tsis ua haujlwm

Cov tswv yim no ua haujlwm txog thaum lub sijhawm PgBouncer nkaus xwb tuag. Peb nyob hauv AWS, qhov twg txhua qhov xwm txheej tau khiav ntawm cov khoom siv uas tuag ib ntus. Hauv cov xwm txheej zoo li no, qhov piv txwv tsuas yog txav mus rau kho vajtse tshiab thiab ua haujlwm dua. Qhov no tshwm sim nrog PgBouncer, tab sis nws tau dhau los ua tsis muaj. Qhov tshwm sim ntawm lub caij nplooj zeeg no yog qhov tsis muaj ntawm peb cov kev pabcuam rau 25 feeb. AWS pom zoo kom siv cov neeg siv-sab redundancy rau cov xwm txheej zoo li no, uas tsis tau siv hauv peb lub tebchaws thaum lub sijhawm ntawd.

Tom qab ntawd, peb xav txog qhov ua txhaum cai ntawm PgBouncer thiab PostgreSQL pawg, vim tias qhov xwm txheej zoo sib xws tuaj yeem tshwm sim nrog ib qho piv txwv hauv peb tus account AWS.

Peb tau tsim lub PgBouncer kev ua txhaum txoj cai raws li hauv qab no: txhua daim ntawv thov servers nkag mus rau Network Load Balancer, tom qab uas muaj ob PgBouncers. Txhua PgBouncer saib tib tus PostgreSQL tus tswv ntawm txhua tus shard. Yog tias qhov xwm txheej AWS tshwm sim dua, tag nrho cov tsheb khiav mus los ntawm lwm tus PgBouncer. Network Load Balancer failover yog muab los ntawm AWS.

Cov tswv yim no ua rau nws yooj yim ntxiv PgBouncer servers tshiab.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Tsim PostgreSQL Failover Cluster

Thaum daws qhov teeb meem no, peb xav txog cov kev xaiv sib txawv: tus kheej sau tsis ua haujlwm, repmgr, AWS RDS, Patroni.

Cov ntawv sau tus kheej

Lawv tuaj yeem saib xyuas kev ua haujlwm ntawm tus tswv thiab, yog tias nws ua tsis tiav, txhawb nqa qhov hloov pauv rau tus tswv thiab hloov kho PgBouncer teeb tsa.

Qhov zoo ntawm txoj hauv kev no yog qhov yooj yim tshaj plaws, vim tias koj sau ntawv koj tus kheej thiab nkag siab raws nraim li cas lawv ua haujlwm.

Txais:

  • Tus tswv yuav tsis tuag, hloov lub network tsis ua haujlwm yuav tshwm sim. Cov neeg tsis ntseeg, tsis paub txog qhov no, yuav txhawb nqa qhov hloov pauv rau tus tswv, thaum tus tswv qub tseem yuav ua haujlwm ntxiv. Yog li ntawd, peb yuav tau txais ob lub servers hauv lub luag haujlwm ntawm tus tswv thiab peb yuav tsis paub tias leej twg ntawm lawv muaj cov ntaub ntawv tshiab tshaj plaws. Qhov xwm txheej no tseem hu ua split-brain;
  • Peb raug tso tseg yam tsis muaj lus teb. Nyob rau hauv peb configuration, tus tswv thiab ib tug replica, tom qab hloov, lub replica txav mus rau tus tswv thiab peb tsis muaj replicas, yog li peb yuav tsum manually ntxiv ib tug tshiab replica;
  • Peb xav tau kev saib xyuas ntxiv ntawm kev ua haujlwm tsis ua haujlwm, thaum peb muaj 12 PostgreSQL shards, uas txhais tau tias peb yuav tsum saib xyuas 12 pawg. Nrog rau qhov nce ntawm tus naj npawb ntawm shards, koj yuav tsum nco ntsoov hloov kho qhov tsis ua tiav.

Kev sau tus kheej tsis ua haujlwm zoo li nyuaj heev thiab xav tau kev txhawb nqa tsis tseem ceeb. Nrog ib qho PostgreSQL pawg, qhov no yuav yog qhov kev xaiv yooj yim tshaj plaws, tab sis nws tsis ntsuas, yog li nws tsis haum rau peb.

Repmgr

Replication Manager rau PostgreSQL pawg, uas tuaj yeem tswj hwm kev ua haujlwm ntawm PostgreSQL pawg. Nyob rau tib lub sijhawm, nws tsis muaj qhov tsis siv neeg tsis siv neeg tawm ntawm lub thawv, yog li rau kev ua haujlwm koj yuav tsum tau sau koj tus kheej "wrapper" rau saum cov tshuaj tiav. Yog li txhua yam tuaj yeem dhau los ua qhov nyuaj dua nrog cov ntawv sau tus kheej, yog li peb tsis txawm sim Repmgr.

AWS RDS

Txhawb txhua yam peb xav tau, paub yuav ua li cas thaub qab thiab tswj lub pas dej ntawm kev sib txuas. Nws muaj kev hloov pauv tsis siv neeg: thaum tus tswv tuag, tus qauv hloov pauv dhau los ua tus tswv tshiab, thiab AWS hloov pauv cov ntaub ntawv dns rau tus tswv tshiab, thaum tus qauv hloov pauv tuaj yeem nyob hauv AZs sib txawv.

Qhov tsis zoo muaj xws li tsis muaj kev kho kom zoo. Raws li qhov piv txwv ntawm kev kho kom zoo: peb qhov xwm txheej muaj kev txwv rau kev sib txuas tcp, uas, hmoov tsis, tsis tuaj yeem ua hauv RDS:

net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_intvl=1
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_retries2=3

Tsis tas li ntawd, AWS RDS yog yuav luag ob npaug li tus nqi piv txwv li niaj zaus, uas yog qhov laj thawj tseem ceeb rau kev tso tseg qhov kev daws teeb meem no.

Patroni

Nov yog tus qauv python rau kev tswj hwm PostgreSQL nrog cov ntaub ntawv zoo, tsis siv neeg tsis siv neeg thiab qhov chaws ntawm github.

Pros ntawm Patroni:

  • Txhua qhov kev teeb tsa parameter tau piav qhia, nws paub meej tias nws ua haujlwm li cas;
  • Tsis siv neeg tsis siv neeg ua haujlwm tawm ntawm lub thawv;
  • Sau rau hauv python, thiab txij li thaum peb tus kheej sau ntau hauv python, nws yuav yooj yim dua rau peb daws cov teeb meem thiab, tej zaum, txawm pab txhim kho qhov project;
  • Tag nrho tswj PostgreSQL, tso cai rau koj hloov qhov kev teeb tsa ntawm tag nrho cov nodes ntawm pawg ib zaug, thiab yog tias pawg yuav tsum tau rov pib dua los siv cov kev teeb tsa tshiab, ces qhov no tuaj yeem ua tiav dua siv Patroni.

Txais:

  • Nws tsis paub meej los ntawm cov ntaub ntawv yuav ua li cas ua haujlwm nrog PgBouncer kom raug. Txawm hais tias nws nyuaj rau hu nws tus lej rho tawm, vim tias txoj haujlwm ntawm Patroni yog tswj hwm PostgreSQL, thiab kev sib txuas rau Patroni yuav mus li cas yog peb qhov teeb meem lawm;
  • Muaj qee qhov piv txwv ntawm kev siv Patroni ntawm cov ntim loj, thaum muaj ntau yam piv txwv ntawm kev siv los ntawm kos.

Raws li qhov tshwm sim, peb xaiv Patroni los tsim ib pawg ua tsis tiav.

Patroni Implementation Process

Ua ntej Patroni, peb muaj 12 PostgreSQL shards nyob rau hauv ib tug configuration ntawm ib tug tswv thiab ib tug replica nrog asynchronous replication. Cov ntawv thov servers nkag mus rau cov ntaub ntawv los ntawm Network Load Balancer, tom qab uas yog ob qho piv txwv nrog PgBouncer, thiab tom qab lawv yog tag nrho PostgreSQL servers.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Txhawm rau siv Patroni, peb yuav tsum xaiv qhov kev faib khoom faib pawg. Patroni ua haujlwm nrog cov txheej txheem faib khoom xws li lwm yam, Zookeeper, Consul. Peb tsuas yog muaj pawg Consul tag nrho ntawm kev ua lag luam, uas ua haujlwm nrog Vault thiab peb tsis siv nws ntxiv lawm. Ib qho laj thawj zoo los pib siv Consul rau nws lub hom phiaj.

Patroni ua haujlwm li cas nrog Consul

Peb muaj ib pawg Consul, uas muaj peb pawg, thiab pawg Patroni, uas muaj cov thawj coj thiab cov qauv qub (hauv Patroni, tus tswv hu ua pawg thawj coj, thiab cov qhev hu ua replicas). Txhua qhov piv txwv ntawm Patroni pawg pheej xa cov ntaub ntawv hais txog lub xeev ntawm pawg mus rau Consul. Yog li ntawd, los ntawm Consul koj tuaj yeem tshawb pom qhov kev teeb tsa tam sim no ntawm Patroni pawg thiab leej twg yog tus thawj coj tam sim no.

Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Txhawm rau txuas Patroni mus rau Consul, nws txaus los kawm cov ntaub ntawv raug cai, uas hais tias koj yuav tsum qhia tus tswv tsev hauv http lossis https, nyob ntawm seb peb ua haujlwm li cas nrog Consul, thiab cov txheej txheem kev sib txuas, xaiv tau:

host: the host:port for the Consul endpoint, in format: http(s)://host:port
scheme: (optional) http or https, defaults to http

Nws zoo nkaus li yooj yim, tab sis ntawm no cov pitfalls pib. Nrog Consul, peb ua haujlwm dhau qhov kev sib txuas ruaj ntseg ntawm https thiab peb qhov kev sib txuas config yuav zoo li no:

consul:
  host: https://server.production.consul:8080 
  verify: true
  cacert: {{ consul_cacert }}
  cert: {{ consul_cert }}
  key: {{ consul_key }}

Tab sis qhov ntawd tsis ua haujlwm. Thaum pib, Patroni tsis tuaj yeem txuas mus rau Consul, vim nws sim mus dhau http lawm.

Lub hauv paus code ntawm Patroni tau pab daws qhov teeb meem. Qhov zoo nws tau sau rau hauv python. Nws hloov tawm hais tias tus tswv tsev parameter tsis parsed nyob rau hauv txhua txoj kev, thiab raws tu qauv yuav tsum tau teev nyob rau hauv lub tswvyim. Nov yog yuav ua li cas kev teeb tsa kev ua haujlwm rau kev ua haujlwm nrog Consul zoo li rau peb:

consul:
  host: server.production.consul:8080
  scheme: https
  verify: true
  cacert: {{ consul_cacert }}
  cert: {{ consul_cert }}
  key: {{ consul_key }}

consul-tus qauv

Yog li, peb tau xaiv qhov chaw cia rau kev teeb tsa. Tam sim no peb yuav tsum nkag siab tias PgBouncer yuav hloov nws qhov kev teeb tsa li cas thaum hloov tus thawj coj hauv pawg Patroni. Tsis muaj lus teb rau lo lus nug no hauv cov ntaub ntawv, vim. muaj, hauv txoj cai, ua haujlwm nrog PgBouncer tsis tau piav qhia.

Hauv kev tshawb nrhiav kev daws teeb meem, peb pom ib tsab xov xwm (Kuv hmoov tsis nco qab lub npe) qhov twg nws tau sau tias Π‘onsul-template tau pab ntau heev hauv kev ua khub PgBouncer thiab Patroni. Qhov no ua rau peb tshawb xyuas seb Consul-template ua haujlwm li cas.

Nws muab tawm tias Consul-template tas li saib xyuas cov teeb tsa ntawm PostgreSQL pawg hauv Consul. Thaum tus thawj coj hloov pauv, nws hloov kho PgBouncer teeb tsa thiab xa cov lus txib kom rov thauj nws.

Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Ib qho ntxiv ntawm cov qauv yog tias nws tau khaws cia raws li cov cai, yog li thaum ntxiv ib qho tshiab shard, nws txaus los ua ib qho kev cog lus tshiab thiab hloov kho cov qauv txiav, txhawb cov Infrastructure raws li txoj cai.

Tshiab architecture nrog Patroni

Raws li qhov tshwm sim, peb tau txais cov txheej txheem ua haujlwm hauv qab no:
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Txhua daim ntawv thov servers nkag mus rau qhov sib npaug β†’ muaj ob qhov piv txwv ntawm PgBouncer qab nws β†’ ntawm txhua qhov piv txwv, Consul-template tau pib, uas saib xyuas cov xwm txheej ntawm txhua pawg Patroni thiab saib xyuas qhov cuam tshuam ntawm PgBouncer config, uas xa cov lus thov mus rau tus thawj coj tam sim no ntawm txhua pawg.

Kev ntsuas phau ntawv

Peb tau khiav cov phiaj xwm no ua ntej tso nws ntawm qhov chaw sim me me thiab kuaj xyuas qhov kev ua haujlwm ntawm kev hloov pauv tsis siv neeg. Lawv qhib lub rooj tsavxwm, txav daim ntawv nplaum, thiab lub sijhawm ntawd lawv "tua" tus thawj coj ntawm pawg. Hauv AWS, qhov no yooj yim li kaw qhov piv txwv ntawm lub console.

Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Daim ntawv nplaum rov qab rov qab li ntawm 10-20 vib nas this, thiab tom qab ntawd rov pib txav ib txwm. Qhov no txhais tau hais tias pawg Patroni ua haujlwm raug: nws hloov tus thawj coj, xa cov ntaub ntawv mus rau Π‘onsul, thiab Π‘onsul-template tau khaws cov ntaub ntawv tam sim ntawd, hloov PgBouncer kev teeb tsa thiab xa cov lus txib rov qab.

Yuav ua li cas kom ciaj sia nyob rau hauv siab load thiab kom lub downtime tsawg?

Txhua yam ua haujlwm zoo kawg nkaus! Tab sis muaj cov lus nug tshiab: Yuav ua li cas nws ua hauj lwm nyob rau hauv siab load? Yuav ua li cas kom sai thiab nyab xeeb yob tawm txhua yam hauv kev tsim khoom?

Kev xeem ib puag ncig uas peb ua cov kev xeem thauj khoom pab peb teb thawj lo lus nug. Nws yog kiag li zoo tib yam rau ntau lawm nyob rau hauv cov nqe lus ntawm architecture thiab tau generated xeem cov ntaub ntawv uas yog kwv yees li sib npaug ntawm ntim rau ntau lawm. Peb txiav txim siab tsuas yog "tua" ib qho ntawm PostgreSQL masters thaum lub sijhawm sim thiab pom tias muaj dab tsi tshwm sim. Tab sis ua ntej ntawd, nws yog ib qho tseem ceeb los xyuas qhov tsis siv neeg dov, vim hais tias ntawm qhov chaw no peb muaj ob peb PostgreSQL shards, yog li peb yuav tau txais kev ntsuam xyuas zoo ntawm cov ntawv teeb tsa ua ntej ntau lawm.

Ob txoj haujlwm zoo li xav tau, tab sis peb muaj PostgreSQL 9.6. Peb puas tuaj yeem hloov kho tam sim rau 11.2?

Peb txiav txim siab ua nws nyob rau hauv 2 kauj ruam: thawj upgrade rau 11.2, ces tso Patroni.

PostgreSQL hloov tshiab

Txhawm rau hloov kho PostgreSQL sai sai, siv qhov kev xaiv -k, nyob rau hauv uas nyuaj txuas yog tsim rau disk thiab tsis tas yuav luam koj cov ntaub ntawv. Ntawm lub hauv paus ntawm 300-400 GB, qhov hloov tshiab yuav siv sijhawm 1 thib ob.

Peb muaj ntau shards, yog li qhov hloov tshiab yuav tsum tau ua tiav. Txhawm rau ua qhov no, peb tau sau Ansible playbook uas ua haujlwm tag nrho cov txheej txheem hloov tshiab rau peb:

/usr/lib/postgresql/11/bin/pg_upgrade 
<b>--link </b>
--old-datadir='' --new-datadir='' 
 --old-bindir=''  --new-bindir='' 
 --old-options=' -c config_file=' 
 --new-options=' -c config_file='

Nws yog ib qho tseem ceeb uas yuav tsum nco ntsoov ntawm no tias ua ntej pib qhov kev hloov kho tshiab, koj yuav tsum ua nws nrog qhov ntsuas -- xyuaskom paub tseeb tias koj tuaj yeem hloov kho. Peb tsab ntawv kuj ua rau kev hloov pauv ntawm configs rau lub sijhawm hloov kho. Peb tsab ntawv ua tiav hauv 30 vib nas this, uas yog qhov txiaj ntsig zoo heev.

Tua tawm Patroni

Txhawm rau daws qhov teeb meem thib ob, tsuas yog saib Patroni teeb tsa. Cov chaw khaws ntaub ntawv raug cai muaj qhov piv txwv teeb tsa nrog initdb, uas yog lub luag haujlwm rau kev pib cov ntaub ntawv tshiab thaum koj thawj zaug pib Patroni. Tab sis txij li thaum peb twb muaj cov ntaub ntawv npaj txhij, peb tsuas yog tshem cov ntu no los ntawm kev teeb tsa.

Thaum peb pib txhim kho Patroni ntawm ib qho uas twb muaj lawm PostgreSQL pawg thiab khiav nws, peb tau khiav mus rau qhov teeb meem tshiab: ob lub servers pib ua tus thawj coj. Patroni tsis paub dab tsi txog lub xeev thaum ntxov ntawm pawg thiab sim pib ob lub servers ua ob pawg sib cais nrog tib lub npe. Txhawm rau daws qhov teeb meem no, koj yuav tsum rho tawm cov npe nrog cov ntaub ntawv ntawm tus qhev:

rm -rf /var/lib/postgresql/

Qhov no yuav tsum tau ua rau tus qhev xwb!

Thaum ib tug huv replica yog txuas, Patroni ua ib tug basebackup thawj coj thiab rov qab mus rau lub replica, thiab ces catches nrog rau tam sim no lub xeev raws li lub wal cav.

Lwm qhov teeb meem uas peb ntsib yog tias tag nrho PostgreSQL pawg yog lub npe tseem ceeb los ntawm lub neej ntawd. Thaum txhua pawg tsis paub dab tsi txog lwm tus, qhov no yog qhov qub. Tab sis thaum koj xav siv Patroni, ces txhua pawg yuav tsum muaj lub npe tshwj xeeb. Txoj kev daws teeb meem yog hloov lub npe pawg hauv PostgreSQL teeb tsa.

load xeem

Peb tau tshaj tawm qhov kev sim uas simulates cov neeg siv kev paub ntawm cov laug cam. Thaum lub load mus txog peb qhov nruab nrab txhua hnub tus nqi, peb rov ua tib yam kev sim, peb muab ib qho piv txwv nrog PostgreSQL tus thawj coj. Qhov tsis siv neeg tsis siv neeg ua haujlwm raws li peb xav tau: Patroni hloov tus thawj coj, Consul-template hloov kho PgBouncer teeb tsa thiab xa cov lus txib kom rov ua haujlwm dua. Raws li peb cov duab hauv Grafana, nws tau pom tseeb tias muaj kev ncua ntawm 20-30 vib nas this thiab qhov yuam kev me me los ntawm cov servers cuam tshuam nrog kev sib txuas rau cov ntaub ntawv. Qhov no yog qhov xwm txheej ib txwm muaj, cov txiaj ntsig zoo li no tau txais rau peb qhov kev poob qis thiab yog qhov zoo tshaj qhov kev pabcuam downtime.

Nqa Patroni rau kev tsim khoom

Raws li qhov tshwm sim, peb tau los nrog cov phiaj xwm hauv qab no:

  • Deploy Consul-template rau PgBouncer servers thiab tso tawm;
  • PostgreSQL hloov tshiab rau version 11.2;
  • Hloov lub npe ntawm pawg;
  • Pib lub Patroni Cluster.

Nyob rau tib lub sijhawm, peb lub tswv yim tso cai rau peb ua thawj lub ntsiab lus yuav luag txhua lub sijhawm, peb tuaj yeem tshem tawm txhua tus PgBouncer los ntawm kev ua haujlwm dhau los thiab xa tawm thiab khiav consul-template ntawm nws. Yog li peb tau ua.

Rau kev xa tawm sai, peb siv Ansible, txij li peb twb tau sim tag nrho cov ntawv ua si ntawm qhov chaw sim, thiab lub sijhawm ua tiav ntawm daim ntawv tag nrho yog los ntawm 1,5 mus rau 2 feeb rau txhua shard. Peb tuaj yeem yob tawm txhua yam nyob rau hauv lem mus rau txhua shard yam tsis txwv peb cov kev pab cuam, tab sis peb yuav tsum tau tua txhua PostgreSQL rau ob peb feeb. Nyob rau hauv cov ntaub ntawv no, cov neeg siv uas nws cov ntaub ntawv nyob rau hauv no shard yuav tsis ua hauj lwm tag nrho nyob rau lub sij hawm no, thiab qhov no yog tsis tsim nyog rau peb.

Txoj hauv kev tawm ntawm qhov xwm txheej no yog kev npaj kho, uas yuav tshwm sim txhua 3 lub hlis. Qhov no yog lub qhov rais rau kev teem sijhawm ua haujlwm, thaum peb kaw tag nrho peb cov kev pabcuam thiab txhim kho peb cov ntaub ntawv database. Muaj ib lub lis piam mus txog rau lub qhov rais tom ntej, thiab peb txiav txim siab tos thiab npaj ntxiv. Thaum lub sij hawm tos, peb ntxiv kev ruaj ntseg rau peb tus kheej: rau txhua PostgreSQL shard, peb tsa ib tug seem seem nyob rau hauv cov ntaub ntawv ntawm tsis ua hauj lwm los khaws cov ntaub ntawv tshiab, thiab ntxiv ib tug tshiab piv txwv rau txhua shard, uas yuav tsum tau los ua ib tug tshiab replica nyob rau hauv lub Patroni pawg, thiaj li tsis mus execute ib tug hais kom rho tawm cov ntaub ntawv . Tag nrho cov no tau pab txo qis qhov kev pheej hmoo ntawm kev ua yuam kev.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Peb rov pib peb qhov kev pabcuam, txhua yam ua haujlwm raws li qhov yuav tsum tau ua, cov neeg siv txuas ntxiv mus ua haujlwm, tab sis ntawm cov duab peb pom qhov txawv txav siab ntawm Consul servers.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Vim li cas peb tsis pom qhov no hauv qhov chaw sim? Qhov teeb meem no qhia tau zoo heev tias nws yog ib qho tsim nyog yuav tsum ua raws li Infrastructure raws li txoj cai lij choj thiab kho kom zoo tag nrho cov kev tsim kho vaj tse, los ntawm qhov chaw sim mus rau kev tsim khoom. Txwv tsis pub, nws yooj yim heev kom tau txais qhov teeb meem peb tau txais. Dab tsi tshwm sim? Consul thawj zaug tshwm sim ntawm kev tsim khoom, thiab tom qab ntawd ntawm qhov chaw sim, vim li ntawd, ntawm qhov chaw sim, cov qauv ntawm Consul tau siab dua ntawm kev tsim khoom. Tsuas yog hauv ib qho ntawm kev tshaj tawm, CPU xau tau daws thaum ua haujlwm nrog tus kws saib xyuas-tus qauv. Yog li ntawd, peb tsuas hloov kho Consul, yog li daws qhov teeb meem.

Restart Patroni pawg

Txawm li cas los xij, peb tau txais qhov teeb meem tshiab, uas peb tsis txawm xav tias. Thaum hloov kho Consul, peb tsuas yog tshem tawm Consul node los ntawm pawg siv tus consul tawm hais kom ua β†’ Patroni txuas mus rau lwm tus Consul server β†’ txhua yam ua haujlwm. Tab sis thaum peb mus txog qhov kawg piv txwv ntawm pawg Consul thiab xa tus consul tawm cov lus txib rau nws, tag nrho Patroni pawg tsuas yog rov pib dua, thiab hauv cov ntawv teev lus peb pom qhov yuam kev hauv qab no:

ERROR: get_cluster
Traceback (most recent call last):
...
RetryFailedError: 'Exceeded retry deadline'
ERROR: Error communicating with DCS
<b>LOG: database system is shut down</b>

Pawg Patroni tsis tuaj yeem khaws cov ntaub ntawv hais txog nws pawg thiab rov pib dua.

Txhawm rau nrhiav kev daws teeb meem, peb tau hu rau Patroni cov kws sau ntawv los ntawm qhov teeb meem ntawm github. Lawv qhia txog kev txhim kho rau peb cov ntaub ntawv teeb tsa:

consul:
 consul.checks: []
bootstrap:
 dcs:
   retry_timeout: 8

Peb muaj peev xwm rov ua dua qhov teeb meem ntawm qhov chaw sim thiab sim cov kev xaiv no, tab sis hmoov tsis lawv tsis ua haujlwm.

Qhov teeb meem tseem tsis tau daws. Peb npaj yuav sim cov kev daws teeb meem hauv qab no:

  • Siv Consul-tus neeg sawv cev ntawm txhua qhov piv txwv Patroni pawg;
  • Txhim kho qhov teeb meem hauv qhov chaws.

Peb nkag siab qhov twg qhov kev ua yuam kev tshwm sim: qhov teeb meem yog tej zaum yog kev siv lub sij hawm ncua sij hawm, uas tsis yog overridden los ntawm cov ntaub ntawv configuration. Thaum kawg Consul neeg rau zaub mov raug tshem tawm ntawm pawg, tag nrho Consul pawg hangs rau ntau tshaj ib ob, vim li no, Patroni tsis tuaj yeem tau txais cov xwm txheej ntawm pawg thiab rov pib tag nrho pawg.

Hmoov zoo, peb tsis tau ntsib qhov yuam kev ntxiv lawm.

Cov txiaj ntsig ntawm kev siv Patroni

Tom qab ua tiav qhov kev tso tawm ntawm Patroni, peb ntxiv ib qho kev hloov pauv ntxiv hauv txhua pawg. Tam sim no nyob rau hauv txhua pawg muaj xws li ib pawg ntawm pawg: ib tug thawj coj thiab ob replicas, rau kev ruaj ntseg net nyob rau hauv cov ntaub ntawv ntawm split-hlwb thaum hloov.
Failover Cluster PostgreSQL + Patroni. Kev ua tiav

Patroni tau ua haujlwm rau ntau tshaj peb lub hlis. Lub sijhawm no, nws twb tau tswj kom pab peb tawm. Tsis ntev los no, tus thawj coj ntawm ib pawg neeg tuag hauv AWS, tsis siv neeg ua haujlwm tsis zoo thiab cov neeg siv txuas ntxiv ua haujlwm. Patroni ua tiav nws txoj haujlwm tseem ceeb.

Cov ntsiab lus me me ntawm kev siv Patroni:

  • Yooj yim ntawm configuration hloov. Nws yog txaus los hloov qhov kev teeb tsa ntawm ib qho piv txwv thiab nws yuav raug rub mus rau tag nrho pawg. Yog hais tias ib tug reboot yuav tsum tau siv tus tshiab configuration, ces Patroni yuav qhia rau koj paub. Patroni tuaj yeem rov pib tag nrho pawg nrog ib qho kev hais kom ua, uas kuj yooj yim heev.
  • Tsis siv neeg swb ua haujlwm thiab twb tau tswj los pab peb tawm.
  • PostgreSQL hloov tshiab yam tsis muaj daim ntawv thov downtime. Koj yuav tsum xub hloov kho cov replicas rau lub tshiab version, ces hloov tus thawj coj nyob rau hauv lub Patroni pawg thiab hloov tshiab tus thawj coj qub. Nyob rau hauv cov ntaub ntawv no, qhov tsim nyog kuaj ntawm tsis siv neeg tsis ua hauj lwm tshwm sim.

Tau qhov twg los: www.hab.com

Ntxiv ib saib