Failover Cluster PostgreSQL + Patroni. Implementation experience

In this article I will describe how we approached PostgreSQL fault tolerance, why it became important to us, and what we ended up with.

We run a highly loaded service: 2.5 million users worldwide and 50K+ daily active users. The servers are hosted in Amazon, in a single region in Ireland: 100+ different servers are running at any time, almost 50 of which are database servers.

The entire backend is a large monolithic Java application that keeps a persistent websocket connection with the client. When several users work on the same board at once, they all see changes in real time, because we write every change to the database. We send about 10K requests per second to our databases. At peak load we write 80-100K requests per second to Redis.

Why we moved from Redis to PostgreSQL

Initially, our service worked with Redis, a key-value store that keeps all data in the server's RAM.

Pros of Redis:

  1. High response speed, because everything is stored in memory;
  2. Easy backup and replication.

Cons of Redis for us:

  1. There are no real transactions. We tried to emulate them at the application level. Unfortunately, this did not always work well and required writing very complex code.
  2. The amount of data is limited by the amount of memory. As the data grows, memory usage grows too, and eventually we would hit the limits of the chosen instance type, which in AWS means stopping our service to change the instance type.
  3. Latency has to be kept low at all times, because we have a very large number of requests. The optimal latency for us is 17-20 ms. At 30-40 ms we get long responses to requests from our application and the service degrades. Unfortunately, this happened to us in September 2018, when one of the Redis instances for some reason showed latency twice as high as usual. To resolve the issue, we stopped the service in the middle of the working day for unscheduled maintenance and replaced the problematic Redis instance.
  4. It is easy to end up with inconsistent data even from minor bugs in the code, and then spend a lot of time writing code to repair that data.

We weighed the cons and realized that we needed to move to something more convenient, with proper transactions and less dependence on latency. We did some research, analyzed several options, and chose PostgreSQL.

We have been moving to the new database for 1.5 years already and have migrated only a small part of the data, so for now we work with Redis and PostgreSQL simultaneously. The stages of moving and switching data between the databases are described in more detail in my colleague's article.

When we started the move, our application worked with the database directly and accessed the Redis and PostgreSQL masters. The PostgreSQL cluster consisted of a master and a replica with asynchronous replication. That was the whole database setup at the time.

Introducing PgBouncer

While we were migrating, the product kept growing: the number of users and the number of servers working with PostgreSQL increased, and we began to run out of connections. PostgreSQL creates a separate process for each connection, and each process consumes resources. You can raise the number of connections only up to a point; beyond it, database performance is likely to suffer. The ideal option in this situation is to put a connection manager in front of the database.

We had two candidates for the connection manager: Pgpool and PgBouncer. The first one does not support the transaction mode of working with the database, so we chose PgBouncer.

We set up the following scheme: our application accesses a single PgBouncer, behind which sit the PostgreSQL masters, and behind each master there is one replica with asynchronous replication.

At the same time, we could not keep the entire volume of data in a single PostgreSQL, and the speed of working with the database mattered to us, so we started sharding PostgreSQL at the application level. The scheme described above is quite convenient for this: when a new PostgreSQL shard is added, it is enough to update the PgBouncer configuration, and the application can work with the new shard right away.
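A minimal sketch of what application-level shard routing through PgBouncer could look like. The article does not say how boards are mapped to shards, so the hash-modulo rule, the shard names, the host name `pgbouncer.internal`, and the helper names are all assumptions made for illustration. Note that plain modulo hashing remaps existing keys when a shard is added, so a real system would more likely keep an explicit board-to-shard mapping.

```python
import zlib

# Hypothetical shard database names as they might appear in the PgBouncer config.
SHARDS = ["shard_01", "shard_02", "shard_03"]


def shard_for(board_id: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically from a board identifier."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(board_id.encode("utf-8")) % len(shards)
    return shards[index]


def dsn_for(board_id: str, pgbouncer_host: str = "pgbouncer.internal") -> str:
    """Build a connection string that goes to PgBouncer, not to PostgreSQL directly."""
    # 6432 is PgBouncer's default listen port; the host name is a placeholder.
    return f"host={pgbouncer_host} port=6432 dbname={shard_for(board_id)}"


print(dsn_for("board-42"))
```

The application only ever knows the PgBouncer address and a shard name; which PostgreSQL server stands behind that name is decided in the PgBouncer configuration.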

PgBouncer failure

This scheme worked until the only PgBouncer instance died. We are in AWS, where all instances run on hardware that dies from time to time. In such cases the instance simply moves to new hardware and starts working again. That is what happened to PgBouncer, but it became unavailable. The result of this failure was 25 minutes of downtime for our service. For such situations AWS recommends redundancy on the user side, which we had not implemented at the time.

After that, we thought seriously about the fault tolerance of the PgBouncer and PostgreSQL clusters, because a similar situation could happen to any instance in our AWS account.

We built the PgBouncer fault-tolerance scheme as follows: all application servers access a Network Load Balancer, behind which there are two PgBouncers. Each PgBouncer looks at the same PostgreSQL master of each shard. If an AWS instance fails again, all traffic is redirected through the other PgBouncer. Failover of the Network Load Balancer itself is provided by AWS.

This scheme also makes it easy to add new PgBouncer servers.

Building a failover PostgreSQL cluster

To solve this problem, we considered several options: self-written failover, repmgr, AWS RDS, and Patroni.

Self-written scripts

They can monitor the master and, if it fails, promote the replica to master and update the PgBouncer configuration.

The advantage of this approach is maximum simplicity: you write the scripts yourself and understand exactly how they work.

Cons:

  • The master may not have died; a network failure may have occurred instead. The failover script, unaware of this, will promote the replica to master while the old master keeps working. As a result, we get two servers in the master role and do not know which of them has the latest data. This situation is also called split-brain;
  • We are left without a replica. In our configuration of one master and one replica, after a switchover the replica is promoted to master and we no longer have any replicas, so a new replica has to be added manually;
  • The failover itself needs additional monitoring, and we have 12 PostgreSQL shards, which means monitoring 12 clusters. As the number of shards grows, we also have to remember to update the failover.
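The split-brain case from the first point can be shown with a toy model. This is a sketch only, not any real failover script: the `Node` class, the node names, and the `naive_failover` function are invented for illustration. The point is that a script which sees only reachability cannot tell a dead master from a broken network path.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str            # "master" or "replica"
    alive: bool = True   # is PostgreSQL actually running on the node?


def naive_failover(replica: Node, monitor_can_reach_master: bool) -> None:
    """Promote the replica whenever the monitor cannot reach the master.

    The script only sees reachability; it cannot tell a dead master
    from a network failure between the monitor and the master.
    """
    if not monitor_can_reach_master:
        replica.role = "master"


def writable_masters(nodes: list[Node]) -> list[str]:
    """Names of nodes that are running and believe they are the master."""
    return [n.name for n in nodes if n.alive and n.role == "master"]


# Network failure: the old master is still running, but the monitor cannot see it.
old_master = Node("pg-1", "master")
replica = Node("pg-2", "replica")
naive_failover(replica, monitor_can_reach_master=False)
print(writable_masters([old_master, replica]))  # ['pg-1', 'pg-2'], i.e. split-brain
```

Avoiding this requires some shared source of truth about who the master is, which is the role a distributed configuration store plays in the Patroni setup described later.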

Self-written failover looks very complicated and requires non-trivial support. With a single PostgreSQL cluster this would be the simplest option, but it does not scale, so it does not suit us.

Repmgr

Replication Manager for PostgreSQL clusters, which can manage the operation of a PostgreSQL cluster. At the same time, it has no automatic failover out of the box, so you would need to write your own "wrapper" on top of the ready-made solution. That could turn out even more complicated than self-written scripts, so we did not even try Repmgr.

AWS RDS

It supports everything we need, knows how to make backups, and maintains a connection pool. It has automatic switchover: when the master dies, the replica becomes the new master, and AWS changes the DNS record to point to the new master, while the replicas can be located in different AZs.

The disadvantages include the lack of fine-grained settings. As an example of fine tuning: our instances have restrictions on TCP connections which, unfortunately, cannot be set in RDS:

net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_intvl=1
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_retries2=3
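To see why these settings matter, here is a small worked calculation. It assumes the usual Linux keepalive behaviour (the first probe is sent after `tcp_keepalive_time` seconds of idleness, then one probe every `tcp_keepalive_intvl` seconds, and the connection is dropped after `tcp_keepalive_probes` unanswered probes). The function name is made up for this example, and `tcp_retries2` is not modelled because it depends on the retransmission timeout.

```python
def dead_peer_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time for an idle connection to a vanished peer to be closed."""
    return idle + interval * probes


# Values copied from the sysctl settings above.
tuned = dead_peer_detection_seconds(idle=10, interval=1, probes=5)

# Stock Linux defaults (7200 s idle, 75 s interval, 9 probes) for comparison.
default = dead_peer_detection_seconds(idle=7200, interval=75, probes=9)

print(tuned)    # 15
print(default)  # 7875
```

With the tuned values an idle connection to a dead server is noticed in about 15 seconds instead of more than two hours.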

In addition, AWS RDS costs almost twice as much as a regular instance, which was the main reason for abandoning this solution.

Patroni

This is a Python template for managing PostgreSQL with good documentation, automatic failover, and source code on GitHub.

Pros of Patroni:

  • Every configuration parameter is described, and it is clear how it works;
  • Automatic failover works out of the box;
  • It is written in Python, and since we write a lot of Python ourselves, it will be easier for us to deal with problems and, perhaps, even contribute to the project;
  • It fully manages PostgreSQL: you can change the configuration on all nodes of the cluster at once, and if the cluster needs a restart to apply the new configuration, this can again be done through Patroni.

Cons:

  • The documentation does not make it clear how to work with PgBouncer correctly. It is hard to call this a drawback, though, because Patroni's task is to manage PostgreSQL, and how connections reach Patroni is already our problem;
  • There are few examples of Patroni deployments at large volumes, while there are many examples of deployments from scratch.

As a result, we chose Patroni to build the failover cluster.

Patroni implementation process

Before Patroni, we had 12 PostgreSQL shards, each configured as one master and one replica with asynchronous replication. The application servers accessed the databases through the Network Load Balancer, behind which were two instances with PgBouncer, and behind them were all the PostgreSQL servers.

To deploy Patroni, we needed to choose a distributed configuration store. Patroni works with distributed configuration stores such as etcd, Zookeeper, and Consul. We already had a full-fledged Consul cluster in production, which works together with Vault and which we did not use for anything else. A great reason to start using Consul for its intended purpose.

How Patroni works with Consul

We have a Consul cluster of three nodes and a Patroni cluster consisting of a leader and a replica (in Patroni the master is called the cluster leader, and the slaves are called replicas). Each instance of the Patroni cluster constantly sends information about the cluster state to Consul. Therefore, Consul always knows the current configuration of the Patroni cluster and which node is the leader at the moment.
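The idea of keeping the leader in a shared store can be sketched with a key that expires unless its holder keeps renewing it. This is a simplified in-memory model, not Patroni's or Consul's actual code: the `FakeDCS` class, its method names, the node names, and the 30-second TTL are all assumptions chosen for illustration, and time is passed in explicitly so the example is deterministic.

```python
from typing import Optional


class FakeDCS:
    """In-memory stand-in for a leader key with a TTL in a store such as Consul."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.leader: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire_or_renew(self, node: str, now: float) -> bool:
        """Take the leader key if it is free or expired, or renew it if we hold it."""
        if self.leader is None or now >= self.expires_at or self.leader == node:
            self.leader = node
            self.expires_at = now + self.ttl
            return True
        return False

    def current_leader(self, now: float) -> Optional[str]:
        """Return the holder of the key, or None if the key has expired."""
        return self.leader if now < self.expires_at else None


dcs = FakeDCS(ttl=30)
dcs.try_acquire_or_renew("pg-1", now=0)    # pg-1 becomes leader
dcs.try_acquire_or_renew("pg-2", now=10)   # refused: the key is still held by pg-1
dcs.try_acquire_or_renew("pg-2", now=45)   # pg-1 stopped renewing, so pg-2 takes over
print(dcs.current_leader(now=46))          # pg-2
```

Because every node asks the same store, at most one of them holds the key at any moment, and anything that needs the leader's address can read it from one place.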


To connect Patroni to Consul, it is enough to study the official documentation, which says that you need to specify the host in http or https format, depending on how you work with Consul, and optionally the connection scheme:

host: the host:port for the Consul endpoint, in format: http(s)://host:port
scheme: (optional) http or https, defaults to http

It looks simple, but this is where the pitfalls begin. We work with Consul over a secure https connection, so our connection config looked like this:

consul:
  host: https://server.production.consul:8080 
  verify: true
  cacert: {{ consul_cacert }}
  cert: {{ consul_cert }}
  key: {{ consul_key }}

But that does not work. On startup, Patroni cannot connect to Consul, because it tries to go over http anyway.

The Patroni source code helped us deal with the problem. Good thing it is written in Python. It turns out that the host parameter is not parsed in any way, and the protocol must be specified in scheme. This is what a working Consul block looks like for us:

consul:
  host: server.production.consul:8080
  scheme: https
  verify: true
  cacert: {{ consul_cacert }}
  cert: {{ consul_cert }}
  key: {{ consul_key }}
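One way to guard against this pitfall is to normalize the section before it is written to the Patroni config. The helper below is a hypothetical pre-processing step of our own, not a Patroni function: it moves a URL scheme out of `host` into `scheme`, which is the form that worked above.

```python
from urllib.parse import urlsplit


def normalize_consul_section(section: dict) -> dict:
    """Move a URL scheme out of `host` and into `scheme`."""
    result = dict(section)
    host = result.get("host", "")
    if "://" in host:
        parts = urlsplit(host)
        result["host"] = parts.netloc
        # An explicit `scheme` key wins over the one embedded in the host.
        result.setdefault("scheme", parts.scheme)
    return result


broken = {"host": "https://server.production.consul:8080", "verify": True}
print(normalize_consul_section(broken))
# {'host': 'server.production.consul:8080', 'verify': True, 'scheme': 'https'}
```

A check like this could run in the deployment tooling, so that a URL pasted into `host` fails fast or is fixed before Patroni ever starts.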

Consul-template

So, we have chosen the configuration store. Now we need to understand how PgBouncer will switch its configuration when the leader changes in the Patroni cluster. The documentation has no answer to this question, because working with PgBouncer is, in principle, not described there.

While searching for a solution, we found an article (unfortunately, I do not remember the title) which said that Consul-template helped a lot in linking PgBouncer and Patroni. This prompted us to look into how Consul-template works.

It turned out that Consul-template constantly monitors the configuration of the PostgreSQL cluster in Consul. When the leader changes, it updates the PgBouncer configuration and sends a command to reload it.
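The render-and-reload cycle can be sketched in a few lines. This is a Python stand-in for the idea, not Consul-template's own template syntax and not our production template: the function names, shard name, and IP addresses are invented, and the leader addresses are passed in as a plain dictionary instead of being read from Consul.

```python
def render_databases(leaders: dict[str, str], port: int = 5432) -> str:
    """Render the [databases] section of pgbouncer.ini from shard -> leader host."""
    lines = ["[databases]"]
    for shard in sorted(leaders):
        lines.append(f"{shard} = host={leaders[shard]} port={port} dbname={shard}")
    return "\n".join(lines) + "\n"


def sync_config(current: str, leaders: dict[str, str]) -> tuple[str, bool]:
    """Return the new config text and whether PgBouncer needs a reload."""
    rendered = render_databases(leaders)
    return rendered, rendered != current


config, reload_needed = sync_config("", {"shard_01": "10.0.0.11"})
# After a failover the leader address of shard_01 changes, so a reload is needed.
config, reload_needed = sync_config(config, {"shard_01": "10.0.0.12"})
print(config)
print(reload_needed)  # True
```

Reloading only when the rendered text differs keeps PgBouncer untouched while nothing in the cluster has changed.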


A big plus of the template is that it is stored as code, so when adding a new shard it is enough to make a new commit and the template is updated automatically, in keeping with the Infrastructure as Code principle.

New architecture with Patroni

As a result, we got the following scheme:

All application servers access the balancer → behind it there are two PgBouncer instances → on each instance Consul-template is running, which monitors the state of each Patroni cluster and keeps the PgBouncer config up to date, so that requests go to the current leader of each cluster.

Manual testing

Before rolling this scheme out, we ran it on a small test environment and checked the automatic switchover. We opened a board, moved a sticker, and at that moment "killed" the cluster leader. In AWS, this is as simple as shutting down the instance via the console.


The sticker jumped back within 10-20 seconds and then started moving normally again. This means that the Patroni cluster worked correctly: it changed the leader, sent the information to Consul, and Consul-template picked this information up right away, replaced the PgBouncer configuration, and sent the reload command.

How to survive under high load and keep downtime minimal?

Everything works perfectly! But new questions arise: how will it work under high load? How do we roll everything out to production quickly and safely?

The test environment where we run load tests helps us answer the first question. It is identical to production in architecture and has generated test data roughly equal in volume to production. We decided to simply "kill" one of the PostgreSQL masters during a test and see what happens. But before that, it is important to check the automated rollout, because on this environment we have several PostgreSQL shards, so we get an excellent test of the configuration scripts before production.

Both tasks look ambitious, but we are on PostgreSQL 9.6. Can we upgrade straight to 11.2?

We decided to do it in 2 steps: first upgrade to 11.2, then launch Patroni.

Upgrading PostgreSQL

To upgrade PostgreSQL quickly, use the -k option, which creates hard links on disk so that your data does not need to be copied. On databases of 300-400 GB, the upgrade takes 1 second.

We have a lot of shards, so the upgrade has to be automated. To do this, we wrote an Ansible playbook that handles the entire upgrade process for us:

/usr/lib/postgresql/11/bin/pg_upgrade \
  --link \
  --old-datadir='' --new-datadir='' \
  --old-bindir='' --new-bindir='' \
  --old-options=' -c config_file=' \
  --new-options=' -c config_file='

It is important to note here that before starting the upgrade you must run it with the --check parameter to make sure that the upgrade is possible. Our script also substitutes the configs for the duration of the upgrade. Our script completed in 30 seconds, which is an excellent result.
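The "check first, then upgrade" order can be captured in a small helper that builds both command lines. This is a sketch, not our playbook: the function names are invented and every path is a placeholder supplied by the caller. Only flags that pg_upgrade really has are used (`--link`, `--check`, and the datadir and bindir options).

```python
def pg_upgrade_cmd(old_data: str, new_data: str, old_bin: str, new_bin: str,
                   check_only: bool) -> list[str]:
    """Build a pg_upgrade command line; all paths are supplied by the caller."""
    cmd = [
        f"{new_bin}/pg_upgrade",
        "--link",
        f"--old-datadir={old_data}",
        f"--new-datadir={new_data}",
        f"--old-bindir={old_bin}",
        f"--new-bindir={new_bin}",
    ]
    if check_only:
        cmd.append("--check")  # dry run: verifies compatibility, changes nothing
    return cmd


def upgrade_plan(**paths: str) -> list[list[str]]:
    """Dry run first, real upgrade second."""
    return [pg_upgrade_cmd(check_only=True, **paths),
            pg_upgrade_cmd(check_only=False, **paths)]


# Placeholder paths for illustration only.
plan = upgrade_plan(old_data="/path/to/old/data", new_data="/path/to/new/data",
                    old_bin="/path/to/old/bin", new_bin="/path/to/new/bin")
for command in plan:
    print(" ".join(command))
```

A caller would run the first command, stop if it exits with a non-zero status, and only then run the second one.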

Launching Patroni

To solve the second problem, just look at the Patroni configuration. The official repository has an example configuration with initdb, which is responsible for initializing a new database when Patroni is first started. But since we already have a ready-made database, we simply removed this section from the configuration.

When we started installing Patroni on an already existing PostgreSQL cluster and running it, we ran into a new problem: both servers started as leaders. Patroni knows nothing about the initial state of the cluster and tries to start both servers as two separate clusters with the same name. To solve this problem, you need to delete the data directory on the slave:

rm -rf /var/lib/postgresql/

This must be done on the slave only!

When a clean replica is connected, Patroni takes a base backup from the leader and restores it on the replica, and then catches up with the current state from the WAL logs.

Another difficulty we encountered is that all PostgreSQL clusters are named main by default. When the clusters know nothing about each other, this is fine. But when you want to use Patroni, all clusters must have unique names. The solution is to change the cluster name in the PostgreSQL configuration.

Load testing

We launched a test that simulates user activity on boards. When the load reached our average daily value, we repeated exactly the same experiment: we shut down one instance with a PostgreSQL leader. Automatic failover worked as we expected: Patroni changed the leader, Consul-template updated the PgBouncer configuration and sent the reload command. Our Grafana graphs clearly showed delays of 20-30 seconds and a small number of errors from the servers related to database connections. This is a normal situation; such values are acceptable for our failover and are definitely better than service downtime.

Bringing Patroni to production

As a result, we came up with the following plan:

  • Deploy Consul-template to the PgBouncer servers and launch it;
  • Upgrade PostgreSQL to version 11.2;
  • Change the cluster names;
  • Launch the Patroni cluster.

At the same time, our scheme allows us to do the first item at almost any time: we can take each PgBouncer out of rotation in turn, then deploy and start Consul-template on it. So we did.

For a fast rollout we used Ansible, since we had already tested all the playbooks on the test environment, and the full script took from 1.5 to 2 minutes per shard. We could roll everything out to each shard in turn without stopping our service, but we would have to shut down each PostgreSQL for several minutes. In that case, users whose data is on that shard would not be able to work fully during that time, and this is unacceptable for us.

The way out of this situation was the scheduled maintenance that takes place every 3 months. This is a window for planned work, during which we shut down our service completely and upgrade our database instances. The next window was a week away, so we decided to simply wait and prepare further. While waiting, we additionally hedged our bets: for each PostgreSQL shard we brought up a spare replica to preserve the latest data in case of failure, and we added a new instance for each shard that was to become the new replica in the Patroni cluster, so that we would not have to run the data-deletion command. All of this helped to minimize the risk of error.

We restarted our service, everything worked as it should, and users continued to work, but on the graphs we noticed a high load on the Consul servers.

Why did we not see this on the test environment? This problem illustrates very well that it is necessary to follow the Infrastructure as Code principle and keep the entire infrastructure consistent, from test environments to production. Otherwise it is very easy to get the kind of problem we got. What happened? Consul first appeared in production and only later on the test environments; as a result, the Consul version on the test environments was newer than in production. One of the releases in between had fixed a CPU leak when working with consul-template. So we simply updated Consul, which solved the problem.

Restarting the Patroni cluster

However, we got a new problem that we had not even suspected. When updating Consul, we simply remove a Consul node from the cluster with the consul leave command → Patroni connects to another Consul server → everything works. But when we reached the last instance of the Consul cluster and sent it the consul leave command, all the Patroni clusters simply restarted, and in the logs we saw the following error:

ERROR: get_cluster
Traceback (most recent call last):
...
RetryFailedError: 'Exceeded retry deadline'
ERROR: Error communicating with DCS
LOG: database system is shut down

The Patroni cluster was unable to retrieve information about its cluster and restarted.

To find a solution, we contacted the Patroni authors via an issue on GitHub. They suggested improvements to our configuration files:

consul:
 consul.checks: []
bootstrap:
 dcs:
   retry_timeout: 8

We were able to reproduce the problem on the test environment and tried these options there, but unfortunately they did not work.

The problem still remains unresolved. We plan to try the following solutions:

  • Run a Consul agent on each instance of the Patroni cluster;
  • Fix the problem in the code.

We understand where the error occurs: the problem is probably the use of a default timeout that is not overridden through the configuration file. When the last Consul server is removed from the cluster, the entire Consul cluster hangs for more than a second; because of this, Patroni cannot get the cluster state and restarts the whole cluster.
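The interaction between a store outage and a retry deadline can be modelled on a simulated clock. This is a toy model under our own assumptions, not Patroni's retry code: `RetryFailedError` here is a local class that only borrows the name from the log above, and the outage lengths, deadline, and step are illustrative numbers.

```python
class RetryFailedError(Exception):
    """Local stand-in for the error name seen in the log; not imported from Patroni."""


def fetch_cluster_state(outage_seconds: float, retry_deadline: float,
                        step: float = 0.5) -> str:
    """Retry a read on a simulated clock until it succeeds or the deadline passes."""
    elapsed = 0.0
    while elapsed <= retry_deadline:
        if elapsed >= outage_seconds:      # the store has recovered
            return "cluster state"
        elapsed += step                    # wait and try again
    raise RetryFailedError("Exceeded retry deadline")


print(fetch_cluster_state(outage_seconds=0.5, retry_deadline=1.0))  # cluster state

try:
    fetch_cluster_state(outage_seconds=3.0, retry_deadline=1.0)
except RetryFailedError as error:
    print(error)  # Exceeded retry deadline
```

In this model a short hang is survived, while a hang longer than the effective deadline fails no matter how many retries fit inside it, which is why a deadline that cannot be raised from the config is a problem.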

Fortunately, we have not encountered any more errors.

Results of using Patroni

After the successful launch of Patroni, we added one more replica to each cluster. Now each cluster has a semblance of a quorum: one leader and two replicas, as a safety net against split-brain during switchover.

Patroni has been running in production for more than three months. During this time it has already managed to help us out. Recently, the leader of one of the clusters died in AWS, automatic failover kicked in, and users continued to work. Patroni fulfilled its main task.

A short summary of using Patroni:

  • Easy configuration changes. It is enough to change the configuration on one instance, and it is propagated to the whole cluster. If a restart is required to apply the new configuration, Patroni will let you know. Patroni can restart the whole cluster with a single command, which is also very convenient.
  • Automatic failover works and has already managed to help us out.
  • PostgreSQL upgrades without application downtime. You first upgrade the replicas to the new version, then change the leader in the Patroni cluster and upgrade the old leader. Along the way you get a much-needed test of automatic failover.
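The upgrade order from the last point can be written down as a tiny planner. It is an illustrative sketch only: the function name and node names are invented, the steps are plain strings rather than real commands, and picking the first replica as the new leader is an arbitrary choice made for the example.

```python
def rolling_upgrade_steps(leader: str, replicas: list[str]) -> list[str]:
    """List the upgrade steps in the order described above."""
    if not replicas:
        raise ValueError("at least one replica is needed to switch the leader")
    steps = [f"upgrade replica {name}" for name in replicas]
    steps.append(f"switch leader from {leader} to {replicas[0]}")
    steps.append(f"upgrade former leader {leader}")
    return steps


for step in rolling_upgrade_steps("pg-1", ["pg-2", "pg-3"]):
    print(step)
```

The former leader is upgraded last, after it has stopped serving writes, so the application always has an up-to-date leader to talk to.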

source: www.habr.com
