Patroni Failure Stories or How to crash your PostgreSQL cluster. Alexey Lesovsky

The main goal of Patroni is to provide High Availability for PostgreSQL. But Patroni is just a template, not a ready-made tool (which, in fact, is stated in the documentation). At first glance, after setting up Patroni in a test lab, you can see what a great tool it is and how easily it handles our attempts to break the cluster. In practice, however, in a production environment, things do not always go as nicely and elegantly as in the test lab.

I'll tell you a little about myself. I started out as a system administrator and worked in web development. I have been working at Data Egret since 2014. The company does consulting in the Postgres field. We work exclusively with Postgres, and we work with it every day, so we have a wide range of operational experience.

At the end of 2018, we gradually started using Patroni, and some experience has accumulated. We diagnosed it, tuned it, and arrived at our own best practices. In this talk I will cover them.

Besides Postgres, I love Linux. I like poking around in it and exploring, and I like building kernels. I like virtualization, containers, Docker, Kubernetes. All of this interests me because old admin habits die hard. I like dealing with monitoring. And I like the administration side of Postgres, that is, replication and backups. In my spare time I write in Go. I'm not a software engineer, I just write Go for myself. And I enjoy it.

  • I think many of you know that Postgres has no HA (High Availability) out of the box. To get HA, you need to install something, configure it, put in some effort, and only then do you get it.
  • There are several tools, and Patroni is one of them; it solves HA very well. By setting everything up in a test lab and running it, we can see that it all works, reproduce some problems, and see how Patroni handles them. And we will see that everything works great.
  • But in practice we ran into various problems. And I will talk about those problems.
  • I'll tell you how we diagnosed them, what we fixed, and whether it helped us or not.

  • I won't tell you how to install Patroni, because you can google that, and you can look at the configuration files to understand how everything starts and how it is configured. You can find the architecture diagrams and descriptions on the Internet.
  • I won't talk about other people's experience. I'll only talk about the problems we faced ourselves.
  • And I won't talk about problems outside Patroni and PostgreSQL. If, for example, there were load balancing problems when our cluster went down, I won't cover those.

And a small disclaimer before we start the talk.

All of the problems we encountered happened during the first 6, 7, 8 months of operation. Over time, we arrived at our internal best practices, and the problems disappeared. The talk was announced about six months ago, when everything was still fresh in my head and I remembered it all perfectly.

While preparing the talk, I dug up old postmortems and looked through the logs. Some details may have been forgotten, and some could not be fully investigated during the analysis of the problems, so at times it may seem that the problems are not fully explored or that some information is missing. I ask you to forgive me for that.

What is Patroni?

  • It is a template for building HA. That is what the documentation says, and in my view it is a very accurate description. Patroni is not a silver bullet that will solve all your problems; you need to put in effort to make it work and deliver benefits.
  • It is an agent service installed on every database server, and it acts as a kind of init system for your Postgres. It starts Postgres, stops it, restarts it, reconfigures it, and changes the topology of your cluster.
  • Accordingly, to store the state of the cluster, its current representation, some kind of storage is needed. Here Patroni took the approach of keeping the state in an external system, a distributed configuration store. It can be Etcd, Consul, ZooKeeper, or the Kubernetes Etcd, i.e. one of these options.
  • One of Patroni's features is that you get autofailover out of the box, just by setting it up. If we take Repmgr for comparison, failover is included there. With Repmgr we get switchover, but if we want autofailover, we have to configure it additionally. Patroni already has autofailover out of the box.
  • And there are many other things, for example configuration management, bootstrapping new replicas, backups, and so on. But that is beyond the scope of this talk, and I won't cover it.

And a brief summary: Patroni's main job is to perform autofailover well and reliably, so that our cluster stays operational and the application does not notice changes in the cluster topology.

But when we start using Patroni, our system becomes a bit more complex. Where before we had just Postgres, with Patroni we get Patroni itself and the DCS where the state is stored. And all of this has to work together somehow. So what can go wrong?

What can break:

  • Postgres can break. It can be the master or a replica; either of them can fail.
  • Patroni itself can break.
  • The DCS where the state is stored can break.
  • And the network can break.

I will go through all of these points in the talk.

I will go through the cases in order of increasing complexity, not in the sense that a case involves more components, but in terms of subjective feeling: one case was hard for me and hard to untangle, while another, on the contrary, was easy and simple to untangle.

The first case is the simplest. This is when we took a database cluster and deployed our DCS storage on the same cluster. This is a very common mistake. It is an architectural mistake, that is, combining different components in one place.

So, there was a failover. Let's figure out what happened.

Here we are interested in when the failover happened. That is, we are interested in the moment in time when the cluster state changed.

But a failover is not always instantaneous; it does not take zero time, and it can be drawn out. It can take a while.

So it has a start time and an end time; it is a continuous event. We divide all events into three intervals: the time before the failover, during the failover, and after the failover. That is, we consider all events on this timeline.
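
The three intervals can be sketched in a few lines of Python. This is only an illustration of the idea: the failover window and the event times below are hypothetical, and `classify_event` is a helper invented for this example, not part of Patroni.

```python
from datetime import datetime


def classify_event(ts, failover_start, failover_end):
    """Place one event timestamp on the failover timeline."""
    if ts < failover_start:
        return "before"
    if ts <= failover_end:
        return "during"
    return "after"


# Hypothetical failover window and event times, for illustration only.
FAILOVER_START = datetime(2019, 1, 1, 12, 0, 10)
FAILOVER_END = datetime(2019, 1, 1, 12, 0, 40)

events = [
    datetime(2019, 1, 1, 12, 0, 5),
    datetime(2019, 1, 1, 12, 0, 25),
    datetime(2019, 1, 1, 12, 1, 0),
]

for ts in events:
    print(ts.time(), classify_event(ts, FAILOVER_START, FAILOVER_END))
```

In practice the window boundaries would come from the Patroni logs of the old and the new master.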

First of all, when a failover happens, we look for the cause, for what led to the failover.

If we look at the logs, they will be ordinary Patroni logs. In them Patroni tells us that the server has become the master, and that the master role has moved to this node. It is highlighted here.

Next, we need to understand why the failover happened, that is, what events caused the master role to move from one node to another. In this case everything is simple. We have an error communicating with the storage system. The master realized that it could not work with the DCS, that is, there was some problem with the interaction. And it says that it can no longer be the master and resigns. The "demoted self" line says exactly that.

If we look at the events preceding the failover, we can see the very reasons that made it a problem for the master to carry on.

If we look at the Patroni logs, we will see a lot of errors and timeouts, that is, the Patroni agent cannot work with the DCS. In this case, it is the Consul agent, which it talks to on port 8500.

The problem here is that Patroni and the database run on the same host, and the Consul servers were running on the same node. By putting load on the server, we created problems for the Consul servers as well. They could not communicate properly.

After a while, when the load subsided, our Patroni was able to communicate with the agents again. Normal operation resumed. And the same server, Pgdb-2, became the master again. That is, there was a small flip: the node gave up the master role and then took it back, so everything returned to the way it was.

This can be regarded as a false alarm, or it can be said that Patroni did everything right. That is, it realized it could not maintain the cluster state and gave up its authority.

The problem arose because the Consul servers were on the same hardware as the databases. Accordingly, any load, whether on the disks or on the CPUs, also affects communication with the Consul cluster.

We decided that they should not live together, and we allocated a separate cluster for Consul. Patroni then worked with a separate Consul, that is, there was a separate Postgres cluster and a separate Consul cluster. This is the basic guideline for deploying and maintaining all these things: they should not live together.

As an option, you can tune the ttl, loop_wait, and retry_timeout parameters, that is, try to ride out these short-term load peaks by increasing them. But this is not the most suitable option, because the load may last a long time. We would simply go beyond the limits of these parameters, and that may not really help.
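
When raising these three parameters, it helps to keep them consistent with each other. The sketch below assumes the defaults and the rule of thumb as I recall them from the Patroni documentation (ttl 30, loop_wait 10, retry_timeout 10, and `loop_wait + 2 * retry_timeout <= ttl`); treat both as assumptions and check them against the Patroni version you run. The helper name is invented for this example.

```python
# Assumed Patroni defaults, in seconds; verify against your version.
DEFAULTS = {"ttl": 30, "loop_wait": 10, "retry_timeout": 10}


def timeouts_consistent(ttl, loop_wait, retry_timeout):
    """Check the assumed rule: loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


print(timeouts_consistent(**DEFAULTS))

# Doubling everything keeps the rule intact but also doubles
# the time before an automatic failover can start.
print(timeouts_consistent(ttl=60, loop_wait=20, retry_timeout=20))

# Raising only retry_timeout breaks the rule.
print(timeouts_consistent(ttl=30, loop_wait=10, retry_timeout=15))
```

The trade-off is the one described above: larger values tolerate longer load peaks, but they also delay the reaction to a real failure.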

The first problem, as you can see, is simple. We put the DCS together with the database and got a problem.

The second problem is similar to the first. It is similar in that we again have problems interacting with the DCS.

If we look at the logs, we will see that we again have a communication error. Patroni says: I cannot communicate with the DCS, so the current master goes into replica mode.

The old master becomes a replica; here Patroni works as it should. It runs pg_rewind to rewind the transaction log and then connects to the new master to catch up with it.

Here we need to find the place preceding the failover, that is, the errors that caused the failover. In this respect Patroni logs are quite convenient to work with. Patroni writes the same messages at a regular interval. If we scroll through these logs quickly, we will see where their pattern changes, which means some problems have started. We quickly go back to that place and see what is happening.

In the normal situation the logs look something like this. The owner of the lock is checked. If the owner has changed, for example, then some events may occur that Patroni has to respond to. But in this case everything is fine. We are looking for the place where the errors began.

Having scrolled to the point where the errors started to appear, we see that we had an autofailover. Since our errors were related to interaction with the DCS, and in our case we used Consul, we also look at the Consul logs to see what happened there.

Roughly comparing the time of the failover with the time in the Consul logs, we see that our neighbors in the Consul cluster started to doubt the existence of other members of the Consul cluster.

If you also look at the logs of other Consul agents, you can see that some kind of network collapse was happening there. All members of the Consul cluster doubt each other's existence. And that was the trigger for the failover.

If you look at what happened before these errors, you can see all sorts of errors, for example deadline and RPC failures, that is, there is clearly some problem in the interaction of the Consul cluster members with each other.

The simplest answer is to fix the network. But for me, standing at the podium, that is easy to say. Circumstances are such that the customer cannot always afford to fix the network. They may be hosted in a DC and have no ability to repair the network or touch the equipment. So other options are needed.

There are options:

  • The simplest option, which I believe is even described in the documentation, is to disable Consul checks, that is, simply pass an empty list. We tell the Consul agent not to use any checks. This way we can ignore these network storms and not trigger a failover.
  • Another option is to double-check raft_multiplier. This is a parameter of the Consul server itself. By default it is set to 5. This value is what the documentation recommends for staging environments. In essence, it affects how often messages are exchanged between members of the Consul cluster, that is, the speed of service communication between them. For production, it is recommended to lower it so that the nodes exchange messages more often.
  • Another option we came up with is to raise the priority of Consul processes relative to other processes in the operating system's process scheduler. There is the "nice" parameter, which determines the priority of processes that the OS scheduler takes into account when scheduling. We lowered the nice value for the Consul agents, i.e. raised their priority, so that the operating system gives Consul processes more time to run and execute their code. In our case, this solved our problem.
  • Another option is not to use Consul. I have a friend who is a big supporter of Etcd. We regularly argue about which is better, Etcd or Consul. But on one point we usually agree: Consul has an agent that must run on every node with a database. That is, Patroni's interaction with the Consul cluster goes through this agent, and this agent becomes a bottleneck. If something happens to the agent, Patroni can no longer work with the Consul cluster. And that is a problem. There is no agent in the Etcd design. Patroni can work directly with a list of Etcd servers and communicate with them. In this respect, if you use Etcd in your company, Etcd will probably be a better choice than Consul. But with our customers we are always limited by what the client has chosen and uses. And most of our clients have Consul.
  • And the last point is to revise the parameter values. We can raise these parameters in the hope that our short-term network problems will be brief and will not fall outside the range of these parameters. This way we can reduce Patroni's aggressiveness about doing an autofailover when some network problems occur.
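
To give a feel for what raft_multiplier changes, here is a small sketch. It assumes, based on my reading of the Consul server performance documentation, that the multiplier scales three base Raft timings (heartbeat timeout 1000 ms, election timeout 1000 ms, leader lease timeout 500 ms); the base values and the helper name are assumptions for illustration, so verify them for your Consul version.

```python
# Assumed base Raft timings in milliseconds (the values at raft_multiplier = 1).
BASE_TIMINGS_MS = {
    "heartbeat_timeout": 1000,
    "election_timeout": 1000,
    "leader_lease_timeout": 500,
}


def scaled_timings(raft_multiplier):
    """Scale every assumed base timing by the multiplier."""
    return {name: base * raft_multiplier for name, base in BASE_TIMINGS_MS.items()}


for multiplier in (5, 1):
    print(multiplier, scaled_timings(multiplier))
```

Under these assumptions, the default of 5 makes the cluster slower to notice a lost leader, while a lower value makes the servers exchange messages and react faster.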

I think many who use Patroni are familiar with this command.

This command shows the current state of the cluster. At first glance, this picture may look normal. We have a master, we have a replica, and there is no replication lag. But this picture is normal only until we know that this cluster should have three nodes, not two.

Accordingly, there was an autofailover. And after this autofailover, our replica disappeared. We need to find out why it disappeared and bring it back. So again we go to the logs and see why we had an autofailover.

In this case, the second replica became the master. Everything is fine here.

We need to look at the replica that fell off and is not in the cluster. We open the Patroni logs and see that we had a problem while joining the cluster, at the pg_rewind stage. To join the cluster, the node needs to rewind the transaction log, request the required transaction log from the master, and use it to catch up with the master.

In this case, we do not have the transaction log, and the replica cannot start. Accordingly, Postgres stops with an error. And therefore it is not in the cluster.

We need to understand why it is not in the cluster and why the logs were not there. We go to the new master and look at what it has in its logs. It turns out that while pg_rewind was running, a checkpoint occurred. And some of the old transaction logs were simply renamed. When the old master tried to connect to the new master and request those logs, they had already been renamed; they simply were not there.

I compared the timestamps of when these events happened. The difference is literally 150 milliseconds: the checkpoint finished at 369 milliseconds and the WAL segments were renamed. And at 517, about 150 milliseconds later, the rewind started on the old replica. That is, 150 milliseconds were enough for the replica to be unable to connect and catch up.

What are the options?

Initially we used replication slots. We thought that was good. Although at the first stage of operation we turned the slots off. It seemed to us that if the slots accumulated a lot of WAL segments, we could bring down the master. It would fall over. We suffered for a while without slots. Then we realized that we needed slots, and we brought them back.

But there is a problem here: when the master becomes a replica, it deletes the slots and, along with the slots, deletes the WAL segments. To rule out this problem, we decided to raise the wal_keep_segments parameter. By default it is 8 segments. We raised it to 1,000 and looked at how much free space we had. And we gave 16 gigabytes to wal_keep_segments. That is, on a switchover we always have a reserve of 16 gigabytes of transaction logs on all nodes.

On top of that, this is also relevant for lengthy maintenance tasks. Say we need to update one of the replicas, and we want to shut it down. We need to update software, maybe the operating system, something else. When we shut down a replica, the slot for that replica is also removed. If we use a small wal_keep_segments, then during a long absence of the replica the transaction logs will be lost. We bring the replica up, it requests the transaction logs from where it stopped, but they may no longer be on the master. And the replica will not be able to connect either. So we keep a large stock of logs.
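
The 16 gigabyte figure follows from the segment count. A quick check, assuming the standard PostgreSQL WAL segment size of 16 MB (a cluster initialized with a different segment size would give a different result):

```python
WAL_SEGMENT_SIZE_MB = 16  # PostgreSQL default; assumed here


def wal_reserve_mb(wal_keep_segments):
    """Disk space kept for WAL by wal_keep_segments, in megabytes."""
    return wal_keep_segments * WAL_SEGMENT_SIZE_MB


for segments in (8, 1000):
    mb = wal_reserve_mb(segments)
    print(f"{segments} segments -> {mb} MB ({mb / 1024:.2f} GiB)")
```

With 8 segments the reserve is only 128 MB, which is easy to exhaust during a checkpoint or a maintenance window; 1,000 segments is roughly the 16 gigabytes mentioned above.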

We have a production database. There are already projects running on it.

There was a failover. We went in and looked: everything is in order, the replicas are in place, there is no replication lag. There are no errors in the logs either; everything is fine.

The product team says there should be some data; they see it in one source, but we do not see it in the database. And we need to understand what happened to it.

It is clear that pg_rewind wiped it. We understood that right away, but went to see what had happened.

In the logs we can always find when the failover happened and who became the master, and we can determine who the old master was and when it wanted to become a replica; that is, we need these logs to find out how much transaction log was lost.

Our old master rebooted. And Patroni was registered in autostart. Patroni started. It then started Postgres. More precisely, before starting Postgres and before making it a replica, Patroni ran pg_rewind. Accordingly, it erased part of the transaction logs, downloaded new ones, and connected. Here Patroni worked smartly, that is, as expected. The cluster was restored. We had 3 nodes, and after the failover 3 nodes; everything is great.

We lost some data. And we need to understand how much we lost. We look for the moment when the rewind happened. We can find it in the log entries. The rewind started, did something there, and finished.

We need to find the position in the transaction log where the old master left off. In this case, it is this mark. And we need a second mark, that is, the distance by which the old master diverged from the new one.

We take the usual pg_wal_lsn_diff and compare these two marks. In this case, we get 17 megabytes. Whether that is a lot or a little, everyone decides for themselves. For some, 17 megabytes is not much; for others it is a lot and unacceptable. Here everyone decides for themselves according to the needs of the business.
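
For readers who want to do the same arithmetic outside the database, here is a minimal Python sketch of what pg_wal_lsn_diff computes. An LSN is printed as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit position). The two LSN values below are hypothetical, chosen only so that the difference comes out to 17 megabytes; they are not the values from the incident.

```python
def parse_lsn(lsn):
    """Convert an LSN such as '2A/4B100000' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lsn_diff(lsn_a, lsn_b):
    """Difference in bytes between two LSNs, like pg_wal_lsn_diff(a, b)."""
    return parse_lsn(lsn_a) - parse_lsn(lsn_b)


# Hypothetical marks: where the old master stopped and where the timelines diverged.
OLD_MASTER_END = "2A/4B100000"
DIVERGENCE_POINT = "2A/4A000000"

lost_bytes = lsn_diff(OLD_MASTER_END, DIVERGENCE_POINT)
print(lost_bytes, "bytes =", lost_bytes / (1024 * 1024), "MB")
```

On a live server the same result comes from calling pg_wal_lsn_diff with the two marks taken from the logs.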

But what did we find out for ourselves?

First, we have to decide for ourselves: do we always need Patroni to start automatically after a system reboot? It often happens that we need to go to the old master and see how far it has gone. Perhaps inspect segments of the transaction log and see what is there. And understand whether we can afford to lose this data or whether we need to start the old master in standalone mode to pull this data out.

Only after that should we decide whether to discard this data or restore it, and then attach this node to our cluster as a replica.

In addition, there is the maximum_lag_on_failover parameter. By default, if my memory serves me, this parameter has a value of 1 megabyte.

How does it work? If our replica is behind by more than 1 megabyte of data in replication lag, then this replica does not take part in the election. If a failover suddenly happens, Patroni looks at which replicas are lagging behind. If they are behind by a large amount of transaction log, they cannot become the master. This is a very good safety feature that protects you from losing a lot of data.

But there is a problem: the replication lag in the Patroni cluster and the DCS is updated at a certain interval. I think 30 seconds is the default ttl value.

Accordingly, there can be a situation where the DCS holds one replication lag for the replicas, while in reality the lag may be completely different or there may be none at all, that is, this value is not real-time. It does not always reflect the actual picture. And it is not worth building complex logic on top of it.

And the risk of loss always remains. In the worst case there is one formula, and in the average case another. That is, when we plan a Patroni deployment and estimate how much data we can lose, we should rely on these formulas and have a rough idea of how much data we might lose.

And there is good news. When the old master has run ahead, it may have run ahead because of some background processes. That is, there was some autovacuum, it wrote data and saved it to the transaction log. We can easily ignore and lose this data. There is no problem with that.
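
The eligibility check described above can be sketched as follows. This is a simplified illustration, not Patroni's actual code: the function name and the byte positions are invented, and the default of 1048576 bytes simply mirrors the 1 megabyte mentioned above. The important detail is that the leader position is the last value written to the DCS, so it may already be stale by the time the check runs.

```python
def eligible_candidates(last_leader_position, replica_positions,
                        maximum_lag_on_failover=1048576):
    """Return replicas whose lag behind the last known leader position
    does not exceed maximum_lag_on_failover (all values in bytes)."""
    return [
        name
        for name, position in replica_positions.items()
        if last_leader_position - position <= maximum_lag_on_failover
    ]


# Hypothetical WAL positions in bytes.
LAST_LEADER_POSITION = 10_000_000  # last value the old master wrote to the DCS
REPLICAS = {"replica-a": 9_500_000, "replica-b": 8_000_000}

print(eligible_candidates(LAST_LEADER_POSITION, REPLICAS))
```

Here replica-a is about 0.5 MB behind and may take part in the election, while replica-b is 2 MB behind and may not. If the old master wrote more WAL after its last DCS update, the real lag of both replicas is larger than this check sees.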

And this is what the logs look like when maximum_lag_on_failover is set, a failover has occurred, and a new master needs to be chosen. The replica assesses itself as unable to take part in the election. It refuses to participate in the leader race. And it waits for a new master to be chosen so that it can connect to it. This is an additional safeguard against data loss.

Here a product team wrote to us that their product was having problems with Postgres. At the same time, the master itself could not be accessed, because it was unreachable over SSH. And the autofailover was not happening either.

The host was forcibly rebooted. Because of the reboot, an autofailover happened, although it would have been possible to do a manual switchover, as I now understand. And after the reboot, we went to look at what had been going on with that master.

At the same time, we knew in advance that we had problems with the disks, that is, we already knew from monitoring where to dig and what to look for.

We went into the Postgres log and started looking at what was happening there. We saw commits lasting one, two, three seconds, which is not normal at all. We saw that our autovacuum was starting very slowly and strangely. And we saw temporary files on disk. These are all indicators of disk problems.

We looked into the system dmesg (the kernel log). And we saw that we had problems with one of the disks. The disk subsystem was software RAID. We looked at /proc/mdstat and saw that we were missing one drive. That is, there is a RAID of 8 disks, and one is missing. If you look carefully at the slide, you can see in the output that sde is not there. Roughly speaking, a disk had dropped out. This caused disk problems, and the applications also had problems working with the Postgres cluster.
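
A check like this is easy to automate in external monitoring. The sketch below parses text in the /proc/mdstat format and reports arrays where fewer devices are active than expected. The sample text is a hypothetical reconstruction of an 8-disk array without sde, not the output from the slide, and `degraded_arrays` is a helper invented for this example.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Matches the status part of an array, e.g. "[8/7] [UUUU_UUU]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: (expected_devices, active_devices)} for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded


# Hypothetical /proc/mdstat content: 8 devices expected, sde is missing.
SAMPLE_MDSTAT = """\
Personalities : [raid10]
md0 : active raid10 sdh[7] sdg[6] sdf[5] sdd[3] sdc[2] sdb[1] sda[0]
      3906514944 blocks super 1.2 512K chunks 2 near-copies [8/7] [UUUU_UUU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real host the text would be read from /proc/mdstat, and a non-empty result would raise an alert.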

And in this case Patroni could not have helped us in any way, because Patroni has no task of monitoring the state of the server or the state of the disks. We have to monitor such situations with external monitoring. We quickly added disk monitoring to our external monitoring.

And there was this thought: could fencing or watchdog software have helped us? We concluded that it would hardly have helped in this case, because during the problems Patroni continued to communicate with the DCS cluster and saw no problem. That is, from the point of view of the DCS and Patroni, everything was fine with the cluster, although in fact there were problems with the disk and problems with the availability of the database.

In my view, this is one of the strangest problems, one that I investigated for a very long time; I read a lot of logs, went over it again and again, and called it the "cluster simulator".

The problem was that the old master could not become a normal replica, that is, Patroni started, Patroni showed that this node was present as a replica, but at the same time it was not a normal replica. Now you will see why. This is what I have kept from the analysis of that problem.

And how did it all start? It started, as in the previous problem, with slow disks. We had commits taking a second, two seconds.

There were connection drops, that is, clients were being disconnected.

There were locks of varying severity.

And, accordingly, the disk subsystem was not very responsive.

And the most mysterious thing to me is the immediate shutdown request that arrived. Postgres has three shutdown modes:

  • Smart (graceful), when we wait for all clients to disconnect on their own.
  • Fast, when we force clients to disconnect because we are shutting down.
  • And immediate. In this case Postgres does not even tell clients to disconnect; it shuts down without warning. The operating system then sends an RST to all clients (a TCP message saying that the connection has been interrupted and the client has nothing more to wait for).
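
For reference, the PostgreSQL documentation describes these three modes as signals handled by the postmaster: SIGTERM for smart, SIGINT for fast and SIGQUIT for immediate. A minimal Python sketch of that mapping follows; the helper name `signal_for_mode` is hypothetical and exists only for illustration.

```python
import signal

# Shutdown modes and the signals the postmaster reacts to,
# as described in the PostgreSQL documentation (pg_ctl, server shutdown).
SHUTDOWN_SIGNALS = {
    "smart": signal.SIGTERM,      # wait for clients to disconnect on their own
    "fast": signal.SIGINT,        # terminate client sessions, then shut down cleanly
    "immediate": signal.SIGQUIT,  # abort at once; crash recovery runs on next start
}


def signal_for_mode(mode: str) -> signal.Signals:
    """Return the signal that requests the given shutdown mode (hypothetical helper)."""
    try:
        return SHUTDOWN_SIGNALS[mode]
    except KeyError:
        raise ValueError(f"unknown shutdown mode: {mode!r}") from None


print(signal_for_mode("immediate").name)
```

SIGKILL (`kill -9`) is a separate case: a process cannot catch it at all, so it is not one of the three modes.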

Who sent this signal? Postgres background processes do not send such signals to each other, that is, this is a kill -9. They do not send such things to each other; they only react to them, that is, this is an emergency restart of Postgres. Who sent it, I do not know.

I looked at the output of the `last` command and saw one person who had logged in to that server along with us, but I was too shy to ask. Perhaps it was kill -9. I would expect to see kill -9 in the logs, because Postgres reports that it received kill -9, but I did not see that in the logs.

Looking further, I saw that Patroni did not write to the log for quite a long time: 54 seconds. If you compare the two timestamps, there were no messages for about 54 seconds.
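
A silence like this is easy to miss when reading logs by eye. The following is a minimal Python sketch that scans log lines for gaps between consecutive timestamps. It assumes each line begins with a timestamp in the form `YYYY-MM-DD HH:MM:SS`, which is an assumption about the log format, and the sample lines are invented placeholders, not real Patroni output.

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(seconds=30)):
    """Yield (previous_ts, next_ts, gap) for every silence of at least min_gap."""
    previous = None
    for line in lines:
        try:
            # Assumed format: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
            current = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip continuation lines such as tracebacks
        if previous is not None and current - previous >= min_gap:
            yield previous, current, current - previous
        previous = current


# Invented sample lines, for illustration only.
sample = [
    "2019-01-01 12:00:05,000 INFO: (placeholder message)",
    "2019-01-01 12:00:15,000 INFO: (placeholder message)",
    "2019-01-01 12:01:09,000 INFO: (placeholder message)",
]
for start, end, gap in find_log_gaps(sample):
    print(f"no messages between {start} and {end} ({gap.total_seconds():.0f} s)")
```

The 30-second threshold is arbitrary; in practice it would be set somewhat above the interval at which the daemon normally logs.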

And during this time an automatic failover took place. Patroni did a great job here again. Our old master was unavailable; something had happened to it. And the election of a new master began. Everything went well here. Our pgsql01 became the new leader.

We have a replica that has become the master. And there is a second replica. And there were problems with the second replica. It tried to reconfigure itself. As I understand it, it tried to change recovery.conf, restart Postgres and connect to the new master. Every 10 seconds it writes messages saying that it is trying, but it does not succeed.

And during these attempts, the immediate shutdown signal arrives at the old master. The master restarts. And recovery also stops, because the old master is restarting. That is, the replica cannot connect to it, because it is in shutdown mode.

At some point it worked, but replication did not start.

My only guess is that recovery.conf contained the address of the old master. And when the new master appeared, the second replica was still trying to connect to the old master.

When Patroni started on the second replica, the node came up but could not replicate. And a replication lag built up, which looked something like this. That is, all three nodes were in place, but the second node was lagging behind.

At the same time, if you looked at the logs being written, you could see that replication could not start because the transaction logs had diverged. The transaction logs offered by the master specified in recovery.conf simply did not fit our current node.

And here I made a mistake. I should have gone and looked at what was in recovery.conf to test my hypothesis that we were connecting to the wrong master. But I was only just getting to grips with all this and it did not occur to me, or I saw that the replica was lagging and would have to be re-initialized. In other words, I worked carelessly. That was my blunder.

After 30 minutes the admin stepped in, that is, I restarted Patroni on the replica. I had already given up on it and thought it would have to be re-initialized. And I thought: I will restart Patroni, maybe something good will come of it. Recovery started. And the database even opened and was ready to accept connections.

Replication started. But a minute later it failed with an error saying that the transaction logs did not fit.

I thought I would restart once more. I restarted Patroni again. I did not restart Postgres; I restarted Patroni in the hope that it would magically start the database.

Replication started again, but the positions in the transaction log were different; they were not the same as in the previous start attempt. Replication stopped again. And the message was already slightly different. And again it was not very informative for me.

And then it occurred to me: what if I restart Postgres and at the same time run a checkpoint on the current master, to move the point in the transaction log a little forward so that recovery starts from a different moment? Besides, we still had a stock of WAL.

I restarted Patroni, ran a couple of checkpoints on the master and a couple of restart points on the replica once it had opened. And it helped. I thought for a long time about why it helped and how it worked. And the replica started. And replication no longer broke.

For me this problem is one of the most mysterious; I am still puzzling over what really happened there.

What are the conclusions here? Patroni can work as intended and without any errors. But that is not a 100% guarantee that everything is fine. A replica may start, but it may be in a half-working state, and the application cannot work with such a replica, because it will serve stale data.

And after a failover you always need to check that everything is in order with the cluster, that is, that the required number of replicas is present and there is no replication lag.

Now that we have gone through these problems, I will make some recommendations. I tried to fit them into two slides. Probably all the stories could have been condensed into two slides and only those told.

When you use Patroni, you must have monitoring. You should always know when an automatic failover has happened, because if you do not know that you have had an automatic failover, you are not in control of the cluster. And that is bad.

After each failover we should always check the cluster manually. We need to make sure that we have the expected number of replicas, that there is no replication lag, and that there are no errors in the logs related to streaming replication, to Patroni or to the DCS.
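
Part of this manual check can be expressed in code. The following is a minimal Python sketch, not an established tool. It assumes you have already fetched the primary's current WAL position and the rows of `pg_stat_replication` (with the `application_name`, `state` and `replay_lsn` columns available in PostgreSQL 10 and later). The function names and the 16 MiB lag threshold are illustrative assumptions.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def check_replicas(current_lsn, replicas, expected_count,
                   max_lag_bytes=16 * 1024 * 1024):
    """Return a list of problems; an empty list means the check passed.

    replicas: list of dicts with the keys application_name, state and
    replay_lsn, named after the pg_stat_replication columns.
    """
    problems = []
    if len(replicas) != expected_count:
        problems.append(
            f"expected {expected_count} replicas, found {len(replicas)}")
    for row in replicas:
        name = row["application_name"]
        if row["state"] != "streaming":
            problems.append(f"{name}: state is {row['state']}")
        lag = lsn_to_int(current_lsn) - lsn_to_int(row["replay_lsn"])
        if lag > max_lag_bytes:
            problems.append(f"{name}: replay lag is {lag} bytes")
    return problems


# Illustrative input: one streaming replica where two are expected.
rows = [{"application_name": "pgsql02", "state": "streaming",
         "replay_lsn": "0/3000000"}]
print(check_replicas("0/3000100", rows, expected_count=2))
```

A check like this only covers replica count and lag; errors in the Postgres, Patroni and DCS logs still have to be reviewed separately.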

Automation can work successfully, and Patroni is a very good tool. It can do its job, yet that alone will not bring the cluster to the desired state. And if we do not find out about it, we will be in trouble.

And Patroni is not a silver bullet. We still need to understand how Postgres works, how replication works, how Patroni works with Postgres, and how communication between nodes is ensured. This is necessary in order to be able to fix problems with your own hands.

How do I approach diagnosis? It so happens that we work with different clients and nobody has an ELK stack, so we have to sort through the logs by opening 6 consoles and 2 tabs. In one tab are the Patroni logs for each node; in the other tab are the Consul logs, or the Postgres logs if necessary. It is very hard to diagnose things this way.

What approaches have I worked out? First, I always look at when the failover occurred. For me this is a kind of watershed. I look at what happened before the failover, during the failover and after the failover. A failover has two marks: the start time and the end time.

Next, I look in the logs for the events that preceded the failover, that is, I look for the reasons why the failover happened.

And this gives a picture of what happened and what can be done in the future so that such situations do not occur (and, as a result, there is no failover).

And where do we usually look? I look:

  • First, in the Patroni logs.
  • Next, in the Postgres logs or the DCS logs, depending on what was found in the Patroni logs.
  • And the system logs also sometimes give an understanding of what caused the failover.
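
When there is no central log store, one way to avoid switching between many consoles is to merge the collected logs into a single timeline around the failover. The following is a minimal Python sketch under loud assumptions: every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, each source is already in time order, and the names `parse` and `timeline` are hypothetical.

```python
import heapq
from datetime import datetime, timedelta


def parse(source, line):
    """Return (timestamp, source, line); assumes a 19-character timestamp prefix."""
    return (datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"), source, line)


def timeline(logs, failover_at, window=timedelta(minutes=5)):
    """Merge per-source logs and keep entries within `window` of the failover.

    logs: dict mapping a source name (e.g. "patroni") to a list of lines
    that are already sorted by time.
    """
    streams = [[parse(name, line) for line in lines]
               for name, lines in logs.items()]
    merged = heapq.merge(*streams)  # lazy k-way merge of sorted streams
    return [entry for entry in merged if abs(entry[0] - failover_at) <= window]


# Invented placeholder lines, for illustration only.
logs = {
    "patroni": ["2019-01-01 12:00:10 (placeholder)", "2019-01-01 12:00:30 (placeholder)"],
    "postgres": ["2019-01-01 12:00:20 (placeholder)", "2019-01-01 13:00:00 (placeholder)"],
}
for ts, source, line in timeline(logs, datetime(2019, 1, 1, 12, 0, 20)):
    print(source, line)
```

Real Postgres and system logs use configurable prefixes, so the parsing step would need to be adapted per source.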

How do I feel about Patroni? I have a very good relationship with Patroni. In my opinion, it is the best there is today. I know many other products: Stolon, Repmgr, Pg_auto_failover, PAF. Four tools. I have tried them all. Patroni is my favorite.

If I am asked "Do I recommend Patroni?", I will say yes, because I like Patroni. And I think I have learned how to cook it.

If you are interested in seeing what other problems there are with Patroni besides the ones I have mentioned, you can always go to the issues page on GitHub. There are many different stories there and many interesting issues are discussed. As a result, some bugs have been reported and fixed, so it makes for interesting reading.

There are some interesting stories about people shooting themselves in the foot. Very instructive. You read them and understand that you should not do that. I made a note for myself.

And I would like to say a big thank you to Zalando for developing this project, namely to Alexander Kukushkin and Alexey Klyukin. Alexey Klyukin is one of the co-authors; he no longer works at Zalando, but these are the two people who started working on this product.

And I think Patroni is a very cool thing. I am glad that it exists; it is interesting to work with. And a big thank you to all the contributors who write patches for Patroni. I hope that Patroni will become more mature, cooler and more efficient with age. It is already functional, but I hope it will get even better. So if you are planning to use Patroni, do not be afraid. It is a good solution; it can be deployed and used.

That is all. If you have questions, ask.

Questions

Thank you for the talk! If after a failover you still need to look at everything very carefully, then why do we need automatic failover?

Because it is new to us. We have only been running it for a year. Better to be safe. We want to come in and see that everything really worked out as it should. This is the mistrust of grown-ups: it is better to double-check and see.

For example, we come in the morning and look, right?

Not in the morning; we usually learn about an automatic failover almost immediately. We receive notifications and see that an automatic failover has occurred. We go and look almost immediately. But all these checks should be moved to the monitoring level. If you access Patroni via the REST API, there is a history. From the history you can see the timestamps at which a failover occurred. Monitoring can be built on this. You can look at the history and see how many events there were. If the number of events has grown, an automatic failover has occurred, and you can go and look. Or the monitoring automation can check that all replicas are in place, there is no lag and everything is fine.
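
A minimal Python sketch of such a history-based check is shown below. It assumes that the Patroni REST API serves the cluster history at `/history` as a JSON list whose entries begin with a timeline number, and that the API listens on port 8008; verify both against the documentation for your Patroni version. The function names and the sample data are illustrative.

```python
import json
from urllib.request import urlopen


def fetch_history(base_url="http://127.0.0.1:8008"):
    """Fetch the cluster history (assumed endpoint: GET /history)."""
    with urlopen(f"{base_url}/history", timeout=5) as response:
        return json.load(response)


def new_events(history, last_seen_timeline):
    """Return history entries newer than the last timeline we alerted on.

    Assumes the first field of every entry is the timeline number.
    """
    return [entry for entry in history if entry[0] > last_seen_timeline]


# Invented sample in the assumed shape: [timeline, lsn, reason, timestamp].
sample = [
    [1, 25623960, "(reason)", "2019-01-01T10:00:00+00:00"],
    [2, 50331648, "(reason)", "2019-01-02T03:15:00+00:00"],
]
for entry in new_events(sample, last_seen_timeline=1):
    print("leader change recorded at", entry[3])
```

A monitoring job would persist the last seen timeline between runs and raise an alert whenever `new_events` returns something.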

Thank you!

Thank you very much for the great talk! If we have moved the DCS cluster somewhere far from the Postgres cluster, does that cluster also need to be serviced from time to time? What are the best practices when some parts of the DCS cluster need to be shut down, something needs to be done with them, and so on? How does this whole construction survive? And how do you do these things?

For one company we had to build a matrix of problems: what happens if one component or several components fail. Following this matrix, we go through all the components one by one and build scenarios for the failure of each. Accordingly, for each failure scenario you can have a recovery plan. As for the DCS, it comes as part of the standard infrastructure. The admins administer it, and we rely on the admins who run it and on their ability to fix it in case of incidents. If there is no DCS at all, then we deploy it, but we do not monitor it particularly closely, because we are not responsible for the infrastructure; we do give recommendations on how and what to monitor.

That is, do I understand correctly that I need to disable Patroni, disable the failover, disable everything before doing anything with the hosts?

It depends on how many nodes we have in the DCS cluster. If there are many nodes and we take down only one of them (a replica), the cluster keeps quorum. Patroni stays operational and nothing is triggered. If we have complex operations that affect more nodes, whose absence could break quorum, then yes, it may make sense to put Patroni on pause. It has the corresponding commands: patronictl pause and patronictl resume. We simply pause, and automatic failover does not fire during that time. We do the maintenance on the DCS cluster, then take the pause off and carry on.
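
The pause and resume steps can be wrapped so that the resume is not forgotten if maintenance fails halfway. The following is a minimal Python sketch, not a recommended production tool. The configuration path `/etc/patroni.yml` is a hypothetical example, and the `run` parameter exists only so the commands can be swapped out for a dry run.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def patroni_paused(config="/etc/patroni.yml", run=subprocess.check_call):
    """Pause automatic failover for the duration of the block, then resume.

    config: path to the patronictl configuration (hypothetical default).
    run: callable that executes a command given as a list of arguments.
    """
    run(["patronictl", "-c", config, "pause"])
    try:
        yield
    finally:
        # Resume even if the maintenance code raised an exception.
        run(["patronictl", "-c", config, "resume"])


# Dry run: print the commands instead of executing them.
with patroni_paused(run=lambda cmd: print(" ".join(cmd))):
    print("... DCS maintenance happens here ...")
```

Resuming in a `finally` block is a deliberate choice: a cluster left paused has no automatic failover at all, which is easy to overlook after a failed maintenance window.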

Thank you very much!

Thank you very much for the talk! How does the product team feel about data loss?

Product teams do not care, and team leads worry.

What guarantees are there?

Guarantees are very hard. Alexander Kukushkin has a talk called "How to calculate RPO and RTO", that is, the recovery time and how much data we can lose. I think we need to find those slides and study them. As far as I remember, there are concrete steps for calculating these things: how many transactions we can lose, how much data we can lose. As an option, we can use synchronous replication at the Patroni level, but this is a double-edged sword: we gain data reliability but lose speed. And synchronous replication does not guarantee 100% protection against data loss either.

Alexey, thank you for the great talk! Do you have any experience of using Patroni for zero-data-loss protection? That is, in combination with a synchronous standby? That is the first question. And the second question. You have used different solutions. We used Repmgr, but without automatic failover, and now we are planning to enable automatic failover. And we are considering Patroni as an alternative solution. What would you name as its advantages over Repmgr?

The first question was about synchronous replicas. Nobody here uses synchronous replication, because everyone is afraid (several clients are in fact already using it and, in principle, have not noticed any performance problems. Speaker's note). But we have worked out a rule for ourselves that a synchronous replication cluster must have at least three nodes, because if we have two nodes and the master or the replica fails, Patroni switches the remaining node to standalone mode so that the application keeps working. In that case there is a risk of data loss.

As for the second question, we have used Repmgr and still do with some clients for historical reasons. What can be said? Patroni comes with automatic failover out of the box; Repmgr comes with automatic failover as an additional feature that needs to be enabled. We need to run the Repmgr daemon on each node, and only then can we configure automatic failover.

Repmgr checks whether the Postgres nodes are alive. The Repmgr processes check each other's existence, which is not a very effective approach. There can be complex cases of network isolation in which a large Repmgr cluster falls apart into several smaller ones that keep working. I have not followed Repmgr for a long time; maybe it has been fixed, or maybe not. But moving the information about the state of the cluster out to a DCS, as Stolon and Patroni do, is the most viable option.

Alexey, I have a question, maybe a lame one. In one of the first examples you moved the DCS from the local machine to a remote host. We understand that the network is a thing with its own quirks; it lives its own life. What happens if for some reason the DCS cluster becomes unavailable? I will not name the reasons; there can be many of them, from the clumsy hands of network engineers to real problems.

I did not say it out loud, but the DCS cluster must also be fault tolerant, that is, have an odd number of nodes so that quorum can be reached. What happens if the DCS cluster becomes unavailable or quorum cannot be reached, that is, there is some kind of network split or node failure? In this case the Patroni cluster goes into read-only mode. The Patroni cluster cannot determine the state of the cluster or what to do. It cannot contact the DCS and store the new cluster state there, so the whole cluster goes read-only. And it waits either for manual intervention from the operator or for the DCS to recover.
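
The reason for an odd number of nodes follows from majority arithmetic, which can be sketched in a few lines of Python. This assumes the DCS uses a simple majority quorum, as Raft-based stores such as etcd and Consul do; the function names are illustrative.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return nodes - quorum(nodes)


for n in range(1, 6):
    print(f"{n} nodes: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this arithmetic a fourth node tolerates no more failures than three do, which is why DCS clusters are usually sized at three or five nodes.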

Roughly speaking, the DCS becomes a service as important to us as the database itself?

Yes, yes. In many modern companies, Service Discovery is an integral part of the infrastructure. It is deployed even before there is a database in the infrastructure. Roughly speaking, the infrastructure was launched, deployed in the DC, and we immediately have Service Discovery. If it is Consul, then DNS can be built on it. If it is Etcd, then it may be a part of the Kubernetes cluster, in which everything else will be deployed. It seems to me that Service Discovery is already an integral part of modern infrastructures. And people think about it much earlier than about databases.

Thank you!

Source: www.habr.com
