DNS problems in Kubernetes. A public postmortem

Translator's note: This is a translation of a public postmortem from the engineering blog of the company Preply. It describes a problem with conntrack in a Kubernetes cluster that led to partial downtime of some production services.

This article may be useful to readers who want to learn more about postmortems or to prevent some potential DNS problems in the future.

It's not DNS
It can't be DNS
It was DNS

A little about postmortems and processes at Preply

A postmortem describes a malfunction or incident in production. It includes a timeline of events, the user impact, the root cause, the actions taken, and the lessons learned.

Seeking SRE

At our weekly pizza meetings, the technical team shares all kinds of information. One of the most important parts of those meetings is the postmortems, which are usually accompanied by a slide presentation and a deeper analysis of the incident. Although we do not applaud after a postmortem, we try to foster a "no blame" (blameless) culture. We believe that writing and presenting postmortems can help us (and others) prevent similar incidents in the future, which is why we share them.

People involved in an incident should feel that they can speak about it in detail without fear of punishment or retribution. No blame! Writing a postmortem is not a punishment, but a learning opportunity for the whole company.

Keep CALMS & DevOps: S is for Sharing

DNS problems in Kubernetes. Postmortem

Date: 28.02.2020

Authors: Amet U., Andrey S., Igor K., Alexey P.

Status: Finished

Summary: Partial DNS unavailability (26 min) for some services in the Kubernetes cluster

Impact: 15000 events lost for services A, B and C

Root cause: Kube-proxy failed to correctly remove an old entry from the conntrack table, so some services kept trying to connect to pods that no longer existed.

E0228 20:13:53.795782       1 proxier.go:610] Failed to delete kube-system/kube-dns:dns endpoint connections, error: error deleting conntrack entries for UDP peer {100.64.0.10, 100.110.33.231}, error: conntrack command returned: ...
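A line like the one above can be turned into structured data for alerting. The following is a minimal sketch, assuming the message layout seen in that single log line; the function name and the field names `origin_ip` and `dest_ip` are hypothetical, and reading them as the service address and the removed pod address is an assumption, not something the postmortem states.

```python
import re

# Pattern inferred from the one kube-proxy line quoted in this postmortem.
# Other kube-proxy versions may word the message differently (assumption).
CONNTRACK_ERROR = re.compile(
    r"Failed to delete (?P<service>\S+) endpoint connections, "
    r"error: error deleting conntrack entries for (?P<proto>\w+) peer "
    r"\{(?P<origin_ip>[\d.]+), (?P<dest_ip>[\d.]+)\}"
)


def parse_conntrack_error(line):
    """Return a dict of fields if the line is a conntrack cleanup failure, else None."""
    match = CONNTRACK_ERROR.search(line)
    return match.groupdict() if match else None


# The log line from the postmortem, reproduced with its original elision.
line = (
    "E0228 20:13:53.795782       1 proxier.go:610] Failed to delete "
    "kube-system/kube-dns:dns endpoint connections, error: error deleting "
    "conntrack entries for UDP peer {100.64.0.10, 100.110.33.231}, "
    "error: conntrack command returned: ..."
)
info = parse_conntrack_error(line)
print(info)
```

Counting such matches per node would be one way to alert on failed cleanups before services start returning errors.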

Trigger: Because of low load inside the Kubernetes cluster, CoreDNS-autoscaler reduced the number of pods in the deployment from three to two.

Resolution: The next application deployment triggered the creation of new nodes, and CoreDNS-autoscaler added more pods to serve the cluster, which caused the conntrack table to be rewritten.

Detection: Prometheus monitoring detected a large number of 5xx errors for services A, B and C and paged the on-duty engineers.

5xx errors in Kibana
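The scale-down from three to two pods described in the trigger can be illustrated with a small calculation. This is a sketch under assumptions: the postmortem does not say which autoscaler implementation or parameters were used. The code follows the linear mode of the commonly used cluster-proportional-autoscaler, and every number below (cores, nodes, per-replica ratios) is hypothetical.

```python
import math


def linear_replicas(cores, nodes, cores_per_replica, nodes_per_replica,
                    min_replicas=1, max_replicas=None):
    """Replica count in a linear, cluster-proportional scheme.

    The larger of the core-based and node-based estimates wins,
    then the result is clamped to [min_replicas, max_replicas].
    """
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    replicas = max(by_cores, by_nodes)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return max(replicas, min_replicas)


# Hypothetical cluster sizes: a busy period and a quiet period after
# some nodes have been removed because of low load.
busy = linear_replicas(cores=96, nodes=12, cores_per_replica=64,
                       nodes_per_replica=4, min_replicas=2)
quiet = linear_replicas(cores=64, nodes=8, cores_per_replica=64,
                        nodes_per_replica=4, min_replicas=2)
print(busy, quiet)  # 3 2
```

With these made-up values, losing four nodes is enough to drop one DNS pod, and each such removal requires kube-proxy to clean up the conntrack entries that point at it.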

Actions

| Action | Type | Responsible | Task |
|---|---|---|---|
| Disable the autoscaler for CoreDNS | prevent | Amet U. | DEVOPS-695 |
| Set up a caching DNS server | mitigate | Max V. | DEVOPS-665 |
| Set up conntrack monitoring | prevent | Amet U. | DEVOPS-674 |
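As an illustration of what conntrack monitoring could start from, the sketch below reads the kernel's conntrack counters from `/proc/sys/net/netfilter`. The postmortem does not say which metrics the team chose, so this is an assumption: it only tracks how full the table is, the function names and the 0.8 threshold are hypothetical, and it would not by itself detect stale entries like the ones in this incident.

```python
from pathlib import Path

# Standard location of the netfilter conntrack counters on Linux.
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(proc_dir=PROC_DIR):
    """Return (entries in use, table limit, fill ratio) for the conntrack table."""
    count = int((Path(proc_dir) / "nf_conntrack_count").read_text().strip())
    limit = int((Path(proc_dir) / "nf_conntrack_max").read_text().strip())
    return count, limit, count / limit


def is_near_limit(ratio, threshold=0.8):
    """True when the table is at or above the (hypothetical) alert threshold."""
    return ratio >= threshold


if __name__ == "__main__":
    # The files exist only on Linux hosts with the conntrack module loaded.
    if (PROC_DIR / "nf_conntrack_count").exists():
        count, limit, ratio = conntrack_usage()
        print(f"conntrack: {count}/{limit} ({ratio:.1%}), alert={is_near_limit(ratio)}")
    else:
        print("conntrack counters not available on this host")
```

In practice these values would be exported to the existing Prometheus setup rather than printed.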

Lessons learned

What went well:

  • Monitoring worked well. The response was fast and organized
  • We did not hit any limits on the nodes

What went wrong:

  • The real root cause is still unknown; it looks like a specific bug in conntrack
  • All actions fix only the consequences, not the root cause (the bug)
  • We knew that sooner or later we might have problems with DNS, but we did not prioritize the tasks

Where we got lucky:

  • The next deployment triggered CoreDNS-autoscaler, which overwrote the conntrack table
  • This bug affected only some services

Timeline (EET)

| Time | Action |
|---|---|
| 22:13 | CoreDNS-autoscaler reduced the number of pods from three to two |
| 22:18 | On-duty engineers started receiving calls from the monitoring system |
| 22:21 | On-duty engineers started investigating the cause of the errors |
| 22:39 | On-duty engineers started rolling back one of the most recently deployed services to its previous version |
| 22:40 | 5xx errors stopped appearing, the situation stabilized |

  • Time to detect: 4 min
  • Time to action: 21 min
  • Time to fix: 1 min

Additional information

To reduce CPU usage, the Linux kernel uses a mechanism called conntrack. In short, it keeps a list of NAT records in a dedicated table. When the next packet arrives from the same pod and is addressed to the same pod as before, the final IP address is not recalculated but is taken from the conntrack table.
How conntrack works
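The lookup behaviour described above, and the way a stale entry keeps sending traffic to a removed pod, can be shown with a toy model. This is a deliberately simplified sketch, not kernel or kube-proxy logic: the class, the round-robin backend choice, and the pod names and client address are all hypothetical.

```python
class ToyServiceNat:
    """Toy model of a service address backed by pods, with a conntrack-like cache."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.conntrack = {}  # client (ip, port) -> backend chosen earlier
        self._counter = 0    # used for simple round-robin selection

    def route(self, client):
        # A cached translation is reused without recalculating the destination.
        if client in self.conntrack:
            return self.conntrack[client]
        backend = self.backends[self._counter % len(self.backends)]
        self._counter += 1
        self.conntrack[client] = backend
        return backend

    def remove_backend(self, backend, flush=True):
        # flush=False imitates a cleanup that failed, as in this incident.
        self.backends.remove(backend)
        if flush:
            self.conntrack = {
                c: b for c, b in self.conntrack.items() if b != backend
            }


nat = ToyServiceNat(["pod-a", "pod-b", "pod-c"])
client = ("10.0.0.5", 40000)

first = nat.route(client)
nat.remove_backend(first, flush=False)  # pod scaled down, entry left behind
stale = nat.route(client)
print(stale in nat.backends)  # False: traffic still goes to the removed pod

nat.conntrack.pop(client)  # the table entry is finally rewritten
fresh = nat.route(client)
print(fresh in nat.backends)  # True: traffic reaches a live pod again
```

The second half of the example mirrors the resolution: once the table was rewritten, lookups resolved to pods that actually existed.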

Conclusion

This was an example of one of our postmortems, with some useful links. In this article in particular, we share information that may be useful to other companies. That is why we are not afraid to make mistakes, and why we make one of our postmortems public. Here are some more interesting public postmortems:

Source: www.habr.com
