Problems with DNS in Kubernetes. Public postmortem

Translator's note: This is a translation of a public postmortem from the Preply engineering blog. It describes a problem with conntrack in a Kubernetes cluster that led to partial downtime of some production services.

This article may be useful to anyone who wants to learn a little more about postmortems or to prevent some DNS problems in the future.

It's not DNS
There's no way it's DNS
It was DNS

A little about postmortems and processes at Preply

A postmortem describes a malfunction or some other event in production. It includes a timeline of events, a description of the user impact, the root cause, the actions taken, and the lessons learned.

Seeking SRE

At our weekly pizza meetings, the technical team shares all kinds of information. One of the most important parts of these meetings is the postmortems, which usually come with a slide presentation and a deeper analysis of the incident. Although we do not applaud after postmortems, we try to build a "no blame" (blameless) culture. We believe that writing and presenting postmortems can help us (and others) prevent similar incidents in the future, which is why we share them.

People involved in an incident should feel that they can speak about it in detail without fear of punishment or retribution. No blame! Writing a postmortem is not a punishment but a learning opportunity for the whole company.

Keep CALMS & DevOps: S is for Sharing

Problems with DNS in Kubernetes. Postmortem

Date: 28.02.2020

Authors: Amet U., Andrey S., Igor K., Alexey P.

Status: Completed

Summary: Partial DNS unavailability (26 min) for some services in the Kubernetes cluster

Impact: 15000 events lost for services A, B and C

Root cause: kube-proxy was unable to correctly remove an old entry from the conntrack table, so some services kept trying to connect to pods that no longer existed.

E0228 20:13:53.795782       1 proxier.go:610] Failed to delete kube-system/kube-dns:dns endpoint connections, error: error deleting conntrack entries for UDP peer {100.64.0.10, 100.110.33.231}, error: conntrack command returned: ...
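
When searching logs for this failure, it can help to pull the affected service and the two peer addresses out of the message. The snippet below is a minimal sketch written only against the single log line quoted above. The function name and the regular expression are illustrative, and reading the first address as the origin (Service) IP and the second as the destination (pod) IP is an assumption, not something the log states.

```python
import re

# Pattern for the one kube-proxy message quoted above; it is not a general
# parser for kube-proxy logs.
STALE_ENTRY_RE = re.compile(
    r"Failed to delete (?P<service>\S+) endpoint connections, "
    r"error: error deleting conntrack entries for UDP peer "
    r"\{(?P<origin_ip>[\d.]+), (?P<dest_ip>[\d.]+)\}"
)


def parse_stale_entry(line):
    """Return the service name and both peer IPs from a matching line, else None."""
    match = STALE_ENTRY_RE.search(line)
    return match.groupdict() if match else None


line = (
    "E0228 20:13:53.795782       1 proxier.go:610] Failed to delete "
    "kube-system/kube-dns:dns endpoint connections, error: error deleting "
    "conntrack entries for UDP peer {100.64.0.10, 100.110.33.231}, "
    "error: conntrack command returned: ..."
)
print(parse_stale_entry(line))
```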

Trigger: Because of low load in the Kubernetes cluster, CoreDNS-autoscaler reduced the number of pods in the deployment from three to two.
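
The postmortem does not say how CoreDNS-autoscaler was configured. As a rough illustration of why a smaller cluster means fewer DNS pods, the sketch below assumes a "linear" proportional rule of the kind used by cluster-proportional-autoscaler, where the replica count follows the larger of two ratios (cores and nodes). All parameter values and cluster sizes are hypothetical.

```python
import math


def linear_replicas(cores, nodes, cores_per_replica, nodes_per_replica,
                    min_replicas=1, max_replicas=None):
    """Replica count under an assumed linear, cluster-proportional rule."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    replicas = max(by_cores, by_nodes, min_replicas)
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return replicas


# Hypothetical sizes: when a few nodes are removed, DNS goes from 3 pods to 2.
before = linear_replicas(cores=160, nodes=40, cores_per_replica=256, nodes_per_replica=16)
after = linear_replicas(cores=128, nodes=32, cores_per_replica=256, nodes_per_replica=16)
print(before, after)  # prints "3 2"
```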

Resolution: The next deployment of the application initiated the creation of new nodes, and CoreDNS-autoscaler added more pods to serve the cluster, which caused the conntrack table to be rewritten.

Detection: Prometheus monitoring detected a large number of 5xx errors for services A, B and C and paged the on-duty engineers.

5xx errors in Kibana

Actions

Action                             | Type     | Owner   | Task
Disable the autoscaler for CoreDNS | prevent  | Amet U. | TIAB SA-695
Set up a caching DNS server        | mitigate | Max V.  | TIAB SA-665
Set up conntrack monitoring        | prevent  | Amet U. | TIAB SA-674
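
The postmortem does not describe what the conntrack monitoring looks like. One common starting point is to watch how full the table is, using the counters the Linux kernel exposes under /proc when the nf_conntrack module is loaded. The sketch below assumes that approach; the function names and the 80% threshold are hypothetical. Note that an occupancy check alone would not catch a stale entry like the one in this incident, so it is only one piece of such monitoring.

```python
from pathlib import Path

# Standard Linux locations when the nf_conntrack module is loaded.
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_usage(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Return (entries, limit, fraction used), or None if the files are unreadable."""
    try:
        count = int(Path(count_path).read_text().strip())
        limit = int(Path(max_path).read_text().strip())
    except (OSError, ValueError):
        return None
    return count, limit, (count / limit if limit else 0.0)


def should_alert(usage, threshold=0.8):
    """True when table usage is at or above the (hypothetical) threshold."""
    return usage is not None and usage[2] >= threshold


usage = conntrack_usage()
print("conntrack usage:", usage, "alert:", should_alert(usage))
```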

Lessons learned

What went well:

  • Monitoring worked well. The response was fast and organized
  • We did not hit any limits on the nodes

What went wrong:

  • The real root cause is still unknown; it looks like a specific bug in conntrack
  • All actions address only the consequences, not the root cause (the bug)
  • We knew that sooner or later we might have problems with DNS, but we did not prioritize the tasks

Where we got lucky:

  • The next deployment triggered CoreDNS-autoscaler, which overwrote the conntrack table
  • This bug affected only some of the services

Timeline (EET)

Time  | Action
22:13 | CoreDNS-autoscaler reduced the number of pods from three to two
22:18 | On-duty engineers started receiving calls from the monitoring system
22:21 | On-duty engineers started investigating the cause of the errors
22:39 | On-duty engineers started rolling back one of the most recently deployed services to its previous version
22:40 | 5xx errors stopped appearing; the situation stabilized

  • Time to detection: 4 min
  • Time before action was taken: 21 min
  • Time to fix: 1 min

Additional information

To minimize CPU usage, the Linux kernel uses something called conntrack. In short, it is a utility that holds a list of NAT records stored in a special table. When the next packet travels from the same pod to the same pod as before, the final IP address is not recalculated but is taken from the conntrack table.
How conntrack works
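
To make the caching behaviour concrete, here is a toy model of it, not real netfilter code. It keys the table on (client IP, service IP) only, whereas real conntrack entries also include ports and protocol. The class, its method names, and every address except 100.64.0.10 and 100.110.33.231 (taken from the log line in this postmortem) are hypothetical. It shows how a cached entry keeps pointing at a removed pod when the cleanup step does not happen.

```python
import itertools


class ToyNat:
    """Toy model of destination NAT with a connection-tracking cache."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.table = {}              # (client_ip, service_ip) -> backend pod IP
        self._rr = itertools.count()

    def route(self, client_ip, service_ip):
        key = (client_ip, service_ip)
        if key not in self.table:
            # First packet of a flow: choose a backend round-robin and cache it.
            self.table[key] = self.backends[next(self._rr) % len(self.backends)]
        # Later packets: reuse the cached translation without recalculating.
        return self.table[key]

    def remove_backend(self, pod_ip, flush=True):
        self.backends.remove(pod_ip)
        if flush:
            # The cleanup that, per the root cause above, did not succeed.
            self.table = {k: v for k, v in self.table.items() if v != pod_ip}


nat = ToyNat(["100.110.33.231", "100.110.33.232", "100.110.33.233"])
first = nat.route("10.0.0.5", "100.64.0.10")
nat.remove_backend(first, flush=False)   # scale-down without cleaning the table
stale = nat.route("10.0.0.5", "100.64.0.10")
print(stale == first, stale in nat.backends)  # prints "True False"
```

With flush=True the stale entry is dropped and the next packet is routed to a pod that still exists, which is the behaviour the failed cleanup was supposed to provide.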

Summary

This was an example of one of our postmortems, with some useful links. In this article in particular, we share information that may be useful to other companies. That is why we are not afraid of making mistakes, and why we made one of our postmortems public. Here are some more interesting public postmortems:

Source: www.hab.com
