Debugging network latency in Kubernetes

A couple of years ago, Kubernetes was already discussed on the official GitHub blog. Since then it has become the standard technology for deploying services, and Kubernetes now runs a significant share of both internal and public services. As our clusters grew and performance requirements became stricter, we began to notice that some services on Kubernetes were sporadically experiencing latency that could not be explained by the load of the application itself.

Essentially, applications were seeing seemingly random network latency of 100 ms or more, which resulted in timeouts or retries. Services were expected to respond to requests much faster than 100 ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should take milliseconds: MySQL did complete them in milliseconds, but from the point of view of the requesting application the response took 100 ms or more.

It quickly became clear that the problem only occurred when connecting to a Kubernetes node, even if the call came from outside Kubernetes. The easiest way to reproduce it is a Vegeta test, which can be run from any internal host, targets a Kubernetes service on a specific port, and sporadically records high latency. In this article we look at how we tracked down the cause of this problem.

Removing unnecessary complexity from the chain leading to the failure

Having reproduced the problem with a single example, we wanted to narrow its scope and strip away unnecessary layers of complexity. Initially there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.

The client (Vegeta) opens a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP, that is, it encapsulates the IP packets of the overlay network inside the IP packets of the data center. When connecting to the first node, stateful Network Address Translation (NAT) is performed to translate the IP address and port of the Kubernetes node into the IP address and port in the overlay network (specifically, the pod running the application). For return packets, the reverse sequence of actions is performed. It is a complex system with a lot of state and many elements that are constantly updated and changed as services are deployed and moved.

Running tcpdump during the Vegeta test shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet and then reset the connection. We can filter the data to include only packets slower than 100 ms and get a simpler way to reproduce the problem than a full network layer 7 test in Vegeta. Here are Kubernetes node "pings" using TCP SYN/SYN-ACK on the service "node port" (30927) at 10 ms intervals, filtered to the slowest responses:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms

We can immediately make a first observation. Judging by the sequence numbers and timings, these are clearly not one-off stalls. The delay often accumulates and is eventually worked off.
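To make that observation easier to repeat, here is a minimal Python sketch that groups slow hping3 replies into runs of consecutive sequence numbers. The function name `slow_bursts` and the 100 ms default are illustrative assumptions, and the sample lines are copied from the output above; this is not part of the original investigation.

```python
import re

# Matches the seq (TCP) or icmp_seq (ICMP) field and the rtt field of one hping3 reply line.
LINE_RE = re.compile(r"(?:icmp_)?seq=(\d+).*?rtt=([\d.]+) ms")


def slow_bursts(lines, threshold_ms=100.0):
    """Group replies at or above threshold_ms into runs of consecutive sequence numbers."""
    bursts = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a reply line
        seq, rtt = int(match.group(1)), float(match.group(2))
        if rtt < threshold_ms:
            continue
        if bursts and seq == bursts[-1][-1][0] + 1:
            bursts[-1].append((seq, rtt))  # continues the current burst
        else:
            bursts.append([(seq, rtt)])  # starts a new burst
    return bursts


# Sample lines taken from the hping3 output shown above.
sample = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms",
]

bursts = slow_bursts(sample)
for burst in bursts:
    print(f"burst starting at seq {burst[0][0]}: {len(burst)} slow replies")
```

A burst of several consecutive sequence numbers with steadily shrinking RTTs suggests that replies were queued behind one stall rather than delayed independently.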

Next, we want to find out which components may be involved in the stalls. Maybe it is some of the hundreds of iptables rules in NAT? Or are there problems with IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if you remove the NAT and firewall logic, leaving only the IPIP part?

Fortunately, Linux makes it easy to access the IP overlay layer directly if the machine is on the same network:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms

Judging by the results, the problem is still there! That rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping behaves:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms

len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms

len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms

len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms

len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms

len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms

len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms

len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms

len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms

len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms

The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further.

Is it every packet sent between these two hosts?

theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms

len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms

len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms

len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms

len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms

len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms

We have simplified the situation down to two Kubernetes nodes sending each other any packet at all, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).

Now the last question: why does the delay only occur on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Luckily, this is also fairly easy to find out by sending packets from a host outside Kubernetes to the same "known bad" recipient. As you can see, the problem has not gone away:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms

We then run the same requests from the previous source kube-node to the external host (which rules out the source host, since a ping includes both an RX and a TX component):

theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms

By examining packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the recipient (top) does not. In the captures, the Delta column (in seconds) shows this.

In addition, if you look at the difference in the ordering of TCP and ICMP packets (by sequence number) on the recipient side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, on the other hand, sometimes interleave, and some of them get stuck. In particular, if you examine the ports of the SYN packets, they are in order on the sender's side but not on the receiver's side.

There is a subtle difference in how the network cards of modern servers (like those in our data center) process packets containing TCP or ICMP. When a packet arrives, the network adapter "hashes it per connection", that is, it tries to split connections across queues and send each queue to a separate processor core. For TCP, this hash includes both the source and destination IP address and port. In other words, each connection is (potentially) hashed differently. For ICMP, only the IP addresses are hashed, since there are no ports.
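The effect of that difference can be sketched in a few lines of Python. This is only an illustration: `rx_queue`, the queue count of 16, and the use of CRC32 are assumptions made for the example, and CRC32 merely stands in for whatever hash function the NIC really uses.

```python
import zlib

NUM_RX_QUEUES = 16  # hypothetical number of RX queues on the NIC


def rx_queue(src_ip, dst_ip, src_port=None, dst_port=None, queues=NUM_RX_QUEUES):
    """Pick an RX queue from the flow key. CRC32 is a stand-in for the real NIC hash."""
    if src_port is None or dst_port is None:
        key = f"{src_ip}>{dst_ip}"  # ICMP: only the addresses are available
    else:
        key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}"  # TCP: addresses and ports
    return zlib.crc32(key.encode()) % queues


# Every ICMP packet between the same two hosts produces the same key, hence the same queue.
icmp_queues = {rx_queue("172.16.33.44", "172.16.47.27") for _ in range(1000)}

# Each new TCP connection uses a different source port, so connections spread over queues.
tcp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", src_port, 30927)
    for src_port in range(1024, 2024)
}

print("queues used by ICMP:", len(icmp_queues))
print("queues used by TCP:", len(tcp_queues))
```

With a key built only from addresses, all ICMP traffic between two hosts is tied to one queue and therefore one core, which is why a single busy core can delay every ping while only some TCP connections are affected.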

Another new observation: during this period we see ICMP delays on all communication between the two hosts, but not on TCP. This tells us that the cause is likely related to RX queue hashing: the stall is almost certainly in the processing of RX packets, not in the sending of responses.

This removes packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side of some kube-node servers.

Understanding packet processing in the Linux kernel

To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.

Going back to the simplest traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel to signal that there is a packet to process. The kernel stops other work, switches context to the interrupt handler, processes the packet, and then returns to its current tasks.

This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the '90s, but on modern 10G cards with a maximum throughput on the order of ten million packets per second, every core of a small multi-core server can be interrupted millions of times per second.

To avoid constantly handling interrupts, many years ago Linux added NAPI: the networking API that all modern drivers use to improve performance at high speeds. At low speeds the kernel still receives interrupts from the network card in the old way. Once enough packets arrive to exceed a threshold, the kernel disables interrupts and instead begins polling the network adapter and picking up packets in chunks. Processing is performed in softirq, that is, in the context of software interrupts after system calls and hardware interrupts, when the kernel (as opposed to user space) is already running.

This is much faster, but it causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes do not have time to actually drain those queues (reading from TCP connections and so on). Eventually the queues fill up and we start dropping packets. In an attempt to find a balance, the kernel sets a budget for the maximum number of packets processed in softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps) that handles these softirqs outside of the normal syscall/interrupt path. This thread is scheduled using the standard process scheduler, which tries to allocate resources fairly.
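One place where budget pressure is commonly visible is /proc/net/softnet_stat, which was not used in the investigation described here. The sketch below assumes the usual layout of that file (one row per CPU, hexadecimal columns, with the first three being packets processed, packets dropped, and the number of times polling stopped with work still pending). The sample text is made up for illustration, and the helper name `parse_softnet_stat` is hypothetical.

```python
def parse_softnet_stat(text):
    """Return per-row packet counters from the contents of /proc/net/softnet_stat."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        processed, dropped, time_squeeze = (int(field, 16) for field in fields[:3])
        rows.append({
            "row": len(rows),  # assumed to follow CPU order
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


# Made-up sample contents, not taken from the servers in this article.
SAMPLE = """\
0000a1b2 00000000 00000003 00000000
0001ff00 00000002 000001f4 00000000
"""

rows = parse_softnet_stat(SAMPLE)
squeezed = max(rows, key=lambda row: row["time_squeeze"])
print("row with the most budget exhaustion:", squeezed["row"])
```

On a real host the text would come from reading /proc/net/softnet_stat; a row whose third counter keeps growing points at a core that regularly runs out of softirq budget.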

Having seen how the kernel processes packets, you can see that there is a definite possibility of stalls. If softirq processing runs less often, packets have to wait for some time in the RX queue on the network card before they are processed. This may be because some task is blocking the processor core, or because something else is preventing the core from running softirqs.
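A toy model shows why such a stall would produce the pattern seen in the hping3 output, with RTTs that step down by the send interval. This is a simplified sketch under stated assumptions: one probe every 10 ms, a single stall of 130 ms, and every queued packet answered the moment the stall ends. The function `simulate_stall` and all of its parameters are hypothetical.

```python
def simulate_stall(send_interval_ms=10, stall_start_ms=1000, stall_ms=130,
                   base_rtt_ms=0.2, total_ms=2000):
    """Return (seq, rtt_ms) pairs for probes sent at a fixed interval across one core stall."""
    stall_end_ms = stall_start_ms + stall_ms
    results = []
    for seq, sent_at_ms in enumerate(range(0, total_ms, send_interval_ms)):
        if stall_start_ms <= sent_at_ms < stall_end_ms:
            # The packet sits in the RX queue until the core processes softirqs again.
            wait_ms = stall_end_ms - sent_at_ms
        else:
            wait_ms = 0
        results.append((seq, base_rtt_ms + wait_ms))
    return results


# Keep only replies at or above 100 ms, as the hping3 filter in this article does.
slow = [(seq, rtt) for seq, rtt in simulate_stall() if rtt >= 100]
for seq, rtt in slow:
    print(f"seq={seq} rtt={rtt:.1f} ms")
```

The first packet caught by the stall waits the longest, and each later one waits one send interval less, which matches the staircase of RTTs in the real measurements.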

Narrowing processing down to a core or method

Softirq delays are just a guess for now. But the guess makes sense, and we know we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, find the reason for the delays.

Let's return to our slow packets:

len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms

len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms

len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms

len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms

len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms

len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms

len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms

len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms

As discussed earlier, these ICMP packets are hashed into a single NIC RX queue and processed by a single CPU core. If we want to understand what Linux is doing, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed so that we can follow the process.

Now it is time to use tools that let you observe the Linux kernel in real time. Here we used bcc. This toolkit lets you write small C programs that hook arbitrary functions in the kernel and buffer the events to a user-space Python program that can process them and return the result to you. Hooking arbitrary functions in the kernel is a tricky business, but the toolkit is designed for maximum safety and for tracking down exactly the kind of production issues that are not easy to reproduce in a test or development environment.

The plan here is simple: we know that the kernel processes these ICMP pings, so we put a hook on the kernel function icmp_echo, which accepts an incoming ICMP echo request packet and initiates sending the ICMP echo response. We can identify a packet by its increasing icmp_seq number, which hping3 shows above.

The code of the bcc script looks complicated, but it is not as scary as it seems. The icmp_echo function is passed a struct sk_buff *skb: this is the packet containing the "echo request". We can trace it, pull out the sequence number echo.sequence (which corresponds to the icmp_seq shown by hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:

TGID    PID     PROCESS NAME     ICMP_SEQ
0       0       swapper/11       770
0       0       swapper/11       771
0       0       swapper/11       772
0       0       swapper/11       773
0       0       swapper/11       774
20041   20086   …                775
0       0       swapper/11       776
0       0       swapper/11       777
0       0       swapper/11       778
4512    4542    spokes-report-s  779

Note that in softirq context the processes that made system calls show up as the "process", when in fact it is the kernel that is safely processing packets in kernel context.
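For readers unfamiliar with where the sequence number lives, the following Python sketch parses it out of raw ICMP header bytes in user space. It is not the bcc script used for the trace; `echo_sequence` is a hypothetical helper, and the sample packets are built by hand with a zero checksum purely for illustration.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo request
ICMP_ECHO_REPLY = 0    # ICMP type of an echo reply


def echo_sequence(icmp_packet):
    """Return the sequence number of an ICMP echo request, or None for other types."""
    # Header layout: type (1 byte), code (1), checksum (2), identifier (2), sequence (2),
    # all in network byte order.
    icmp_type, _code, _checksum, _identifier, sequence = struct.unpack(
        "!BBHHH", icmp_packet[:8]
    )
    if icmp_type != ICMP_ECHO_REQUEST:
        return None
    return sequence


# Hand-built sample headers (checksum left at zero, so not valid on a real network).
request = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, 1953)
reply = struct.pack("!BBHHH", ICMP_ECHO_REPLY, 0, 0, 0x1234, 1953)

print(echo_sequence(request))
print(echo_sequence(reply))
```

The kernel-side hook reads the same field from the packet header, which is what makes it possible to match a traced packet to a specific hping3 line.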

With this tool we can associate specific processes with the specific packets that show a delay in hping3. Let's run a simple grep on this capture for certain icmp_seq values. Packets matching the icmp_seq values above are marked along with the RTT we observed above (in parentheses are the expected RTT values for packets we filtered out because their RTT was below 50 ms):

TGID    PID     PROCESS NAME     ICMP_SEQ ** RTT
--
10137   10436   cadvisor         1951
10137   10436   cadvisor         1952
76      76      ksoftirqd/11     1953 ** 99ms
76      76      ksoftirqd/11     1954 ** 89ms
76      76      ksoftirqd/11     1955 ** 79ms
76      76      ksoftirqd/11     1956 ** 69ms
76      76      ksoftirqd/11     1957 ** 59ms
76      76      ksoftirqd/11     1958 ** (49ms)
76      76      ksoftirqd/11     1959 ** (39ms)
76      76      ksoftirqd/11     1960 ** (29ms)
76      76      ksoftirqd/11     1961 ** (19ms)
76      76      ksoftirqd/11     1962 ** (9ms)
--
10137   10436   cadvisor         2068
10137   10436   cadvisor         2069
76      76      ksoftirqd/11     2070 ** 75ms
76      76      ksoftirqd/11     2071 ** 65ms
76      76      ksoftirqd/11     2072 ** 55ms
76      76      ksoftirqd/11     2073 ** (45ms)
76      76      ksoftirqd/11     2074 ** (35ms)
76      76      ksoftirqd/11     2075 ** (25ms)
76      76      ksoftirqd/11     2076 ** (15ms)
76      76      ksoftirqd/11     2077 ** (5ms)

The results tell us several things. First, all of these packets are processed in the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets that are processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that piled up after cadvisor.

The fact that cadvisor is always running immediately beforehand implies that it is involved in the problem. Ironically, the purpose of cadvisor is to "analyze resource usage and performance characteristics of running containers", not to cause this performance problem.

As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance issues under some unforeseen circumstances.

What does cadvisor do that slows down the packet queue?

We now have a fairly good understanding of how the stall occurs, which process causes it, and on which CPU. We see that because of hard blocking, the Linux kernel does not get a chance to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor starts a slow syscall, after which all the packets that accumulated in the meantime are processed.

This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets goes over budget and ksoftirqd is called, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like x-raying the CPU every few milliseconds.

Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and can generate a record of the calls made on the running system, including both user space and the Linux kernel. You can take this record and process it with a small fork of Brendan Gregg's FlameGraph program that preserves the order of the stack traces. We can save single-line stack traces every 1 ms, and then pick out and save the 100 milliseconds of samples before ksoftirqd shows up in the trace:

# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100

Here are the results:

(hundreds of traces that look similar)

cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run

There is a lot going on here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?

Each line is a CPU trace at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being made, read(): .... ;do_syscall_64;sys_read; .... So cadvisor spends a lot of time in the read() system call, in work related to the mem_cgroup_* functions (top of the call stack, end of the line).
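Reading hundreds of such lines by eye is tedious, so a small Python sketch can summarise what was on the CPU just before ksoftirqd appears. It assumes one semicolon-separated stack per line, in time order. The helper `leaf_frames_before` is hypothetical, and the sample stacks are abbreviated versions of the traces above.

```python
from collections import Counter


def leaf_frames_before(folded_lines, marker="ksoftirqd", window=100):
    """Count the innermost frame of each stack sampled just before the first marker line."""
    for index, line in enumerate(folded_lines):
        if line.startswith(marker):
            preceding = folded_lines[max(0, index - window):index]
            # The last element on a line is the top of the call stack.
            return Counter(stack.strip().split(";")[-1] for stack in preceding)
    return Counter()


# Abbreviated stacks for illustration; real lines are much longer.
sample = [
    "cadvisor;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "cadvisor;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "cadvisor;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter",
    "ksoftirqd/11;ret_from_fork;kthread;run_ksoftirqd;__do_softirq;net_rx_action",
]

hot = leaf_frames_before(sample)
for frame, count in hot.most_common():
    print(f"{count:4d}  {frame}")
```

If one family of functions dominates the samples right before ksoftirqd runs, that is a strong hint about what was holding the core.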

It is hard to tell from a call trace what exactly is being read, so let's run strace to see what cadvisor is doing and find system calls that take longer than 100 ms:

theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>

As you might expect, we see slow read() calls here. From the contents of the reads and the mem_cgroup context, it is clear that these read() calls refer to the file memory.stat, which shows memory usage and cgroup limits (cgroups being Docker's resource isolation technology). The cadvisor tool queries this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:

theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null

real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $

Now that we can reproduce the bug, we understand that the Linux kernel itself is hitting a pathological case.
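The same measurement can be scripted so that it is easy to repeat across many nodes. Below is a minimal Python sketch, assuming the cgroup v1 path used in the command above. The helper `time_read` is hypothetical, and the timing it reports is wall-clock time rather than the separate user/sys split that `time` prints.

```python
import os
import time

# Path used in the command above; it only exists on hosts with the cgroup v1 memory controller.
MEMORY_STAT = "/sys/fs/cgroup/memory/memory.stat"


def time_read(path, chunk_size=4096):
    """Read a file to the end and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if os.path.exists(MEMORY_STAT):
    size, elapsed = time_read(MEMORY_STAT)
    print(f"read {size} bytes in {elapsed * 1000:.1f} ms")
else:
    print(f"{MEMORY_STAT} not present on this host")
```

A read of a few hundred bytes that takes on the order of 100 ms, as in the output above, points at the kernel generating the file contents rather than at anything cadvisor does with them.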

Why is the read so slow?

At this stage it is much easier to find reports from other users about similar problems. As it turned out, this bug had been reported in the cadvisor tracker as a problem of excessive CPU usage; it is just that no one had noticed that the latency also shows up randomly in the network stack. It had indeed been noticed that cadvisor was consuming more CPU time than expected, but this was not given much importance, since our servers have plenty of CPU resources, so the problem was not studied carefully.

The problem is that cgroups account for memory usage within the namespace (container). When all processes in a cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer in use, it turns out that the kernel also charges cached contents, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the description of the problem:

zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).

Having the kernel check every page in the cache when freeing a cgroup could be very slow, so the lazy approach is taken: wait until those pages are reclaimed, and only then finally clean up the cgroup when the memory is actually needed. Until that point, the cgroup is still counted when statistics are collected.

From a performance standpoint, memory was traded for speed: the initial cleanup is faster because some cached memory is left behind. This is fine. When the kernel reclaims the last of the cached memory, the cgroup is eventually cleaned up, so this cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat search mechanism in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to reclaim the last of the cached data and clear out the zombie cgroups.

It turned out that some of our nodes had so many zombie cgroups that the read time, and with it the latency, exceeded a second.
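One rough way to estimate how many zombie cgroups a node carries, which is an assumption of this sketch and not a method described in the investigation, is to compare the number of memory cgroups the kernel reports in /proc/cgroups with the number of cgroup directories still visible under the memory hierarchy. The function names and the sample text below are hypothetical.

```python
import os


def memory_cgroup_count(proc_cgroups_text):
    """Return num_cgroups for the memory controller from the text of /proc/cgroups."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue  # header row
        fields = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    return None


def visible_cgroup_dirs(root):
    """Count the root directory plus every directory below it."""
    return sum(1 for _ in os.walk(root))


def estimate_zombies(proc_cgroups_text, root):
    """Estimate cgroups the kernel still tracks that no longer have a directory."""
    kernel_count = memory_cgroup_count(proc_cgroups_text)
    if kernel_count is None:
        return None
    return max(0, kernel_count - visible_cgroup_dirs(root))


# Made-up sample of /proc/cgroups contents for illustration.
SAMPLE = "#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpuset\t5\t12\t1\nmemory\t9\t2345\t1\n"
print(memory_cgroup_count(SAMPLE))
```

On a cgroup v1 host the inputs would be the contents of /proc/cgroups and the path /sys/fs/cgroup/memory; a large gap between the two counts suggests many deleted cgroups that still hold charged memory.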

The workaround for the cadvisor problem is to immediately free the dentry/inode caches across the whole system, which at once eliminates the read latency as well as the network latency on the host, since dropping the cache also covers the cached pages of the zombie cgroups and frees them too. This is not a solution, but it confirms the cause of the problem.
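As a sketch of what such a workaround can look like, the kernel exposes /proc/sys/vm/drop_caches, and writing 2 to it asks the kernel to free reclaimable slab objects, which include dentries and inodes. Whether this is exactly the command that was used here is an assumption; the helper name is hypothetical, root privileges are required on a real host, and the path is a parameter so the function can be tried against an ordinary file first.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_dentry_inode_caches(path=DROP_CACHES):
    """Ask the kernel to free reclaimable slab objects (dentries and inodes)."""
    os.sync()  # flush dirty data first so that more objects become reclaimable
    with open(path, "w") as handle:
        handle.write("2\n")  # 1 = page cache, 2 = dentries and inodes, 3 = both
```

Calling `drop_dentry_inode_caches()` with the default path has a system-wide effect, so it is better treated as a diagnostic step than as something to run routinely.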

It turned out that in newer kernel versions (4.19+) the performance of the memory.stat call was improved, so moving to that kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found the nodes with high enough latency, and rebooted them. This gave us time to update the OS on the remaining servers.

Summing up

Because this bug stopped NIC RX queue processing for hundreds of milliseconds, it caused both high latency on short connections and latency in the middle of a connection, such as between MySQL requests and response packets.

Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all the services built on top of them. Every system you run benefits from Kubernetes performance improvements.

Source: www.habr.com
