Debugging network latency in Kubernetes


A few years ago, Kubernetes was already discussed on the official GitHub blog. Since then, it has become the standard technology for deploying services. Kubernetes now manages a significant portion of internal and public services. As our clusters grew and performance requirements became stricter, we began to notice that some services on Kubernetes were sporadically experiencing latency that could not be explained by the load of the application itself.

In essence, applications experience seemingly random network latency of up to 100 ms or more, which results in timeouts or retries. Services were expected to respond to requests much faster than 100 ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should take milliseconds, and MySQL did complete them in milliseconds, but from the point of view of the requesting application the response took 100 ms or more.

It quickly became clear that the problem only occurs when connecting to a Kubernetes node, even if the call comes from outside Kubernetes. The easiest way to reproduce the problem is with a Vegeta test that runs from any internal host, tests a Kubernetes service on a specific port, and sporadically registers high latency. In this article, we will look at how we tracked down the cause of this problem.

Eliminating unnecessary complexity in the chain leading to failure

Having reproduced the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially, there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, some of them need to be ruled out.


The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP, that is, it encapsulates the overlay network's IP packets inside the data center's IP packets. When connecting to the first node, stateful Network Address Translation (NAT) is performed to translate the Kubernetes node's IP address and port into the IP address and port in the overlay network (specifically, the pod with the application). For incoming packets, the reverse sequence of actions is performed. It is a complex system with a lot of state and many elements that are constantly updated and changed as services are deployed and moved.
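To make the IPIP idea concrete, here is a minimal sketch of the encapsulation step: an inner IPv4 packet (overlay addresses) becomes the payload of an outer IPv4 packet (data center addresses) whose protocol field is 4. The helper names and the source addresses are hypothetical, chosen only for illustration; this is not how the kernel or any CNI plugin is implemented.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4 encapsulation


def ipv4_checksum(header: bytes) -> int:
    """Return the 16-bit one's-complement checksum of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (no options, TTL 64)."""
    fields = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ipv4_checksum(fields)
    # The checksum field occupies bytes 10-11 of the header.
    return fields[:10] + struct.pack("!H", checksum) + fields[12:]


def encapsulate_ipip(inner_packet: bytes, node_src: str, node_dst: str) -> bytes:
    """Wrap an overlay packet in an outer header addressed between two nodes."""
    outer = ipv4_header(node_src, node_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


# Hypothetical source addresses; the destinations are the pod and node IPs seen in this article.
inner = ipv4_header("10.125.10.1", "10.125.20.64", 1, 0)  # proto 1 = ICMP, empty payload
outer = encapsulate_ipip(inner, "172.16.47.26", "172.16.47.27")
print(len(outer), "bytes on the wire, outer protocol =", outer[9])
```

The receiving node strips the outer 20 bytes and routes the inner packet to the pod, which is why every overlay packet passes through extra kernel code paths on both ends.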

Running tcpdump on the Vegeta test shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet and then reset the connection. We can filter the data to include only packets slower than 100 ms and get a simpler way to reproduce the problem than Vegeta's full layer-7 test. Here are the "pings" of a Kubernetes node using TCP SYN/SYN-ACK on the service's "node port" (30927) at 10 ms intervals, filtered for the slowest responses:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms

len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms

We can immediately make a first observation. Judging by the sequence numbers and timings, these are not one-off congestion events. The delay often accumulates and is eventually worked off: within a burst, consecutive sequence numbers come back with steadily decreasing RTTs.
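The same observation can be made mechanically. The following is a minimal sketch, not part of the original investigation, that parses hping3 reply lines like the ones above and groups replies with consecutive sequence numbers into bursts. The function names are hypothetical.

```python
import re

# Matches both TCP replies (seq=...) and ICMP replies (icmp_seq=...).
LINE_RE = re.compile(r"(?:icmp_seq|seq)=(\d+).*?rtt=([\d.]+) ms")


def parse_replies(lines):
    """Extract (sequence number, RTT in ms) pairs from hping3 reply lines."""
    replies = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def group_bursts(replies):
    """Group replies whose sequence numbers are consecutive into bursts."""
    bursts = []
    for seq, rtt in replies:
        if bursts and seq == bursts[-1][-1][0] + 1:
            bursts[-1].append((seq, rtt))
        else:
            bursts.append([(seq, rtt)])
    return bursts


sample = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms",
]

for burst in group_bursts(parse_replies(sample)):
    print(f"seq {burst[0][0]}..{burst[-1][0]}: RTTs {[rtt for _, rtt in burst]}")
```

A burst of several consecutive slow replies with falling RTTs suggests that the packets were all waiting for the same event and were released together.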

Next, we want to find out which components may be involved in causing the congestion. Maybe it is some of the hundreds of iptables rules in NAT? Or are there problems with IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if you remove the NAT and firewall logic, leaving only the IPIP part?


Fortunately, Linux makes it easy to reach the IP overlay layer directly if the machine is on the same network:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms

len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms

Judging by the results, the problem remains! This rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping behaves:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms

len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms

len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms

len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms

len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms

len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms

len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms

len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms

len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms

len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms

The results show that the problem has not gone away. Maybe it is the IPIP tunnel? Let's simplify the test further.


Could it be that every packet sent between these two hosts is affected?

theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms

len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms

len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms

len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms

len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms

len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms

len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms

We have simplified the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).

Now the last question: why does the delay only occur on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is also quite easy to figure out by sending a packet from a host outside Kubernetes, but to the same "bad" receiver. As you can see, the problem has not disappeared:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms

len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms

We then run the same requests from the previous source kube-node to an external host (which rules out the source host, since a ping includes both an RX and a TX component):

theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms

By examining packet captures of the latency, we obtained some additional information. Specifically, the sender sees this delay but the receiver does not (in the capture figure of the original article the sender is shown at the bottom and the receiver at the top, with a Delta column in seconds).


In addition, if you look at the difference in the ordering of TCP and ICMP packets (by sequence number) on the receiver side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, by contrast, are sometimes interleaved, and some of them stall. In particular, if you examine the ports of the SYN packets, they are in order on the sender side but not on the receiver side.

There is a subtle difference in how the network cards of modern servers (like those in our data center) process packets containing TCP or ICMP. When a packet arrives, the network adapter "hashes it per connection", that is, it tries to split connections across queues and send each queue to a separate processor core. For TCP, this hash includes both the source and destination IP address and port. In other words, each connection is hashed (potentially) differently. For ICMP, only the IP addresses are hashed, since there are no ports.
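The effect of the two hash inputs can be illustrated with a small sketch. It assumes a NIC with 16 RX queues and uses CRC32 as a stand-in for the hardware hash; real adapters use their own function (commonly a Toeplitz hash with a configurable key), so only the qualitative behaviour carries over.

```python
import zlib

NUM_RX_QUEUES = 16  # assumption: number of RX queues exposed by the NIC


def rx_queue(src_ip, dst_ip, src_port=None, dst_port=None):
    """Pick an RX queue from the flow fields (CRC32 stands in for the NIC's hash)."""
    key = f"{src_ip}|{dst_ip}"
    if src_port is not None and dst_port is not None:
        # TCP: ports are part of the hash input, so each connection can differ.
        key += f"|{src_port}|{dst_port}"
    return zlib.crc32(key.encode()) % NUM_RX_QUEUES


# ICMP between one pair of hosts: only the addresses are hashed.
icmp_queues = {rx_queue("172.16.33.44", "172.16.47.27") for _ in range(1000)}

# TCP SYNs between the same hosts, each from a different source port.
tcp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", sport, 30927)
    for sport in range(1024, 2024)
}

print("queues used by ICMP:", len(icmp_queues))
print("queues used by TCP: ", len(tcp_queues))
```

All ICMP traffic between two hosts lands on one queue and therefore one core, while TCP connections are spread across queues. If a single core is the problem, ICMP between an affected pair is always delayed together, whereas only some TCP connections are.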

Another new observation: during this period we see ICMP delays on all communication between the two hosts, but TCP does not behave that way. This tells us that the cause is likely related to RX queue hashing: the congestion is almost certainly in the processing of RX packets, not in the sending of responses.

This eliminates packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.

Understanding packet processing in the Linux kernel

To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.

Going back to the simplest traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel to signal that there is a packet that needs to be processed. The kernel stops other work, switches context to the interrupt handler, processes the packet, and then returns to its current tasks.


This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the '90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server could be interrupted millions of times per second.

To avoid constantly handling interrupts, many years ago Linux added NAPI: a networking API that all modern drivers use to improve performance at high speeds. At low speeds the kernel still receives interrupts from the network card in the old way. Once enough packets arrive to exceed a threshold, the kernel disables interrupts and instead starts polling the network adapter and picking up packets in batches. Processing is done in softirq, that is, in the software interrupt context after system calls and hardware interrupts, when the kernel (as opposed to user space) is already running.


This is much faster, but it causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes do not have time to actually drain these queues (reading from TCP connections, etc.). Eventually the queues fill up and we start dropping packets. In an attempt to find a balance, the kernel sets a budget for the maximum number of packets processed in the softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps) and handles these softirqs outside the normal syscall/interrupt path. This thread is scheduled by the standard process scheduler, which tries to allocate resources fairly.


Having studied how the kernel processes packets, you can see that there is a certain probability of congestion. If softirq processing runs less often, packets have to wait for some time in the RX queue on the network card before they are processed. This may be due to some task blocking the processor core, or to something else preventing the core from running softirq.
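A toy model shows what such a stall would look like from the sender's side. This is a sketch under simplifying assumptions (one core, packets arriving at a fixed interval, everything queued during the stall is processed the moment it ends); the function name and parameters are hypothetical.

```python
def observed_delays(stall_start_ms, stall_len_ms, send_interval_ms, count):
    """Extra delay for packets arriving at a fixed interval while one core is stalled.

    Packets that arrive during the stall wait in the RX queue until the stall
    ends; packets arriving outside it are processed immediately (0 ms extra).
    """
    stall_end = stall_start_ms + stall_len_ms
    delays = []
    for i in range(count):
        arrival = i * send_interval_ms
        if stall_start_ms <= arrival < stall_end:
            delays.append(stall_end - arrival)
        else:
            delays.append(0)
    return delays


# One probe every 10 ms (as with `hping3 -i u10000`), core blocked for 100 ms.
delays = observed_delays(stall_start_ms=20, stall_len_ms=100, send_interval_ms=10, count=15)
print(delays)
# → [0, 0, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0, 0, 0]
```

The descending staircase in steps of the send interval is the same shape as the RTT bursts in the hping3 output, which is what makes a blocked receive core a plausible hypothesis.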

Narrowing down the processing to a core or method

Softirq delays are just a guess for now. But it makes sense, and we know we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, find the reason for the delays.

Let's return to our slow packets:

len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms

len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms

len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms

len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms

len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms

len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms

len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms

len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms

As discussed earlier, these ICMP packets are hashed to a single RX NIC queue and processed by a single CPU core. If we want to understand what Linux is doing, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed in order to trace the process.

Now it is time to use tools that let you monitor the Linux kernel in real time. Here we used bcc. This toolkit lets you write small C programs that hook arbitrary functions in the kernel and buffer the events to a user-space Python program that can process them and return the result to you. Hooking arbitrary functions in the kernel is a tricky business, but the tooling is designed for maximum safety and is built to track down exactly the kind of production issues that are not easily reproduced in a test or development environment.

The plan here is simple: we know that the kernel processes these ICMP pings, so we put a hook on the kernel function icmp_echo, which accepts an incoming ICMP echo request packet and initiates sending an ICMP echo response. We can identify a packet by the increasing icmp_seq number that hping3 shows above.

The bcc script code looks complicated, but it is not as scary as it seems. The icmp_echo function is passed struct sk_buff *skb: this is the packet with the "echo request". We can trace it, pull out the sequence echo.sequence (which corresponds to the icmp_seq from hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:

TGID    PID     PROCESS NAME       ICMP_SEQ
0       0       swapper/11         770
0       0       swapper/11         771
0       0       swapper/11         772
0       0       swapper/11         773
0       0       swapper/11         774
20041   20086   (unrecoverable)    775
0       0       swapper/11         776
0       0       swapper/11         777
0       0       swapper/11         778
4512    4542    spokes-report-s    779

It should be noted here that in the softirq context, processes that made system calls show up as the "process", when in fact it is the kernel that safely processes the packets in kernel context.

With this tool we can correlate specific processes with specific packets that show a delay in hping3. Let's run a simple grep on this capture for specific icmp_seq values. Packets matching the icmp_seq values above are marked together with the RTT we observed above (in parentheses are the expected RTT values for packets that we filtered out because their RTT was below 50 ms):

TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)
76      76      ksoftirqd/11    2076 ** (15ms)

The results tell us several things. First, all of these packets are processed in the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets that are processed in the context of the cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that built up after cadvisor.
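The manual grep-and-annotate step can be sketched in a few lines. This is an illustration only, with hypothetical function names: it joins trace rows of the form (TGID, PID, process name, icmp_seq) with the RTTs measured by hping3 and reports which process was on the core just before each run of slow packets.

```python
def annotate_trace(trace_rows, rtt_by_seq):
    """Attach the measured RTT (or None) to each (tgid, pid, comm, icmp_seq) row."""
    return [
        (tgid, pid, comm, seq, rtt_by_seq.get(seq))
        for tgid, pid, comm, seq in trace_rows
    ]


def stalls_preceded_by(annotated, slow_ms=50.0):
    """Return the process name seen just before each run of slow packets begins."""
    culprits = []
    previous_comm = None
    in_stall = False
    for _tgid, _pid, comm, _seq, rtt in annotated:
        slow = rtt is not None and rtt >= slow_ms
        if slow and not in_stall:
            culprits.append(previous_comm)
        in_stall = slow
        previous_comm = comm
    return culprits


# A few rows from the trace and the matching hping3 RTTs shown earlier.
trace = [
    (10137, 10436, "cadvisor", 1952),
    (76, 76, "ksoftirqd/11", 1953),
    (76, 76, "ksoftirqd/11", 1954),
    (10137, 10436, "cadvisor", 2069),
    (76, 76, "ksoftirqd/11", 2070),
]
rtts = {1953: 99.3, 1954: 89.3, 2070: 75.7}

print(stalls_preceded_by(annotate_trace(trace, rtts)))
```

With more data, counting the returned names would show whether one process consistently precedes the stalls or whether the pattern is coincidental.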

The fact that cadvisor always runs immediately beforehand implies its involvement in the problem. Ironically, the purpose of cadvisor is to "analyze resource usage and performance characteristics of running containers", not to cause this performance problem.

As with other aspects of containers, these are all highly advanced tools and can be expected to experience performance problems under some unforeseen circumstances.

What does cadvisor do that stalls the packet queue?

We now have a pretty good understanding of how the stall occurs, which process causes it, and on which CPU. We see that, due to hard blocking, the Linux kernel does not have time to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor starts a slow syscall, after which all the packets accumulated during that time are processed.


That is the theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets exceeds the budget and ksoftirqd is invoked, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like x-raying the CPU every few milliseconds.


Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and can generate a profile of calls on the running system, including both user space and the Linux kernel. You can take this recording and process it with a small fork of the FlameGraph program from Brendan Gregg that preserves the order of the stack traces. We can save single-line stack traces every 1 ms, and then highlight and save a sample of the 100 milliseconds before ksoftirqd appears in the trace:

# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100

Here are the results:

(hundreds of traces that look similar)

cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run

There is a lot here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?

Each line is a CPU trace at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called, read(): .... ;do_syscall_64;sys_read; .... So cadvisor spends a lot of time in the read() system call, in the mem_cgroup_* functions (top of the call stack / end of the line).
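Reading these lines by eye gets tedious when there are hundreds of them. A minimal sketch of a summarizer, assuming the semicolon-separated format shown above and assuming that syscall entry points appear as frames starting with sys_ (true for this 4.9 trace, not necessarily for other kernels):

```python
from collections import Counter


def summarize_stack(line):
    """Return (process, syscall, leaf frame) for one semicolon-separated stack line."""
    frames = line.strip().split(";")
    process = frames[0]
    # First frame that looks like a syscall entry point, if any.
    syscall = next((f for f in frames if f.startswith("sys_")), None)
    return process, syscall, frames[-1]


# Shortened versions of the stacks above (repeated [cadvisor] frames trimmed).
stacks = [
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter",
    "ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll",
]

counts = Counter(summarize_stack(line)[:2] for line in stacks)
for (process, syscall), samples in counts.most_common():
    print(f"{samples} samples: {process} in {syscall or 'no syscall'}")
```

At roughly one sample per millisecond, the number of consecutive cadvisor/sys_read samples before the first ksoftirqd sample approximates how long the core was held.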

It is hard to tell from the call trace what exactly is being read, so let's run strace to see what cadvisor is doing and find system calls that take longer than 100 ms:

theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>
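The egrep filter above only keeps lines whose duration starts with 0.1 through 0.9. As a sketch of a slightly more flexible alternative, the following parses the "resumed" lines that strace -T -ff prints and keeps calls above a configurable threshold. It handles only this line shape; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches resumed-call lines from `strace -T -ff`, for example:
# [pid 10436] <... futex resumed> ) = 0 <0.156784>
RESUMED_RE = re.compile(r"^\[pid\s+(\d+)\] <\.\.\. (\w+) resumed>.*<(\d+\.\d+)>$")


def slow_syscalls(lines, threshold_s=0.1):
    """Return (pid, syscall name, seconds) for resumed calls at or above the threshold."""
    slow = []
    for line in lines:
        match = RESUMED_RE.match(line.strip())
        if match and float(match.group(3)) >= threshold_s:
            slow.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return slow


sample = [
    r'[pid 10436] <... futex resumed> ) = 0 <0.156784>',
    r'[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>',
    r'[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>',
]

time_by_syscall = Counter()
for _pid, name, seconds in slow_syscalls(sample):
    time_by_syscall[name] += seconds
print(time_by_syscall.most_common())
```

Calls such as futex and pselect6 are expected to block, so the interesting entries are the slow read() calls, which should normally return almost instantly for a small virtual file.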

As you might expect, the strace output shows slow read() calls. From the contents of the read operations and the mem_cgroup context, it is clear that these read() calls refer to the memory.stat file, which shows memory usage and cgroup limits (cgroups are Docker's resource isolation technology). The cadvisor tool polls this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:

theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null

real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $

Now we can reproduce the bug and understand that the Linux kernel is facing a pathology.
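The same check can be scripted to compare nodes. A minimal sketch, assuming the cgroup v1 path from the command above exists on the host (on cgroup v2 systems the layout differs); the function name is hypothetical:

```python
import os
import time

MEMORY_STAT = "/sys/fs/cgroup/memory/memory.stat"  # cgroup v1 path used in the article


def time_read(path, repeats=3):
    """Return the wall-clock seconds taken by each full read of `path`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as handle:
            handle.read()
        timings.append(time.perf_counter() - start)
    return timings


if os.path.exists(MEMORY_STAT):
    for seconds in time_read(MEMORY_STAT):
        print(f"{seconds * 1000:.1f} ms")
```

Because the file is generated by the kernel on every read, a read that takes tens or hundreds of milliseconds points at kernel-side work rather than at disk I/O or at cadvisor itself.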

Why is the read operation so slow?

At this stage, it is much easier to find reports from other users about similar problems. As it turned out, this bug had been reported in the cadvisor tracker as a problem of excessive CPU usage; it is just that no one noticed that the latency is also randomly reflected in the network stack. It was indeed noticed that cadvisor was consuming more CPU time than expected, but this was not given much importance, since our servers have plenty of CPU resources, so the problem was not carefully studied.

The problem is that cgroups account for memory usage within the namespace (container). When all processes in the cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer used, it appears that the kernel still charges cached contents, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the problem description:

zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be allocated from the page cache or tmpfs).

Having the kernel check all the pages in the cache when freeing a cgroup can be very slow, so a lazy approach is chosen: wait until these pages are requested again, and then finally clear the cgroup when the memory is actually needed. Until then, the cgroup is still taken into account when collecting statistics.

From a performance standpoint, the kernel developers traded memory for performance: the initial cleanup is faster because some cached memory is left behind. This is fine. When the kernel uses the last of the cached memory, the cgroup is eventually cleared, so it cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup mechanism in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to reclaim the last of the cached data and clear the zombie cgroups.

It turned out that some of our nodes had so many zombie cgroups that the read time and the latency exceeded a second.
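One rough way to estimate how many zombie cgroups a node carries is to compare the number of memory cgroups the kernel reports in /proc/cgroups with the number of cgroup directories still visible in the filesystem. This is a heuristic sketch under the assumption of a cgroup v1 hierarchy mounted at /sys/fs/cgroup/memory; it is not the tooling used in the original investigation, and the function names are hypothetical.

```python
import os


def memory_cgroups_known_to_kernel(proc_cgroups_text):
    """Return the num_cgroups column for the memory controller from /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _hierarchy, num_cgroups, _enabled = line.split()
        if name == "memory":
            return int(num_cgroups)
    return None


def count_cgroup_dirs(root):
    """Count cgroup directories visible under `root`, including `root` itself."""
    return sum(1 for _dirpath, _dirs, _files in os.walk(root))


CGROUP_ROOT = "/sys/fs/cgroup/memory"  # assumption: cgroup v1 mount point
if os.path.exists("/proc/cgroups") and os.path.isdir(CGROUP_ROOT):
    with open("/proc/cgroups") as handle:
        known = memory_cgroups_known_to_kernel(handle.read())
    if known is not None:
        visible = count_cgroup_dirs(CGROUP_ROOT)
        print(f"kernel: {known}, visible: {visible}, possible zombies: {known - visible}")
```

A large gap between the two numbers suggests that many deleted cgroups are still being kept alive by cached pages, which is the situation described above.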

The workaround for the cadvisor problem is to immediately free the dentry/inode caches system-wide, which immediately eliminates the read latency as well as the network latency on the host, since dropping the cache also covers the cached zombie cgroup pages and frees them too. This is not a solution, but it confirms the cause of the problem.

It turned out that in newer kernel versions (4.19+) the performance of the memory.stat call was improved, so switching to this kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found the nodes with sufficiently high latency, and rebooted them. This gave us time to update the OS on the remaining servers.

Summing up

Because this bug stopped RX NIC queue processing for hundreds of milliseconds, it simultaneously caused both high latency on short connections and latency in the middle of connections, such as between MySQL requests and response packets.

Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all services built on them. Every system you run benefits from Kubernetes performance improvements.

Source: www.habr.com
