A few years ago, Kubernetes
In practice, applications experience random network latency of up to 100 ms or more, which leads to timeouts or retries. Services were expected to respond to requests much faster than 100 ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should take milliseconds, and MySQL did complete them in milliseconds, but from the requesting application's point of view the response took 100 ms or more.
It quickly became clear that the problem occurs only when connecting to a Kubernetes node, even if the call comes from outside Kubernetes. The simplest way to reproduce the problem is in a test
Eliminating unnecessary complexity in the chain leading to failure
By reproducing the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially there were too many components in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.
The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing datacenter network) that uses
Using tcpdump in the Vegeta test shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet, then reset the connection. We can filter the data to include only packets slower than 100 ms, which gives an easier way to reproduce the problem than a full layer 7 network test in Vegeta. Here are "pings" of a Kubernetes node using TCP SYN/SYN-ACK against the service "node port" (30927) at 10 ms intervals, filtered for the slowest responses:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
A first observation can be made right away. Judging by the sequence numbers and timings, these are not one-off congestion events. The delay often accumulates and is eventually processed.
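The accumulation is easier to see when the slow replies are grouped by consecutive sequence numbers. The following is a minimal Python sketch, not part of the original investigation: it parses hping3 reply lines, keeps those at or above 100 ms (the same cut-off as the egrep filter), and groups runs of adjacent sequence numbers. The function names and the regular expression are illustrative assumptions; the sample lines are copied from the output above.

```python
import re

# Matches "seq=N" or "icmp_seq=N" and the "rtt=X ms" value in an hping3 reply line.
LINE_RE = re.compile(r"(?:icmp_)?seq=(\d+).*?rtt=([\d.]+) ms")


def parse_replies(lines):
    """Return (seq, rtt_ms) tuples for every line that looks like an hping3 reply."""
    replies = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def group_bursts(replies, threshold_ms=100.0):
    """Keep slow replies and group runs of consecutive sequence numbers."""
    bursts = []
    for seq, rtt in replies:
        if rtt < threshold_ms:
            continue
        if bursts and seq == bursts[-1][-1][0] + 1:
            bursts[-1].append((seq, rtt))  # continues the current run
        else:
            bursts.append([(seq, rtt)])  # starts a new run
    return bursts


SAMPLE = """\
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
""".splitlines()

bursts = group_bursts(parse_replies(SAMPLE))
for burst in bursts:
    first, last = burst[0], burst[-1]
    print(f"seq {first[0]}-{last[0]}: {len(burst)} slow replies, "
          f"rtt {first[1]} -> {last[1]} ms")
```

A run of several adjacent sequence numbers with steadily falling RTT suggests that the packets were queued and then released together, rather than delayed one by one.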
Next, we want to find out which components may be involved in the congestion. Could it be some of the hundreds of iptables rules in NAT? Or is there a problem with IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if you remove the NAT and firewall logic, leaving only the IPIP part:
Fortunately, Linux makes it easy to access the IP overlay layer directly if the machine is on the same network:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms
Judging by the results, the problem is still there! This rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping behaves:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms
The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further:
Are all packets sent between these two hosts?
theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms
We have simplified the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).
Now the last question: why does the delay occur only on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is also quite easy to find out by sending a packet from a host outside Kubernetes, but with the same "bad" receiver. As you can see, the problem has not disappeared:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms
We then run the same requests from the previous source kube-node to the external host (which rules out the source host, since a ping includes both an RX and a TX component):
theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms
By examining packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the receiver (top) does not; see the Delta column (in seconds):
In addition, if you look at the difference in the ordering of TCP and ICMP packets (by sequence numbers) on the receiver side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, meanwhile, sometimes interleave, and some of them get stuck. In particular, if you examine the ports of the SYN packets, they are in order on the sender side but not on the receiver side.
There is a subtle difference in how
Another new observation: during this period we see ICMP delays on all communication between the two hosts, but TCP does not show that. This tells us that the cause is probably related to RX queue hashing: the congestion is almost certainly in the processing of RX packets, not in sending responses.
This eliminates packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.
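Why would hashing make ICMP and TCP behave differently? The sketch below is a rough illustration, assuming the NIC picks an RX queue from a hash of the flow tuple. CRC32 merely stands in for whatever hash the hardware really uses, and the queue count and source address are hypothetical. ICMP has no ports, so every ping between two hosts yields the same key; it is also assumed, as with hping3's default behaviour, that each SYN uses a new source port.

```python
import zlib

NUM_RX_QUEUES = 16  # hypothetical number of RX queues on the NIC


def rx_queue(src_ip, dst_ip, proto, src_port=0, dst_port=0,
             num_queues=NUM_RX_QUEUES):
    """Pick an RX queue from a hash of the flow tuple.

    CRC32 is only a stand-in for the NIC's real hash function.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_queues


SRC = "172.16.0.10"   # hypothetical client address
DST = "172.16.47.27"  # the "bad" node from the examples above

# ICMP has no ports: every ping between the two hosts hashes identically.
icmp_queues = {rx_queue(SRC, DST, "icmp") for _ in range(100)}

# TCP SYNs with varying source ports spread over several queues.
tcp_queues = {rx_queue(SRC, DST, "tcp", sport, 30927)
              for sport in range(40000, 40100)}

print("ICMP lands on queues:", sorted(icmp_queues))
print("TCP lands on queues: ", sorted(tcp_queues))
```

Under these assumptions, a single stalled queue delays every ICMP packet between the two hosts, but only the fraction of TCP packets that happen to hash to it, which matches the reordering seen on the receiver.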
Understanding packet processing in the Linux kernel
To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.
Going back to the simplest traditional implementation, the network card receives a packet and sends
This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the '90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server can be interrupted millions of times per second.
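The "15 million packets per second" figure can be checked with a short calculation. This sketch assumes minimum-size Ethernet frames (64 bytes plus 8 bytes of preamble and a 12-byte inter-frame gap), one interrupt per packet, and an even spread across eight cores; none of these details come from the original text.

```python
LINE_RATE_BPS = 10_000_000_000  # 10 Gbit/s link
MIN_FRAME_BYTES = 64            # smallest Ethernet frame
PREAMBLE_BYTES = 8              # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12       # mandatory idle time between frames
CORES = 8                       # the "small eight-core server"

# Bits each minimum-size frame occupies on the wire.
wire_bits = (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8

max_pps = LINE_RATE_BPS / wire_bits  # packets per second at line rate
per_core = max_pps / CORES           # interrupts per core, one per packet

print(f"{max_pps / 1e6:.2f} million packets/s in total")
print(f"{per_core / 1e6:.2f} million interrupts/s per core")
```

Under these assumptions the result is just under 15 million packets per second, or close to two million interrupts per second on each core.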
To avoid constantly handling interrupts, many years ago Linux added
This is much faster, but causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes have no time to actually drain these queues (reading from TCP connections, etc.). Eventually the queues fill up and we start dropping packets. In an attempt to find a balance, the kernel sets a budget for the maximum number of packets processed in softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps), which handles these softirqs outside the normal syscall/interrupt path. This thread is scheduled by the standard process scheduler, which tries to allocate resources fairly.
Having looked at how the kernel processes packets, you can see that there is a certain opportunity for congestion. If softirq calls are received less frequently, packets will have to wait for some time to be processed in the RX queue on the network card. This may be caused by some task blocking the processor core, or by something else preventing the core from running softirq.
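The effect of a blocked core can be sketched with a toy model. It is deliberately simplified: packets arrive every 10 ms (as in the hping3 runs), processing takes no time, the queue never overflows, and the budget is ignored. The function name and the blocked window are illustrative assumptions.

```python
def simulate_rx(arrivals_ms, blocked_from_ms, blocked_until_ms):
    """Return how long each packet waits in the RX queue.

    Packets arriving while the core is blocked wait until the block ends;
    all other packets are processed immediately.
    """
    waits = []
    for arrival in arrivals_ms:
        if blocked_from_ms <= arrival < blocked_until_ms:
            waits.append(blocked_until_ms - arrival)
        else:
            waits.append(0)
    return waits


arrivals = list(range(0, 200, 10))  # one packet every 10 ms
waits = simulate_rx(arrivals, blocked_from_ms=50, blocked_until_ms=150)

for arrival, wait in zip(arrivals, waits):
    if wait:
        print(f"packet sent at {arrival:3d} ms waited {wait:3d} ms")
```

The waits fall in 10 ms steps (100, 90, 80, ...), the same staircase that the slow RTT values show: the first packet to hit the blocked core waits longest, and the whole backlog is released at once.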
Narrowing down processing to a core or method
Softirq delays are just a guess for now. But it makes sense, and we know we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, find the cause of the delays.
Let's return to our slow packets:
len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms
As discussed earlier, these ICMP packets are hashed into a single RX NIC queue and processed by a single CPU core. If we want to understand how Linux works, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed in order to track the process.
Now it's time to use tools that let you monitor the Linux kernel in real time. Here we used
The plan here is simple: we know that the kernel processes these ICMP pings, so we will put a hook on the kernel function icmp_echo and identify the packets by the icmp_seq numbers that hping3 shows above.
The icmp_echo code is passed a struct sk_buff *skb: this is the packet containing the "echo request". We can trace it, pull out the sequence echo.sequence (which corresponds to icmp_seq in the hping3 output above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:
TGID    PID     PROCESS NAME     ICMP_SEQ
0       0       swapper/11       770
0       0       swapper/11       771
0       0       swapper/11       772
0       0       swapper/11       773
0       0       swapper/11       774
20041   20086   …                775
0       0       swapper/11       776
0       0       swapper/11       777
0       0       swapper/11       778
4512    4542    spokes-report-s  779
It should be noted here that in softirq context, processes that made system calls will show up as "processes" when in fact it is the kernel that safely processes the packets in kernel context.
With this tool we can correlate specific processes with specific packets that show a delay in hping3. Let's make it simple and grep this capture for particular icmp_seq values. Packets matching the icmp_seq values above were marked along with the RTT we observed above (in parentheses are the expected RTT values for packets we filtered out because their RTT was below 50 ms):
TGID    PID     PROCESS NAME     ICMP_SEQ ** RTT
--
10137   10436   cadvisor         1951
10137   10436   cadvisor         1952
76      76      ksoftirqd/11     1953 ** 99ms
76      76      ksoftirqd/11     1954 ** 89ms
76      76      ksoftirqd/11     1955 ** 79ms
76      76      ksoftirqd/11     1956 ** 69ms
76      76      ksoftirqd/11     1957 ** 59ms
76      76      ksoftirqd/11     1958 ** (49ms)
76      76      ksoftirqd/11     1959 ** (39ms)
76      76      ksoftirqd/11     1960 ** (29ms)
76      76      ksoftirqd/11     1961 ** (19ms)
76      76      ksoftirqd/11     1962 ** (9ms)
--
10137   10436   cadvisor         2068
10137   10436   cadvisor         2069
76      76      ksoftirqd/11     2070 ** 75ms
76      76      ksoftirqd/11     2071 ** 65ms
76      76      ksoftirqd/11     2072 ** 55ms
76      76      ksoftirqd/11     2073 ** (45ms)
76      76      ksoftirqd/11     2074 ** (35ms)
76      76      ksoftirqd/11     2075 ** (25ms)
76      76      ksoftirqd/11     2076 ** (15ms)
The results tell us several things. First, all these packets are processed by the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that accumulated after cadvisor.
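The "process X, then a run of ksoftirqd" pattern can also be extracted mechanically. The sketch below is an illustrative assumption rather than the tooling used in the investigation: it takes (TGID, PID, name, icmp_seq) rows, here rebuilt from the trace values quoted above, and reports which process appeared immediately before each run of ksoftirqd rows and how many packets that run drained.

```python
from itertools import groupby

# Rows as (tgid, pid, process_name, icmp_seq), rebuilt from the trace above.
ROWS = (
    [(10137, 10436, "cadvisor", seq) for seq in (1951, 1952)]
    + [(76, 76, "ksoftirqd/11", seq) for seq in range(1953, 1963)]
    + [(10137, 10436, "cadvisor", seq) for seq in (2068, 2069)]
    + [(76, 76, "ksoftirqd/11", seq) for seq in range(2070, 2077)]
)


def backlog_runs(rows):
    """Return (previous_process, backlog_length) for each run of ksoftirqd rows."""
    runs = []
    previous = None
    for name, group in groupby(rows, key=lambda row: row[2]):
        count = len(list(group))
        if name.startswith("ksoftirqd") and previous is not None:
            runs.append((previous, count))
        previous = name
    return runs


for process, backlog in backlog_runs(ROWS):
    print(f"{backlog} packets drained by ksoftirqd right after {process}")
```

If the same process name keeps appearing in front of every backlog, as it does here, that process becomes the prime suspect.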
The fact that cadvisor always runs immediately beforehand implies that it is involved in the problem. Ironically, the purpose
As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance problems under some unforeseen circumstances.
What does cadvisor do that stalls the packet queue?
We now have a pretty good understanding of how the stall happens, which process causes it, and on which CPU. We see that because of hard blocking, the Linux kernel does not have time to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor starts a slow syscall, after which all packets accumulated during that time are processed:
This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets exceeds the budget and ksoftirqd is called, and then look a bit further back to see what exactly was running on the CPU core just before that point. It is like x-raying the CPU every few milliseconds. It will look something like this:
Conveniently, all of this can be done with existing tools. For example, perf record can sample the given CPU core, and the recording can then be reduced to the stack traces that precede ksoftirqd:
# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100
Here are the results:
(hundreds of traces that look similar)
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run
There is a lot here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP trace. What does it mean?
Each line is a CPU trace at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called: read(): .... ;do_syscall_64;sys_read; ... . So cadvisor spends a lot of time in the read() system call related to the mem_cgroup_* functions (top of the call stack/end of line).
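Reading such traces by eye gets tedious with hundreds of lines. As a small illustrative sketch (the function name is an assumption, and the sample stacks are abbreviated from the output above by dropping the repeated [cadvisor] frames), the innermost frame of each semicolon-separated stack can be counted per process:

```python
from collections import Counter

PREFIX = ("cadvisor;entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
          "vfs_read;seq_read;memcg_stat_show;")

# Abbreviated versions of the five cadvisor traces shown above.
STACKS = [
    PREFIX + "mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    PREFIX + "mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    PREFIX + "mem_cgroup_iter",
    PREFIX + "mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    PREFIX + "mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
]


def leaf_frames(folded_lines, process):
    """Count the innermost (last) frame of every stack that belongs to `process`."""
    counts = Counter()
    for line in folded_lines:
        frames = line.strip().split(";")
        if frames[0] == process:
            counts[frames[-1]] += 1
    return counts


for frame, count in leaf_frames(STACKS, "cadvisor").most_common():
    print(f"{count:3d}  {frame}")
```

With one sample per millisecond, the count for a frame is roughly the number of milliseconds the core spent there.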
It is inconvenient to see in a call trace what exactly is being read, so let's run strace and see what cadvisor does, and find system calls longer than 100 ms:
theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>
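The egrep filter keeps every call that took 0.1 s or longer, whatever the syscall. A hedged Python sketch of the same idea, grouping the durations by syscall name, might look like this. The regular expression and function name are assumptions, and it only handles the "<... name resumed>" lines shown above:

```python
import re
from collections import defaultdict

# "<... read resumed> ... <0.179438>" -> syscall name and elapsed seconds.
RESUMED_RE = re.compile(r"<\.\.\. (\w+) resumed>.*<(\d+\.\d+)>\s*$")

SAMPLE = r"""
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
""".splitlines()


def slow_calls(lines, threshold_s=0.1):
    """Group durations of slow syscalls (>= threshold_s) by syscall name."""
    durations = defaultdict(list)
    for line in lines:
        match = RESUMED_RE.search(line)
        if match and float(match.group(2)) >= threshold_s:
            durations[match.group(1)].append(float(match.group(2)))
    return dict(durations)


for name, values in slow_calls(SAMPLE).items():
    print(f"{name:10s} {len(values)} slow call(s), max {max(values):.3f} s")
```

Calls such as futex or pselect6 can legitimately block while a thread waits for work, so the entries worth a closer look are the ones that should normally return quickly.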
As you might expect, we see slow read() calls here. From the contents of the reads and the mem_cgroup context, it is clear that these read() calls refer to the memory.stat file, which shows memory usage and cgroup limits (Docker's resource isolation technology). The cadvisor tool queries this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor doing something unexpected:
theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null
real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $
Now we can reproduce the bug and understand that the Linux kernel is facing a pathology.
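The same measurement can be scripted, for example to check many nodes. This is only a sketch under assumptions: the helper names and the 100 ms threshold are illustrative, and the path is the cgroup v1 location used in the shell session above, which will not exist on every system (the script simply skips it in that case).

```python
import os
import time

MEMORY_STAT = "/sys/fs/cgroup/memory/memory.stat"  # path from the session above


def time_read(path):
    """Return (elapsed_seconds, bytes_read) for one full read of `path`."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    return time.perf_counter() - start, len(data)


def is_slow(seconds, threshold_s=0.1):
    """Flag reads that take at least `threshold_s` seconds."""
    return seconds >= threshold_s


if os.path.exists(MEMORY_STAT):
    elapsed, size = time_read(MEMORY_STAT)
    verdict = "SLOW" if is_slow(elapsed) else "ok"
    print(f"read {size} bytes in {elapsed * 1000:.1f} ms: {verdict}")
else:
    print(f"{MEMORY_STAT} not present on this system")
```

On a healthy node the read should take well under a millisecond, so the 0.153 s measured above is far outside the normal range.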
Why is the read operation so slow?
At this stage, it is much easier to find reports from other users about similar problems. As it turned out, in the cadvisor tracker this bug was reported as
The problem is that cgroups account for memory usage inside the namespace (container). When all processes in this cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer used, it appears that the kernel still assigns cached contents, such as dentries and inodes (directory and file metadata), which are cached in the memory cgroup. From the problem description:
zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).
Having the kernel check all the pages in the cache when freeing a cgroup could be very slow, so the lazy approach is chosen: wait until these pages are requested again, and then finally clear the cgroup when the memory is actually needed. Until that point, the cgroup is still taken into account when collecting statistics.
From a performance standpoint, they sacrificed memory for performance: speeding up the initial cleanup by leaving some cached memory behind. This is fine. When the kernel uses up the last of the cached memory, the cgroup is eventually cleared, so it cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup mechanism in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to restore the latest cached data and clear the zombie cgroups.
It turned out that some of our nodes had so many zombie cgroups that the read and the latency exceeded a second.
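One way to estimate how many zombie cgroups a node carries, which is not described in the original text and is offered here as an assumption, is to compare the kernel's own count of memory cgroups with the number of cgroup directories that are still visible. The sketch below assumes cgroup v1 and the usual four-column layout of /proc/cgroups; the sample text and the numbers in it are invented purely for illustration.

```python
def parse_num_cgroups(proc_cgroups_text, subsystem="memory"):
    """Return the num_cgroups column for `subsystem` from /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip the header and blank lines
        name, _hierarchy, num_cgroups, _enabled = line.split()
        if name == subsystem:
            return int(num_cgroups)
    return None


def zombie_estimate(kernel_count, visible_count):
    """Cgroups the kernel still tracks that no longer have a directory."""
    return max(kernel_count - visible_count, 0)


# Invented sample in the /proc/cgroups layout (not real measurements).
SAMPLE = (
    "#subsys_name\thierarchy\tnum_cgroups\tenabled\n"
    "cpu\t3\t120\t1\n"
    "memory\t5\t4200\t1\n"
)

kernel_count = parse_num_cgroups(SAMPLE)
visible_count = 150  # hypothetical number of directories under the memory hierarchy
print("estimated zombie cgroups:", zombie_estimate(kernel_count, visible_count))
```

A large gap between the two counts would be consistent with the zombie cgroup explanation, although it does not by itself prove that memory.stat reads are slow.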
The workaround for the cadvisor problem is to immediately free the dentry/inode caches system-wide, which immediately eliminates the read latency as well as the network latency on the host, since clearing the cache touches the cached zombie cgroup pages and frees them too. This is not a solution, but it confirms the cause of the problem.
It turned out that in newer kernel versions (4.19+) the performance of the memory.stat call was improved, so switching to this kernel fixed the problem. At the same time, we had tooling to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found nodes with sufficiently high latency, and rebooted them. This gave us time to update the OS on the remaining servers.
Summing up
Because this bug stopped RX NIC queue processing for hundreds of milliseconds, it simultaneously caused both high latency on short connections and mid-connection latency, such as between MySQL requests and response packets.
Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all the services built on them. Every system you run benefits from Kubernetes performance improvements.
Source: www.habr.com