A few years ago, Kubernetes
In essence, applications experienced seemingly random network latency of 100 ms or more, which led to timeouts or retries. Services were expected to respond to requests much faster than 100 ms, but that is impossible when the connection itself takes that long. Separately, we observed very fast MySQL queries that should have taken milliseconds, and MySQL did complete them in milliseconds, but from the requesting application's point of view the response took 100 ms or more.
It quickly became clear that the problem only occurred when connecting to a Kubernetes node, even if the call came from outside Kubernetes. The easiest way to reproduce the problem is a test
Eliminating unnecessary complexity in the chain leading to failure
By reproducing the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.
The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses
In the Vegeta test, the tcpdump utility shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet, and then reset the connection. We can filter the data to include only packets slower than 100 ms and get a simpler way to reproduce the problem than the full network layer 7 test in Vegeta. Here are the Kubernetes node "pings" using TCP SYN/SYN-ACK on the service "node port" (30927) at 10 ms intervals, filtered to the slowest responses:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
We can immediately make a first observation. Judging by the sequence numbers and timings, these are clearly not one-off stalls. The delay often accumulates and is eventually worked off.
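To make that observation easier to check, here is a minimal Python sketch that groups hping3 reply lines into bursts of consecutive sequence numbers. The helper names (`parse_replies`, `group_bursts`) are hypothetical; the sample lines are copied from the output above. Because the probes are sent 10 ms apart, an RTT that drops by roughly 10 ms per packet inside a burst is what you would expect if the replies were all released at about the same moment.

```python
import re

# Matches the sequence number and round-trip time in an hping3 reply line.
# The same pattern covers "seq=1485" (TCP) and "icmp_seq=104" (ICMP).
LINE_RE = re.compile(r"seq=(\d+).*?rtt=([\d.]+) ms")


def parse_replies(lines):
    """Return (sequence, rtt_ms) pairs for every line that looks like a reply."""
    replies = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def group_bursts(replies):
    """Group replies with consecutive sequence numbers into bursts."""
    bursts = []
    for seq, rtt in replies:
        if bursts and seq == bursts[-1][-1][0] + 1:
            bursts[-1].append((seq, rtt))
        else:
            bursts.append([(seq, rtt)])
    return bursts


# Sample lines copied from the hping3 output above.
SAMPLE = """\
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
"""

bursts = group_bursts(parse_replies(SAMPLE.splitlines()))
for burst in bursts:
    seqs = [seq for seq, _ in burst]
    rtts = [rtt for _, rtt in burst]
    print(f"seq {seqs[0]}-{seqs[-1]}: {len(burst)} packet(s), rtt {rtts}")
```

Only replies that passed the 100 ms filter are visible here, so a burst may be longer in reality than what this grouping reports.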
Next, we want to find out which components may be involved in the stalls. Maybe it is some of the hundreds of iptables rules in NAT? Or is there a problem with IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if you remove the NAT and firewall logic, leaving only the IPIP part:
Fortunately, Linux makes it easy to access the IP overlay layer directly if the machine is on the same network:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms
Judging by the results, the problem is still there! This rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping behaves:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms
The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further:
Could it be every packet sent between these two hosts?
theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms
We have reduced the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).
Now the last question: why does the delay only occur on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is also easy to figure out by sending a packet from a host outside Kubernetes, but to the same "known bad" recipient. As you can see, the problem has not disappeared:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms
We then run the same requests from the previous source kube-node to the external host (which rules out the source host, since a ping includes both an RX and a TX component):
theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms
Examining the packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the recipient (top) does not; see the Delta column (in seconds):
In addition, if you look at the difference in the order of TCP and ICMP packets (by sequence numbers) on the recipient side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, by contrast, sometimes interleave, and some of them get stuck. In particular, if you examine the ports of the SYN packets, they are in order on the sender's side, but not on the receiver's side.
There is a subtle difference in how
Another new observation: during this period we see ICMP delays on all communication between the two hosts, but not on TCP. This tells us that the cause is likely related to RX queue hashing: the stall is almost certainly in the processing of RX packets, not in the sending of responses.
This removes packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.
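The hashing argument can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not how any particular NIC works: real hardware uses its own hash function, the queue count of 8 is made up, and `rx_queue` is a hypothetical helper. The only point it makes is that a protocol without ports (ICMP) gives every packet between two hosts the same hash key, while TCP connections with different source ports can spread across queues.

```python
import hashlib

NUM_RX_QUEUES = 8  # hypothetical queue count, for illustration only


def rx_queue(src_ip, dst_ip, protocol, src_port=None, dst_port=None):
    """Pick an RX queue from a toy hash of the flow fields.

    SHA-256 is only a stand-in for the NIC's real hash function.
    Ports are part of the key only when the protocol has them.
    """
    if src_port is None or dst_port is None:
        key = f"{src_ip}|{dst_ip}|{protocol}"
    else:
        key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_RX_QUEUES


# 100 ICMP packets between the same two hosts: always the same key.
icmp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", "icmp") for _ in range(100)
}

# 100 TCP SYNs from different source ports: the key changes per packet.
tcp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", "tcp", port, 9876)
    for port in range(40000, 40100)
}

print("queues used by ICMP:", sorted(icmp_queues))
print("queues used by TCP: ", sorted(tcp_queues))
```

Under this model, a single stalled queue would delay every ICMP packet between the pair but only some of the TCP packets, which matches the reordering seen on the receiver.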
Understanding packet processing in the Linux kernel
To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.
Going back to the simplest traditional implementation, the network card receives a packet and sends
This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the '90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server can be interrupted millions of times per second.
To avoid handling interrupts constantly, many years ago Linux added
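As a rough sanity check of those figures, the arithmetic can be written out. This sketch assumes minimum-size 64-byte Ethernet frames, 20 bytes of per-frame wire overhead (preamble plus inter-frame gap), one interrupt per packet, and an even spread over eight cores; none of these assumptions come from the article itself.

```python
LINK_BITS_PER_SECOND = 10_000_000_000  # 10 Gbit/s link
MIN_FRAME_BYTES = 64                   # smallest Ethernet frame
WIRE_OVERHEAD_BYTES = 20               # preamble (8 bytes) + inter-frame gap (12 bytes)
CORES = 8                              # the "small eight-core server" from the text

# Every packet occupies the frame plus the fixed overhead on the wire.
bits_per_packet = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
packets_per_second = LINK_BITS_PER_SECOND / bits_per_packet

# Assumption: one interrupt per packet, spread evenly across all cores.
interrupts_per_core = packets_per_second / CORES

print(f"packets per second:  {packets_per_second:,.0f}")
print(f"interrupts per core: {interrupts_per_core:,.0f}")
```

The result lands just under 15 million packets per second in total and a little under 2 million per core, consistent with "millions of times per second".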
This is much faster, but it causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes do not get time to actually drain these queues (reading from TCP connections, and so on). Eventually the queues fill up and we start dropping packets. To find a balance, the kernel sets a budget for the maximum number of packets processed in the softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps); it handles these softirqs outside the normal syscall/interrupt path. This thread is scheduled by the standard process scheduler, which tries to allocate resources fairly.
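One way to see whether the budget is being hit on a machine is the kernel's per-CPU counters in /proc/net/softnet_stat. This file is not mentioned in the article; the sketch assumes the usual layout, where each row holds hexadecimal counters and the first three are packets processed, packets dropped, and "time squeeze" (polling stopped with work still pending). The sample text is made up, and `parse_softnet_stat` is a hypothetical helper.

```python
def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat-style text into one dict per row.

    Only the first three hexadecimal columns are interpreted; rows are
    numbered in file order, which normally follows the CPU order.
    """
    rows = []
    for line in text.splitlines():
        fields = [int(value, 16) for value in line.split()]
        if len(fields) < 3:
            continue
        rows.append(
            {
                "row": len(rows),
                "processed": fields[0],
                "dropped": fields[1],
                "time_squeeze": fields[2],
            }
        )
    return rows


# Made-up sample in the same shape as the real file (values are hex).
SOFTNET_SAMPLE = """\
0004a3f1 00000000 00000002 00000000 00000000
0001b2c0 00000000 00000015 00000000 00000000
"""

softnet_rows = parse_softnet_stat(SOFTNET_SAMPLE)
for row in softnet_rows:
    print(row)
```

On a real host you would pass the contents of /proc/net/softnet_stat instead of the sample. A `time_squeeze` value that keeps growing on one row suggests that core regularly runs out of budget.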
Having studied how the kernel processes packets, you can see that there is room for stalls. If softirq calls run less frequently, packets have to wait for some time in the RX queue on the network card before they are processed. This may be because some task is blocking the processor core, or because something else is preventing the core from running softirq.
Narrowing down the processing to a core or method
Softirq delays are just a guess for now. But the guess makes sense, and we know we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, to find the reason for the delays.
Let's return to our slow packets:
len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms
As discussed earlier, these ICMP packets are hashed to a single RX NIC queue and processed by a single CPU core. If we want to understand how Linux behaves here, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed, so that we can track the process.
Now it is time to use tools that let you monitor the Linux kernel in real time. Here we used
The plan here is simple: we know that the kernel processes these ICMP pings, so we put a hook on the kernel function icmp_echo. That function is passed a struct sk_buff *skb: this is the packet containing the "echo request". We can trace it, pull out the sequence number echo.sequence (which corresponds to the icmp_seq shown by hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:
TGID    PID     PROCESS NAME     ICMP_SEQ
0       0       swapper/11       770
0       0       swapper/11       771
0       0       swapper/11       772
0       0       swapper/11       773
0       0       swapper/11       774
20041   20086   prometheus       775
0       0       swapper/11       776
...     ...     spokes-report-s  ...
Note that in the softirq context, the processes that happened to be making system calls show up as the "process", when in fact it is the kernel that processes the packets in kernel context.
With this tool we can link specific processes to the specific packets that show a delay in hping3. Let's run a simple grep on this capture for certain icmp_seq values. Packets matching the icmp_seq values above are marked together with the RTT we observed above (in parentheses are the expected RTT values for packets that we filtered out because their RTT was below 50 ms):
TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)
76      76      ksoftirqd/11    2076 ** (15ms)
76      76      ksoftirqd/11    2077 ** (5ms)
The results tell us several things. First, all these packets are processed in the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, some packets are processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that piled up after cadvisor.
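The same reading of the trace can be done mechanically. The following is a minimal sketch, with hypothetical helper names, that parses rows in the TGID / PID / PROCESS NAME / ICMP_SEQ layout shown above and reports, for every run of ksoftirqd rows, which process handled the packet just before it and how many packets the run contained. The sample rows are taken from the trace above.

```python
from itertools import groupby


def parse_trace(text):
    """Return (process_name, icmp_seq) for each data row of the trace."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the "--" separators, and anything malformed.
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        rows.append((parts[2], int(parts[3])))
    return rows


def ksoftirqd_batches(rows):
    """Return (previous_process, first_seq, packet_count) per ksoftirqd run."""
    batches = []
    previous = None
    for name, group in groupby(rows, key=lambda row: row[0]):
        seqs = [seq for _, seq in group]
        if name.startswith("ksoftirqd"):
            batches.append((previous, seqs[0], len(seqs)))
        previous = name
    return batches


TRACE_SAMPLE = """\
TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)
76      76      ksoftirqd/11    2076 ** (15ms)
76      76      ksoftirqd/11    2077 ** (5ms)
"""

batches = ksoftirqd_batches(parse_trace(TRACE_SAMPLE))
for previous, first_seq, count in batches:
    print(f"{count} packets from seq {first_seq}, preceded by {previous}")
```

With probes sent every 10 ms, a batch of about ten packets corresponds to a backlog of roughly 100 ms, which lines up with the largest RTT in each batch.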
The fact that cadvisor always runs immediately beforehand implies that it is involved in the problem. Ironically, the purpose
As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance problems under some unforeseen circumstances.
What does cadvisor do that slows down the packet queue?
We now have a fairly good understanding of how the stall occurs, which process is causing it, and on which CPU. We see that, because of hard blocking, the Linux kernel does not get time to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor triggers a slow syscall, after which all the packets accumulated during that time are processed:
This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets goes over budget and ksoftirqd is called, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like x-raying the CPU every few milliseconds. It will look something like this:
Conveniently, all of this can be done with existing tools. For example, the commands below sample the core and keep the traces that lead up to ksoftirqd:
# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100
Here are the results:
(hundreds of traces that look similar)
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run
There is a lot here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?
Each line is a CPU trace at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called, read(): .... ;do_syscall_64;sys_read; ... . So cadvisor spends a lot of time in the read() system call, in the mem_cgroup_* functions (top of the call stack / end of the line).
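Reading hundreds of such lines by eye is tedious, so a small summary helps. This is a minimal sketch with a hypothetical helper, `summarize_stacks`, that counts which process and which innermost function (the last frame on the line) appear in semicolon-collapsed stacks. The sample lines are abbreviated versions of the cadvisor traces above, with the repeated `[cadvisor]` frames shortened to one.

```python
from collections import Counter


def summarize_stacks(lines):
    """Count (process, innermost_function) pairs in collapsed stack lines."""
    counts = Counter()
    for line in lines:
        frames = [frame for frame in line.strip().split(";") if frame]
        if len(frames) < 2:
            continue
        # The first frame is the process name, the last is the innermost call.
        counts[(frames[0], frames[-1])] += 1
    return counts


def in_syscall(line, syscall_frame="sys_read"):
    """Return True if the given syscall frame appears anywhere in the stack."""
    return syscall_frame in line.strip().split(";")


# Abbreviated from the traces above (repeated [cadvisor] frames shortened).
STACK_SAMPLE = [
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;"
    "mem_cgroup_node_nr_lru_pages",
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;"
    "mem_cgroup_node_nr_lru_pages",
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter",
]

stack_counts = summarize_stacks(STACK_SAMPLE)
for (process, leaf), count in stack_counts.most_common():
    print(f"{count:3d}  {process}  {leaf}")
```

Run against the full output, a summary like this makes it easy to see whether the samples taken just before ksoftirqd are dominated by one process and one kernel code path.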
A call trace does not make it easy to see exactly what is being read, so let's run strace, look at what cadvisor is doing, and find system calls that take longer than 100 ms:
theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880nrss 507904nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0nrss 0nrss_huge 0nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0nrss 0nrss_huge 0nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880nrss 507904nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0nrss 0nrss_huge 0nmapped_"..., 4096) = 576 <0.154442>
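Note that the egrep pattern '<0.[1-9]' only keeps calls that took between 0.1 and 1 second; anything of one second or longer would not match. The sketch below is one possible alternative in Python that parses the duration printed by strace -T and applies a numeric threshold. It only handles the "resumed" line shape shown above, `slow_calls` is a hypothetical helper, and the last two sample lines are invented for illustration.

```python
import re

# Matches resumed-call lines such as:
#   [pid 10436] <... read resumed> "..."..., 4096) = 658 <0.179438>
RESUMED_RE = re.compile(
    r"^\[pid\s+(\d+)\] <\.\.\. (\w+) resumed>.*<(\d+\.\d+)>\s*$"
)


def slow_calls(lines, threshold_s=0.1):
    """Return (pid, syscall, seconds) for calls at or above the threshold."""
    slow = []
    for line in lines:
        match = RESUMED_RE.match(line)
        if not match:
            continue
        seconds = float(match.group(3))
        if seconds >= threshold_s:
            slow.append((int(match.group(1)), match.group(2), seconds))
    return slow


STRACE_SAMPLE = [
    # Copied from the strace output above.
    '[pid 10436] <... futex resumed> ) = 0 <0.156784>',
    '[pid 10436] <... read resumed> "cache 154234880nrss 507904nrss_h"..., 4096) = 658 <0.179438>',
    '[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>',
    # Hypothetical lines, not from the capture: a call over 1 s and a fast one.
    '[pid 10436] <... read resumed> "cache 0nrss 0nrss_huge 0nmapped_"..., 4096) = 577 <1.204113>',
    '[pid 10436] <... futex resumed> ) = 0 <0.000021>',
]

slow = slow_calls(STRACE_SAMPLE)
slow_reads = [call for call in slow if call[1] == "read"]
for pid, syscall, seconds in slow:
    print(f"pid {pid}: {syscall} took {seconds:.3f}s")
```

Filtering on the syscall name afterwards, as `slow_reads` does, separates the blocking waits (futex, pselect6) from the reads that are actually doing slow work.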
As you might expect, the strace output shows slow read() calls. From the contents of the read operations and the mem_cgroup context, it is clear that these read() calls refer to the memory.stat file, which shows memory usage and cgroup limits (cgroups are Docker's resource isolation technology). The cadvisor tool queries this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:
theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null
real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $
Now we can reproduce the bug and understand that the Linux kernel is running into a pathology.
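The same measurement can be repeated from a script, for example to compare nodes. This is a minimal sketch: `time_read` is a hypothetical helper, and it measures wall-clock time for one full read rather than the separate user and sys times that the shell's time command prints.

```python
import time


def time_read(path):
    """Return (elapsed_seconds, bytes_read) for one full read of a file."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    elapsed = time.perf_counter() - start
    return elapsed, len(data)


def report(path):
    """Print how long a single read of the file took."""
    elapsed, size = time_read(path)
    print(f"{path}: {size} bytes in {elapsed * 1000:.1f} ms")
```

On a node like the one above, calling `report("/sys/fs/cgroup/memory/memory.stat")` would be the equivalent of the `time cat` command. The path is the one used in the shell example and may differ on systems with a different cgroup layout.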
Why is the read operation so slow?
At this stage, it is much easier to find reports from other users about similar problems. As it turned out, in the cadvisor tracker this bug had been reported as
The problem is that cgroups account for memory usage inside the namespace (container). When all processes in the cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer in use, it appears that the kernel still attributes cached content, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the problem description:
zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).
Having the kernel check every page in the cache when freeing a cgroup could be very slow, so a lazy approach was chosen: wait until these pages are requested again, and only then finally clean up the cgroup, when the memory is actually needed. Until that point, the cgroup is still taken into account when collecting statistics.
From a performance standpoint, memory was traded for speed: the initial cleanup is faster because some cached memory is left behind. That is fine. When the kernel reclaims the last of the cached memory, the cgroup is eventually cleaned up, so this cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer for the latest cached data to be reclaimed and the zombie cgroups to be cleared.
It turned out that some of our nodes had so many zombie cgroups that the read, and the resulting latency, exceeded one second.
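A rough way to estimate how many zombie cgroups a node carries is to compare the kernel's own count of memory cgroups with the number of cgroup directories that are still visible. This is not from the article: the sketch assumes that the `num_cgroups` column of /proc/cgroups includes deleted-but-not-yet-freed memory cgroups and that the memory hierarchy is mounted at a path such as /sys/fs/cgroup/memory. Both helpers are hypothetical and the sample numbers are invented.

```python
import os


def num_cgroups(proc_cgroups_text, subsystem="memory"):
    """Return the num_cgroups column for a subsystem in /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue  # header: #subsys_name hierarchy num_cgroups enabled
        fields = line.split()
        if len(fields) >= 3 and fields[0] == subsystem:
            return int(fields[2])
    return None


def count_cgroup_dirs(root):
    """Count directories under root, including root itself."""
    return sum(1 for _ in os.walk(root))


def estimate_zombies(proc_cgroups_text, visible_dirs, subsystem="memory"):
    """Estimate zombies as kernel-counted cgroups minus visible directories."""
    total = num_cgroups(proc_cgroups_text, subsystem)
    if total is None:
        return None
    return max(total - visible_dirs, 0)


# Invented sample in the shape of /proc/cgroups.
PROC_CGROUPS_SAMPLE = """\
#subsys_name	hierarchy	num_cgroups	enabled
cpuset	6	12	1
memory	9	4217	1
"""

print(estimate_zombies(PROC_CGROUPS_SAMPLE, visible_dirs=85))
```

On a real node, the text would come from /proc/cgroups and `visible_dirs` from `count_cgroup_dirs` on the memory hierarchy's mount point. Treat the result as an indicator only, since the two counts are taken at slightly different moments.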
The workaround from the cadvisor issue is to immediately free the dentries/inodes caches across the whole system, which immediately eliminates the read latency as well as the network latency on the host, since clearing the cache also touches the cached pages of the zombie cgroups, and they are freed as well. This is not a solution, but it confirms the cause of the problem.
It turned out that in newer kernel versions (4.19+) the performance of the memory.stat call was improved, so switching to such a kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found the nodes with high enough latency, and rebooted them. This gave us time to update the OS on the remaining servers.
Summing up
Because this bug stopped RX NIC queue processing for hundreds of milliseconds, it caused both high latency on short connections and latency in the middle of connections, such as between MySQL requests and their response packets.
Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all the services built on them. Every system you run benefits from Kubernetes performance improvements.
Source: www.habr.com