A few years ago, Kubernetes
Essentially, applications were seeing network latency of 100 ms or more, causing timeouts or retries. Services were expected to answer requests much faster than 100 ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should have taken milliseconds, and MySQL did complete them in milliseconds, but from the point of view of the requesting application the response took 100 ms or more.
It quickly became clear that the problem occurred only when connecting to a Kubernetes node, even if the call came from outside Kubernetes. The simplest way to reproduce the problem is with a Vegeta test.
Removing unnecessary complexity from the chain leading to the failure
By building a reproducible example, we wanted to narrow the focus of the problem and strip away unnecessary layers of complexity. Initially, there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.
The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP.
Running tcpdump during the Vegeta test shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether the response packet is delayed, and then reset the connection. We can filter the data to include only packets slower than 100 ms and get an easier way to reproduce the problem than the full layer 7 Vegeta test. Here are Kubernetes node "pings" using TCP SYN/SYN-ACK to the service's "node port" (30927) at 10 ms intervals, filtered to the slowest responses:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
A first observation can be made immediately. Judging by the sequence numbers and timings, these are clearly not one-off stalls. The delay often accumulates and is eventually processed.
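The grouping described above can be checked mechanically. The following is a minimal sketch, not part of the original investigation, that parses hping3 reply lines and groups consecutive sequence numbers into bursts. The function names are our own, and the sample lines are copied from the output above.

```python
import re

# Matches the seq= (TCP) or icmp_seq= (ICMP) field and the rtt= field of a reply.
LINE_RE = re.compile(r"(?:icmp_)?seq=(\d+).*?rtt=([\d.]+) ms")


def parse_hping(lines):
    """Return (sequence number, rtt in ms) for every hping3 reply line."""
    replies = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))
    return replies


def group_bursts(replies):
    """Group replies whose sequence numbers are consecutive into bursts."""
    bursts = []
    for seq, rtt in replies:
        if bursts and seq == bursts[-1][-1][0] + 1:
            bursts[-1].append((seq, rtt))
        else:
            bursts.append([(seq, rtt)])
    return bursts


sample = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms",
]

for burst in group_bursts(parse_hping(sample)):
    print([seq for seq, _ in burst], [rtt for _, rtt in burst])
```

A burst of consecutive sequence numbers with steadily falling RTTs suggests packets that queued up behind one stall and were then released together.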
Next, we want to find out which components may be involved in the stall. Maybe it is some of the hundreds of iptables rules in NAT? Or is there a problem with the IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if you remove the NAT and firewall logic, leaving only the IPIP part?
Fortunately, Linux makes it easy to address the IP overlay layer directly if the machine is on the same network:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms
Judging by the results, the problem is still there! That rules out iptables and NAT. So is TCP the problem? Let's see how a regular ICMP ping behaves:
theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms
The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further.
Is every packet sent between these two hosts affected?
theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms
We have simplified the situation down to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).
Now the last question: why does the delay occur only on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is easy to work out by sending a packet from a host outside Kubernetes to the same "known bad" receiver. As you can see, the problem has not disappeared:
theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms
We then run the same requests from the previous source kube-node to the external host (which rules out the source host, since a ping includes both an RX and a TX component):
theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms
By examining packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the receiver (top) does not; see the Delta column (in seconds).
In addition, if you look at the difference in ordering of TCP and ICMP packets (by sequence numbers) on the receiver side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, on the other hand, sometimes interleave, and some of them stall. In particular, if you examine the ports of the SYN packets, they are in order on the sender side, but not on the receiver side.
There is a subtle difference in how network cards handle TCP and ICMP packets on receipt: packets are hashed into receive queues, and for TCP the hash includes the ports as well as the addresses, whereas ICMP has no ports, so only the source and destination addresses are hashed.
Another new observation: during this period we see ICMP delays on all communication between the two hosts, but not on TCP. This tells us that the cause is likely related to RX queue hashing: the stall is almost certainly in the processing of RX packets, not in the sending of responses.
This removes packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.
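To make the hashing argument concrete, here is a toy model of receive-queue selection. It is only a sketch: CRC32 stands in for the NIC's real hash function, the queue count of 16 is a hypothetical value, and the two addresses are simply the hosts from the transcripts above.

```python
import zlib

NUM_RX_QUEUES = 16  # hypothetical queue count; real NICs and drivers vary


def rx_queue(src_ip, dst_ip, src_port=None, dst_port=None):
    """Pick an RX queue from a flow key.

    TCP flows are keyed on addresses and ports; ICMP has no ports,
    so only the address pair is used. CRC32 is a stand-in hash.
    """
    key = f"{src_ip}|{dst_ip}"
    if src_port is not None and dst_port is not None:
        key += f"|{src_port}|{dst_port}"
    return zlib.crc32(key.encode()) % NUM_RX_QUEUES


# ICMP between one pair of hosts: always the same key, hence one queue.
icmp_queues = {rx_queue("172.16.33.44", "172.16.47.27") for _ in range(100)}

# TCP SYN probes with a changing source port: the key changes per probe.
tcp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", port, 9876)
    for port in range(1024, 1124)
}

print(len(icmp_queues), "queue(s) for ICMP;", len(tcp_queues), "queue(s) for TCP")
```

Under this model, a single stalled core delays every ICMP packet between a given pair of hosts, while TCP probes are only affected when their ports happen to hash to that core.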
Understanding packet processing in the Linux kernel
To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.
Going back to the simplest traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel, which has to stop what it is doing, switch context to handle the packet, and then switch back.
This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the 90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server could be interrupted millions of times per second.
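The figures in this paragraph can be reproduced with a little arithmetic. The sketch below assumes minimum-size Ethernet frames (64 bytes plus 20 bytes of preamble and inter-frame gap on the wire) and an even spread of one interrupt per packet across eight cores; both are simplifying assumptions of ours.

```python
LINK_BITS_PER_SECOND = 10_000_000_000  # 10G Ethernet
# Smallest Ethernet frame (64 bytes) plus preamble and inter-frame gap (20 bytes).
MIN_WIRE_BYTES = 64 + 20
CORES = 8

# Packets per second at line rate with minimum-size frames.
max_pps = LINK_BITS_PER_SECOND // (MIN_WIRE_BYTES * 8)

# Interrupts per core per second if every packet raised its own interrupt.
per_core = max_pps // CORES

print(f"{max_pps:,} packets/s, {per_core:,} interrupts/s per core")
```

This gives roughly 14.9 million packets per second and about 1.9 million interrupts per core per second, in line with the "15 million" and "millions of times per second" quoted above.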
To avoid constantly handling interrupts, many years ago Linux added NAPI, the networking API that modern drivers use: at high packet rates the kernel stops taking an interrupt per packet and instead polls the network card, picking up packets in batches in a softirq (software interrupt) context.
This is much faster, but it causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes have no time to actually drain those queues (reading from TCP connections, and so on). Eventually the queues fill up and we start dropping packets. To try to strike a balance, the kernel sets a budget for the maximum number of packets processed in softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps) that handles these softirqs outside the normal syscall/interrupt path. This thread is scheduled by the standard process scheduler, which tries to allocate resources fairly.
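The budget mechanism can be pictured with a toy model. This is a deliberately simplified sketch of the description above, not the kernel's actual logic: the function name is ours, and the budget of 300 is only an illustrative value.

```python
from collections import deque


def poll_with_budget(rx_queue, budget):
    """Process up to `budget` packets inline and report what is left over.

    Toy model only: the leftover count stands for work that would be
    handed to the per-core ksoftirqd thread.
    """
    handled = []
    while rx_queue and len(handled) < budget:
        handled.append(rx_queue.popleft())
    return handled, len(rx_queue)


queue = deque(range(500))  # 500 packets waiting on one RX queue
inline, deferred = poll_with_budget(queue, budget=300)
print(f"{len(inline)} handled inline, {deferred} left for ksoftirqd")
```

The point of the model is that the deferred packets now depend on when the scheduler lets ksoftirqd run on that core.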
Having looked at how the kernel processes packets, you can see that there is some potential for stalls. If softirq processing calls come less frequently, packets will have to wait for some time in the RX queue on the network card before being processed. This may be caused by some task blocking the CPU core, or by something else preventing the core from running softirqs.
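This mechanism would explain the descending RTT values in the hping3 output. As a minimal sketch under our own assumptions (packets arrive at a fixed interval and nothing is processed until the stall ends), the queueing delay of each packet can be computed like this:

```python
def queueing_delays(stall_ms, interval_ms):
    """Delay seen by each packet that arrives while the core is blocked.

    Packets arrive every `interval_ms`; none is processed until the
    stall ends at `stall_ms`, so earlier arrivals wait longer.
    """
    return [stall_ms - arrival for arrival in range(0, stall_ms, interval_ms)]


# A 100 ms stall with one probe every 10 ms, as in the hping3 runs.
print(queueing_delays(100, 10))
```

The result steps down by 10 ms per packet, which is the same staircase seen in runs such as 140.9, 130.9, 120.8, 110.8, 100.7 ms.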
Narrowing the processing down to a core or method
Softirq delays are only a guess for now. But the guess makes sense, and we know that we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, to find the cause of the delays.
Let's go back to our slow packets:
len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms
As discussed earlier, these ICMP packets are hashed into a single RX NIC queue and processed by a single CPU core. If we want to understand what Linux is doing, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed, so that we can trace the process.
Now it is time to use tools that let you observe the Linux kernel in real time. Here we used a kernel tracing tool that can hook kernel functions.
The plan here is simple: we know that the kernel processes these ICMP pings, so we put a hook on the kernel function icmp_echo. The function is passed a struct sk_buff *skb: this is the packet containing the "echo request". We can trace it, pull out the echo.sequence sequence number (which corresponds to the icmp_seq shown by hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results that we see directly while the kernel processes packets:
TGID    PID     PROCESS NAME    ICMP_SEQ
0       0       swapper/11      770
0       0       swapper/11      771
0       0       swapper/11      772
...     ...     spokes-report-s ...
It should be noted here that in softirq context, processes that made system calls show up as the "process", when in fact it is the kernel that is safely processing packets in kernel context.
With this tool we can associate specific processes with the specific packets that show a delay in hping3. Let's keep it simple and grep this capture for specific icmp_seq values. Packets matching the icmp_seq values above are marked along with the RTT we observed above (in parentheses are the expected RTT values for packets that we filtered out because their RTT was below 50 ms):
TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)
76      76      ksoftirqd/11    2076 ** (15ms)
76      76      ksoftirqd/11    2077 ** (5ms)
The results tell us several things. First, all of these packets are processed in the context of ksoftirqd/11. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets that are processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated backlog: exactly the number of packets that piled up after cadvisor.
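The "who ran just before the backlog" question can also be answered mechanically. Below is a minimal sketch with a helper name of our own; the rows are an abbreviated copy of the tracer output above, reduced to process name and icmp_seq.

```python
def culprits_before_backlog(rows, worker="ksoftirqd/11"):
    """Return the process seen immediately before each run of `worker`.

    `rows` is a list of (process_name, icmp_seq) tuples in trace order.
    Each result pairs that process with the first icmp_seq of the run.
    """
    culprits = []
    previous = None
    for name, seq in rows:
        if name == worker and previous is not None and previous != worker:
            culprits.append((previous, seq))
        previous = name
    return culprits


rows = [
    ("cadvisor", 1951), ("cadvisor", 1952),
    ("ksoftirqd/11", 1953), ("ksoftirqd/11", 1954),
    ("cadvisor", 2068), ("cadvisor", 2069),
    ("ksoftirqd/11", 2070), ("ksoftirqd/11", 2071),
]

print(culprits_before_backlog(rows))
```

For these rows, every ksoftirqd/11 run is preceded by cadvisor, which matches the observation above.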
The fact that cadvisor always runs immediately beforehand implies that it is involved in the problem. Ironically, the purpose of cadvisor is to analyze the resource usage and performance of running containers, not to cause a performance problem like this one.
As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance problems under some unforeseen circumstances.
What is cadvisor doing that stalls the packet queue?
We now have a fairly good understanding of how the stall happens, which process causes it, and on which CPU. We see that because of hard blocking, the Linux kernel does not get to schedule ksoftirqd in time. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor starts a slow syscall, after which all the packets that accumulated in the meantime are processed.
That is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets goes over budget and ksoftirqd is called, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like x-raying the CPU every few milliseconds.
Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and records call graphs; the recording can then be collapsed into one-line stack traces so that we can look at the samples just before ksoftirqd appears:
# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100
Here are the results:
(hundreds of traces that look similar)
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run
There is a lot here, but the main thing is that we find the same "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?
Each line is a trace of the CPU at a point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called, read(): .... ;do_syscall_64;sys_read; ... . So cadvisor spends a lot of time in the read() system call, in the mem_cgroup_* functions (the top of the call stack, which is the end of the line).
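With hundreds of such lines, it helps to count where the samples end up. The sketch below is our own helper, not part of the FlameGraph tooling. It takes collapsed stack lines in the semicolon-separated format shown above and counts the innermost function per process. The sample lines are shortened copies of the traces above, with the repeated [cadvisor] frames trimmed.

```python
from collections import Counter


def summarize_stacks(lines):
    """Count (process, innermost function) pairs in collapsed stack lines."""
    counts = Counter()
    for line in lines:
        frames = line.strip().split(";")
        if len(frames) < 2:
            continue  # skip blank or malformed lines
        # First field is the process name, last field is the top of the stack.
        counts[(frames[0], frames[-1])] += 1
    return counts


sample = [
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter",
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;"
    "vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
]

for (process, leaf), count in summarize_stacks(sample).most_common():
    print(process, leaf, count)
```

A summary like this makes it easy to see that the samples before ksoftirqd are dominated by cadvisor inside mem_cgroup_* functions.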
It is not easy to tell from a call trace what exactly is being read, so let's run strace, see what cadvisor is doing, and find the system calls that take longer than 100 ms:
theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>
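Output like this can be summarized per syscall. The following sketch assumes the "resumed" line layout that strace -T -ff prints above; the helper name is ours, and the last sample line is a made-up fast call added only to show the threshold at work.

```python
import re
from collections import defaultdict

# "[pid N] <... NAME resumed> ... <SECONDS>" as printed by strace -T -ff.
RESUMED_RE = re.compile(r"\[pid\s+(\d+)\] <\.\.\. (\w+) resumed>.*<(\d+\.\d+)>$")


def slow_syscalls(lines, threshold=0.1):
    """Group resumed syscalls that took at least `threshold` seconds by name."""
    slow = defaultdict(list)
    for line in lines:
        match = RESUMED_RE.search(line.strip())
        if match and float(match.group(3)) >= threshold:
            slow[match.group(2)].append(float(match.group(3)))
    return dict(slow)


sample = [
    r'[pid 10436] <... futex resumed> ) = 0 <0.156784>',
    r'[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>',
    r'[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>',
    r'[pid 10436] <... futex resumed> ) = 0 <0.000042>',  # hypothetical fast call
]

for name, durations in sorted(slow_syscalls(sample).items()):
    print(name, len(durations), max(durations))
```

Long futex and pselect6 waits are normal for an idle thread; what stands out is a read() of a small file that takes well over 100 ms.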
As you might expect, we see slow read() calls here. From the contents of the read operations and the mem_cgroup context, it is clear that these read() calls refer to the memory.stat file, which shows memory usage and cgroup limits (cgroups being Docker's resource isolation technology). The cadvisor tool polls this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:
theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null
real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $
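The same measurement can be scripted, for example to run it repeatedly or across many nodes. This is a minimal sketch: the helper name is ours, and the path is the one from the command above, which exists only on hosts with the cgroup v1 memory controller mounted there.

```python
import time


def time_read(path):
    """Return (elapsed seconds, bytes read) for one full read of `path`."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    return time.perf_counter() - start, len(data)


if __name__ == "__main__":
    # Path taken from the command above; adjust for other cgroup layouts.
    try:
        elapsed, size = time_read("/sys/fs/cgroup/memory/memory.stat")
        print(f"read {size} bytes in {elapsed * 1000:.1f} ms")
    except OSError as error:
        print(f"could not read memory.stat: {error}")
```

On a healthy node this read should take well under a millisecond; on the affected nodes it took about 150 ms, almost all of it in kernel time.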
Now we can reproduce the bug and see that the Linux kernel is running into a pathology.
Why is the read so slow?
At this point it becomes much easier to find reports from other users about similar problems. As it turned out, in the cadvisor tracker this bug was reported as an excessive CPU usage issue.
The problem is that cgroups account for memory usage inside the namespace (container). When all processes in this cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer in use, it turns out that the kernel also charges cached contents, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the problem description:
zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).
Having the kernel walk all of the pages in the cache when freeing a cgroup can be very slow, so a lazy approach is chosen: wait until these pages are requested again, and only then finally clean up the cgroup, when the memory is actually needed. Until that point, the cgroup is still counted when collecting statistics.
From a performance standpoint, they traded memory for performance: the initial cleanup is faster because some cached memory is left behind. This is fine. When the kernel reclaims the last of the cached memory, the cgroup is eventually cleaned up, so it cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to reclaim the latest cached data and clear out the zombie cgroups.
It turned out that some of our nodes had so many zombie cgroups that the read, and the latency, exceeded a second.
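One rough way to gauge how many zombie cgroups a node carries is to compare the number of memory cgroups the kernel reports in /proc/cgroups with the number of cgroup directories still visible in the filesystem. This is a heuristic of our own rather than something taken from the investigation, and the helper names are hypothetical.

```python
import os


def parse_num_cgroups(proc_cgroups_text, subsystem="memory"):
    """Read the num_cgroups column for `subsystem` from /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue  # header: #subsys_name hierarchy num_cgroups enabled
        fields = line.split()
        if len(fields) >= 3 and fields[0] == subsystem:
            return int(fields[2])
    raise ValueError(f"subsystem {subsystem!r} not found")


def count_cgroup_dirs(root):
    """Count cgroup directories visible under `root`, including the root."""
    return sum(1 for _ in os.walk(root))


def estimate_zombies(proc_cgroups_text, root):
    """Heuristic: cgroups the kernel still tracks minus cgroups still visible."""
    return parse_num_cgroups(proc_cgroups_text) - count_cgroup_dirs(root)
```

On a real node the inputs would be the contents of /proc/cgroups and the memory cgroup mount point, for example /sys/fs/cgroup/memory on a cgroup v1 host. A large positive difference hints at many deleted cgroups that still hold charged memory.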
The workaround from the cadvisor issue is to immediately free the dentry/inode caches system-wide, which immediately eliminates the read latency as well as the network latency on the host, since dropping the cache touches the cached pages of the zombie cgroups and frees them too. This is not a solution, but it confirms the cause of the problem.
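On Linux, freeing these caches is done by writing to /proc/sys/vm/drop_caches. The sketch below shows one way to script it; the helper name is ours, it needs root on a real system, and it temporarily costs cache hit rate, so treat it as a diagnostic step rather than a fix.

```python
def drop_dentry_inode_caches(path="/proc/sys/vm/drop_caches"):
    """Ask the kernel to free reclaimable dentries and inodes.

    Writing "2" selects dentries and inodes only ("1" is the page
    cache, "3" is both). Requires root on a real system.
    """
    with open(path, "w") as handle:
        handle.write("2\n")
```

Timing a read of memory.stat before and after calling it shows whether zombie cgroups were responsible for the slow read.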
It turned out that newer kernel versions (4.19+) improved the performance of the memory.stat call, so switching to such a kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in the Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found the nodes with high enough latency, and rebooted them. This gave us time to update the OS on the remaining servers.
Summary
Because this bug stopped RX NIC queue processing for hundreds of milliseconds, it simultaneously caused high latency on short connections and latency in the middle of connections, such as between MySQL requests and response packets.
Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all the services built on top of them. Every system you run benefits from Kubernetes performance improvements.
Source: www.habr.com