Debugging network latency in Kubernetes

A few years ago Kubernetes was already discussed on the official GitHub blog. Since then it has become the standard technology for deploying services. Kubernetes now manages a significant portion of internal and public services. As our clusters grew and performance requirements became stricter, we began to notice that some services on Kubernetes sporadically experienced latency that could not be explained by the load of the application itself.

Essentially, applications experienced seemingly random network latency of 100 ms or more, resulting in timeouts or retries. Services were expected to respond to requests much faster than 100 ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should take milliseconds, and MySQL did complete them in milliseconds, but from the point of view of the requesting application the response took 100 ms or more.

It immediately became clear that the problem occurred only when connecting to a Kubernetes node, even if the call came from outside Kubernetes. The easiest way to reproduce the problem is a Vegeta test, which can run from any internal host, tests a Kubernetes service on a specific port, and sporadically registers high latency. In this article, we will look at how we tracked down the cause of this problem.

Removing unnecessary complexity from the chain leading to failure

By reproducing the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially, there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.

The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP, that is, it encapsulates the IP packets of the overlay network inside the IP packets of the data center. When connecting to the first node, stateful Network Address Translation (NAT) is performed to translate the IP address and port of the Kubernetes node into the IP address and port in the overlay network (specifically, the pod with the application). For return packets, the reverse sequence of actions is performed. It is a complex system with a lot of state and many elements that are constantly updated and changed as services are deployed and moved.
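The IPIP encapsulation described above can be sketched in a few lines of Python. This is only an illustration of the packet layout, not how the kernel does it: the helper names are made up for this example, and the source pod and source node addresses are hypothetical. The one fixed fact it relies on is that IPv4-in-IPv4 uses IP protocol number 4 in the outer header.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4
IPPROTO_TCP = 6


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over the given bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal 20-byte IPv4 header without options."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = checksum(header)  # fill in the header checksum
    return struct.pack("!BBHHHBBH4s4s", *fields)


def ipip_encapsulate(inner_packet: bytes, node_src: str, node_dst: str) -> bytes:
    """Wrap an overlay (pod) packet in an outer header addressed to a node."""
    outer = ipv4_header(node_src, node_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


if __name__ == "__main__":
    # Hypothetical source addresses; the destinations appear later in the article.
    inner = ipv4_header("10.125.20.1", "10.125.20.64", IPPROTO_TCP, 0)
    outer = ipip_encapsulate(inner, "172.16.47.28", "172.16.47.27")
    print(len(inner), len(outer), "outer protocol:", outer[9])
```

The receiving node strips the outer 20 bytes and routes the inner packet to the pod, which is why every hop in this path is a candidate for the delay.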

Running tcpdump during the Vegeta test shows a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet, and then reset the connection. We can filter the data to include only packets slower than 100 ms and get an easier way to reproduce the problem than the full layer 7 network test in Vegeta. Here are Kubernetes node "pings" using TCP SYN/SYN-ACK on the service "node port" (30927) at 10 ms intervals, filtered to the slowest responses:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms

We can immediately make a first observation. Judging by the sequence numbers and timings, it is clear that these are not one-off congestions. The delay often accumulates and is eventually processed.
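One way to see that the delay accumulates and is then worked off in one go is to convert each probe into the moment its reply arrived. The sketch below assumes that hping3 with -i u10000 sends probe n exactly 10 ms after probe n-1, which is only approximately true; the function name is made up for this example, and the sample lines are the first four replies from the run above.

```python
import re

# Matches the seq and rtt fields of an hping3 reply line.
HPING_REPLY = re.compile(r"seq\s*=\s*(\d+).*?rtt\s*=\s*([\d.]+)")


def reply_offsets_ms(lines, interval_ms=10.0):
    """Return when each reply arrived, in ms after the first matched probe was sent."""
    samples = []
    for line in lines:
        match = HPING_REPLY.search(line)
        if match:
            samples.append((int(match.group(1)), float(match.group(2))))
    if not samples:
        return []
    first_seq = samples[0][0]
    # send offset (seq distance times the interval) plus the measured round trip
    return [(seq - first_seq) * interval_ms + rtt for seq, rtt in samples]


SAMPLE = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms",
]

if __name__ == "__main__":
    for offset in reply_offsets_ms(SAMPLE):
        print(round(offset, 1))
```

Probes sent 30 ms apart come back within a few milliseconds of each other, which is what a queue being drained in one burst looks like.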

Next, we want to find out which components may be involved in the congestion. Maybe these are some of the hundreds of iptables rules in NAT? Or is there a problem with IPIP tunneling on the network? One way to test this is to check each step of the system by eliminating it. What happens if you remove NAT and the firewall logic, leaving only the IPIP part?

Fortunately, Linux makes it easy to access the IP overlay layer directly if the machine is on the same network:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms

Judging by the results, the problem still remains! This rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping goes:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms

The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further.

Is every packet sent between these two hosts affected?

theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms

We have simplified the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).

Now the last question: why does the delay occur only on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is also fairly easy to figure out by sending packets from a host outside Kubernetes, but to the same "known bad" receiver. As you can see, the problem has not disappeared:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms

We then run the same requests from the previous source kube-node to an external host (which rules out the source host, since a ping includes both an RX and a TX component):

theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms

By examining packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the receiver (top) does not; compare the Delta column (in seconds) of the two captures.

In addition, if you look at the difference in the ordering of TCP and ICMP packets (by sequence numbers) on the receiver side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, on the other hand, sometimes interleave, and some of them get stuck. In particular, if you examine the ports of the SYN packets, they are in order on the sender side, but not on the receiver side.

There is a subtle difference in how the network cards of modern servers (like those in our data center) process packets containing TCP or ICMP. When a packet arrives, the network adapter "hashes it per connection", that is, it tries to split connections across queues and send each queue to a separate processor core. For TCP, this hash includes both the source and destination IP address and port. In other words, each connection is hashed (potentially) differently. For ICMP, only the IP addresses are hashed, since there are no ports.
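The effect of this hashing can be illustrated with a toy model. Real network cards typically use a Toeplitz hash with a device-specific key, so the CRC32 below is purely a stand-in and the function name is made up for this sketch; the point is only which fields go into the key.

```python
import zlib


def rx_queue(src_ip, dst_ip, n_queues, src_port=None, dst_port=None):
    """Pick an RX queue from the flow key (toy stand-in for NIC receive hashing)."""
    key = f"{src_ip}|{dst_ip}"
    if src_port is not None and dst_port is not None:
        # TCP: ports are part of the key, so each connection may hash differently
        key += f"|{src_port}|{dst_port}"
    return zlib.crc32(key.encode()) % n_queues


if __name__ == "__main__":
    src, dst, queues = "172.16.33.44", "172.16.47.27", 8
    # ICMP between two hosts: no ports, so every packet lands on one queue
    print("icmp queue:", rx_queue(src, dst, queues))
    # TCP probes with changing source ports spread over several queues
    tcp_queues = {rx_queue(src, dst, queues, port, 30927) for port in range(1024, 1224)}
    print("tcp queues:", sorted(tcp_queues))
```

If one core is slow to drain its queue, every ICMP packet between the two hosts is delayed, while only the TCP connections that happen to hash to that queue are affected.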

Another new observation: during this period we see ICMP stall on all communication between two hosts, but TCP does not. This tells us that the cause is likely related to RX queue hashing: the congestion is almost certainly in the processing of RX packets, not in sending responses.

This eliminates packet sending from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.

Understanding packet processing in the Linux kernel

To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.

Returning to the simplest traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel, signalling that there is a packet to be processed. The kernel stops other work, switches context to the interrupt handler, processes the packet, and then returns to the current tasks.

This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the '90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server can be interrupted millions of times per second.

To avoid constantly handling interrupts, many years ago Linux added NAPI: the network API that all modern drivers use to improve performance at high speeds. At low speeds the kernel still receives interrupts from the network card in the old way. Once enough packets arrive to exceed a threshold, the kernel disables interrupts and instead starts polling the network adapter and picking up packets in batches. Processing is performed in softirq, that is, in the context of software interrupts after system calls and hardware interrupts, when the kernel (as opposed to user space) is already running.

This is much faster, but it causes a different problem. If there are too many packets, then all the time is spent processing packets from the network card, and user-space processes do not have time to actually empty these queues (reading from TCP connections and so on). Eventually the queues fill up and we start dropping packets. In an attempt to find a balance, the kernel sets a budget for the maximum number of packets processed in the softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps) that handles these softirqs outside of the normal syscall/interrupt path. This thread is scheduled using the standard process scheduler, which tries to allocate resources fairly.

Having looked at how the kernel processes packets, you can see that there is a certain probability of congestion. If softirq calls are received less frequently, packets will have to wait for some time in the RX queue on the network card before being processed. This may be due to some task blocking the processor core, or to something else preventing the core from running softirq.
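A toy model makes the expected symptom concrete. Assume probes arrive every 10 ms and the core that owns the RX queue cannot run softirq for 100 ms; the function and parameter names are invented for this sketch, and processing time itself is ignored.

```python
def simulate_stall(n_packets, interval_ms, stall_start_ms, stall_ms):
    """Return the queueing delay of each packet when one core is blocked for a while."""
    stall_end = stall_start_ms + stall_ms
    delays = []
    for i in range(n_packets):
        arrival = i * interval_ms
        if stall_start_ms <= arrival < stall_end:
            # the packet waits in the RX queue until the core is free again
            delays.append(stall_end - arrival)
        else:
            delays.append(0)
    return delays


if __name__ == "__main__":
    print(simulate_stall(n_packets=15, interval_ms=10, stall_start_ms=20, stall_ms=100))
```

The delays form a staircase that drops by one probe interval per packet, the same shape as the runs of 127, 117, 107 ms in the hping3 output earlier in the article.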

Narrowing down the processing to a core or method

Softirq delays are just a guess for now. But it makes sense, and we know that we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, to find the reason for the delays.

Let's return to our slow packets:

len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms

As discussed earlier, these ICMP packets are hashed into a single NIC RX queue and processed by a single CPU core. If we want to understand what Linux is doing, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed, so that we can track the process.

Now it is time to use tools that let you monitor the Linux kernel in real time. Here we used bcc. This toolkit lets you write small C programs that hook arbitrary functions in the kernel and buffer the events into a user-space Python program that can process them and return the result to you. Hooking arbitrary functions in the kernel is a complex matter, but the utility is designed for maximum safety and is meant for tracking down exactly the kind of production issues that are not easily reproduced in a test or development environment.

The plan here is simple: we know that the kernel processes these ICMP pings, so we will put a hook on the kernel function icmp_echo, which accepts an incoming ICMP echo request packet and initiates sending an ICMP echo response. We can identify a packet by the increasing icmp_seq number that hping3 shows above.
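To make clear which field the hook needs to read, here is a small user-space sketch that decodes the fixed 8-byte ICMP echo header. It is not the bcc script itself, and the function name is made up for this illustration; the layout (type, code, checksum, identifier, sequence, all in network byte order) is the standard ICMP echo format.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type of an echo request


def parse_icmp_echo(packet: bytes) -> dict:
    """Decode the 8-byte ICMP echo header at the start of the given bytes."""
    icmp_type, code, csum, ident, sequence = struct.unpack("!BBHHH", packet[:8])
    return {
        "type": icmp_type,
        "code": code,
        "checksum": csum,
        "id": ident,
        "sequence": sequence,  # the value hping3 prints as icmp_seq
    }


if __name__ == "__main__":
    # Hand-built header; the checksum is left at zero because the parser does not verify it.
    sample = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, 1953)
    print(parse_icmp_echo(sample))
```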

The code of the bcc script looks complicated, but it is not as scary as it seems. The icmp_echo function is passed a struct sk_buff *skb: this is the packet with the "echo request". We can trace it, pull out the sequence echo.sequence (which corresponds to icmp_seq from hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:

TGID    PID     PROCESS NAME    ICMP_SEQ
0       0       swapper/11      770
0       0       swapper/11      771
0       0       swapper/11      772
0       0       swapper/11      773
0       0       swapper/11      774
20041   20086   prometheus      775
0       0       swapper/11      776
0       0       swapper/11      777
...     ...     spokes-report-s ...

It should be noted here that in the softirq context, processes that made system calls will appear as the "process" when in fact it is the kernel that is safely processing packets in kernel context.

With this tool we can associate specific processes with the specific packets that show a delay in hping3. Let's do a simple grep on this capture for certain icmp_seq values. Packets matching the icmp_seq values above are marked along with the RTT we observed above (in parentheses are the expected RTT values for packets that we filtered out because their RTT was below 50 ms):

TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)

The results tell us several things. First, all these packets are processed by the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets that are processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that accumulated after cadvisor.
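The "cadvisor, then a burst of ksoftirqd" pattern can be pulled out of such a trace mechanically. The sketch below assumes rows of (tgid, pid, process name, icmp_seq) in the order the kernel handled them; the function name is made up here, and the sample rows are the first stall from the table above.

```python
from itertools import groupby


def process_runs(rows):
    """Collapse consecutive rows handled by the same process.

    Returns a list of (process_name, first_seq, last_seq, packet_count).
    """
    runs = []
    for name, group in groupby(rows, key=lambda row: row[2]):
        group = list(group)
        runs.append((name, group[0][3], group[-1][3], len(group)))
    return runs


# First stall from the trace: two cadvisor rows, then ten ksoftirqd/11 rows.
SAMPLE_ROWS = (
    [(10137, 10436, "cadvisor", seq) for seq in (1951, 1952)]
    + [(76, 76, "ksoftirqd/11", seq) for seq in range(1953, 1963)]
)

if __name__ == "__main__":
    for name, first, last, count in process_runs(SAMPLE_ROWS):
        print(f"{name:14} seq {first}-{last} ({count} packets)")
```

With probes sent every 10 ms, a run of ten packets drained by ksoftirqd corresponds to roughly 100 ms of backlog, which matches the largest RTT in that group.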

The fact that cadvisor is always running immediately before this implies that it is involved in the problem. Ironically, the purpose of cadvisor is to "analyze resource usage and performance characteristics of running containers", not to cause this performance problem.

As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance problems under some unforeseen circumstances.

What does cadvisor do that slows down the packet queue?

We now have a pretty good understanding of how the stall occurs, which process causes it, and on which CPU. We see that because of hard blocking, the Linux kernel does not have time to schedule ksoftirqd. We also see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor launches a slow syscall, after which all the packets that accumulated during that time are processed.

This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets goes over budget and ksoftirqd is called, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like taking an x-ray of the CPU every few milliseconds.

Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and can generate a call graph of the running system, including both user space and the Linux kernel. You can take this recording and process it with a small fork of the FlameGraph program from Brendan Gregg that preserves the order of the stack traces. We can save single-line stack traces every 1 ms, and then pick out and save the samples from the 100 milliseconds before ksoftirqd shows up in the trace:

# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100

Here are the results:

(hundreds of traces that look similar)

cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run

There is a lot here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?

Each line is a trace of the CPU at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called, read(): .... ;do_syscall_64;sys_read; .... So cadvisor spends a lot of time in the read() system call related to the mem_cgroup_* functions (the top of the call stack, which is the end of the line).
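Reading hundreds of such lines by eye is tedious, so a short script can summarise what was on the core before ksoftirqd took over. This is a sketch that assumes input lines shaped like the ones in this article (semicolon-separated frames, process name first, no trailing sample count); the function name is made up, and the sample stacks are shortened for readability.

```python
from collections import Counter


def leaves_before(lines, target="ksoftirqd"):
    """Count (process, innermost function) pairs in the samples preceding `target`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        frames = line.split(";")
        if frames[0].startswith(target):
            break  # stop at the first sample where the target is running
        counts[(frames[0], frames[-1])] += 1
    return counts


# Shortened versions of the stacks shown in the article.
SAMPLE_TRACES = [
    "cadvisor;[cadvisor];do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "cadvisor;[cadvisor];do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter",
    "cadvisor;[cadvisor];do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages",
    "ksoftirqd/11;ret_from_fork;kthread;run_ksoftirqd;__do_softirq;net_rx_action",
]

if __name__ == "__main__":
    for (process, function), hits in leaves_before(SAMPLE_TRACES).most_common():
        print(f"{hits:4d}  {process}  {function}")
```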

It is inconvenient to see in a call trace exactly what is being read, so let's run strace, see what cadvisor is doing, and find system calls longer than 100 ms:

theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>

As you might expect, we see slow read() calls here. From the content of the read operations and the mem_cgroup context, it is clear that these read() calls refer to the file memory.stat, which shows memory usage and cgroup limits (cgroups being the resource isolation technology used by Docker). The cadvisor tool queries this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:

theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null

real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $

Now we can reproduce the bug and understand that the Linux kernel is facing a pathology.
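The same measurement can be scripted, which is handy for checking many nodes. This is a minimal sketch: the function name is made up, the path is the cgroup v1 location used in the command above, and the 100 ms threshold is simply the figure used throughout this article.

```python
import os
import time


def timed_read(path):
    """Read a file fully and return (elapsed_seconds, bytes_read)."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    return time.perf_counter() - start, len(data)


if __name__ == "__main__":
    path = "/sys/fs/cgroup/memory/memory.stat"  # cgroup v1 path from the article
    if os.path.exists(path):
        elapsed, size = timed_read(path)
        verdict = "SLOW" if elapsed > 0.1 else "ok"
        print(f"{path}: {size} bytes in {elapsed * 1000:.1f} ms ({verdict})")
    else:
        print(f"{path} not found (the host may not use cgroup v1)")
```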

Why is the read operation so slow?

At this stage, it is much easier to find reports from other users about similar problems. As it turned out, this bug had been reported in the cadvisor tracker as a problem of excessive CPU usage; it is just that no one had noticed that the latency also shows up randomly in the network stack. It had indeed been noticed that cadvisor was consuming more CPU time than expected, but this was not given much importance, since our servers have plenty of CPU resources, so the problem was not studied carefully.

The problem is that cgroups account for memory usage inside the namespace (container). When all processes in this cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer in use, it turns out that the kernel also assigns cached contents, such as dentries and inodes (directory and file metadata), which are cached in the memory cgroup. From the problem description:

zombie cgroups: cgroups that have no processes and have been deleted, but that still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).

Having the kernel check all pages in the cache when a cgroup is freed can be very slow, so a lazy approach is chosen: wait until these pages are requested again, and only then finally clear the cgroup, when the memory is actually needed. Until that point, the cgroup is still taken into account when statistics are collected.

From a performance standpoint, they traded memory for speed: the initial cleanup is faster because some cached memory is left behind. This is fine. When the kernel uses the last of the cached memory, the cgroup is eventually cleared, so it cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to reclaim the latest cached data and clear the zombie cgroups.

It turned out that some of our nodes had so many zombie cgroups that the read, and with it the latency, exceeded a second.
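A rough way to estimate how many zombie cgroups a node carries is to compare the number of memory cgroups the kernel reports in /proc/cgroups with the number of directories still visible under the memory cgroup mount. This is a sketch under the assumption, not confirmed by the article, that on cgroup v1 kernels the kernel count includes deleted groups that still have memory charged to them; the function names and the sample figures in the usage note are invented for this example.

```python
import os


def parse_proc_cgroups(text):
    """Parse the content of /proc/cgroups into {subsystem: num_cgroups}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        name, _hierarchy, num_cgroups, _enabled = line.split()
        counts[name] = int(num_cgroups)
    return counts


def count_cgroup_dirs(root):
    """Count the cgroup directories visible under a mount, including the root."""
    return sum(1 for _ in os.walk(root))


def estimate_zombies(kernel_count, visible_dirs):
    """Cgroups the kernel still tracks but that no longer exist as directories."""
    return max(kernel_count - visible_dirs, 0)


if __name__ == "__main__":
    root = "/sys/fs/cgroup/memory"  # cgroup v1 mount point used in the article
    if os.path.isdir(root) and os.path.exists("/proc/cgroups"):
        with open("/proc/cgroups") as handle:
            kernel_count = parse_proc_cgroups(handle.read()).get("memory", 0)
        print("estimated zombie memory cgroups:",
              estimate_zombies(kernel_count, count_cgroup_dirs(root)))
```

For instance, a kernel count of 30000 against 250 visible directories would suggest about 29750 groups waiting for their cached pages to be reclaimed.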

The workaround for the cadvisor problem is to immediately free the dentries/inodes caches throughout the system, which immediately eliminates the read latency as well as the network latency on the host, since dropping the cache includes the cached zombie cgroup pages and frees them as well. This is not a solution, but it confirms the cause of the problem.
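On Linux, dentries and inodes can be dropped by writing 2 to /proc/sys/vm/drop_caches, which needs root. The sketch below wraps that in a function; the function name is made up, and the path is a parameter only so that the code can be tried against an ordinary file. Dropping caches makes later file access slower for a while, so treat this as a diagnostic step rather than routine maintenance.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_dentries_and_inodes(path=DROP_CACHES):
    """Ask the kernel to free reclaimable slab objects (dentries and inodes)."""
    os.sync()  # flush dirty data first so that more objects become reclaimable
    with open(path, "w") as handle:
        handle.write("2\n")  # 2 = dentries and inodes; 1 = page cache; 3 = both


if __name__ == "__main__":
    try:
        drop_dentries_and_inodes()
        print("dentry and inode caches dropped")
    except OSError as error:
        print("could not drop caches (root is required):", error)
```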

It turned out that in newer kernel versions (4.19+) the performance of the memory.stat call was improved, so switching to this kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found nodes with high enough latency, and rebooted them. This gave us time to update the OS on the remaining servers.

Summing up

Because this bug stopped NIC RX queue processing for hundreds of milliseconds, it caused both high latency on short connections and latency in the middle of a connection, such as between MySQL requests and response packets.

Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all services built on top of them. Every system you run benefits from Kubernetes performance improvements.

Source: www.habr.com