Debugging network latency in Kubernetes


A few years ago, Kubernetes was already being discussed on the official GitHub blog. Since then, it has become the standard technology for deploying services, and it now manages a significant portion of our internal and public-facing services. As our clusters grew and performance requirements became stricter, we started to notice that some services on Kubernetes were sporadically experiencing latency that could not be explained by the load on the application itself.

In essence, applications were experiencing seemingly random network latency of up to 100 ms or more, resulting in timeouts or retries. Services were expected to respond to requests well within 100 ms, but that is impossible if the connection itself takes that long. Separately, we saw very fast MySQL queries that should take milliseconds, and MySQL did complete them in milliseconds, yet from the point of view of the requesting application the response took 100 ms or more.

It soon became clear that the problem only occurred when connecting to a Kubernetes node, even if the call came from outside Kubernetes. The easiest way to reproduce the problem was a Vegeta test that runs from any internal host, targets a Kubernetes service on a specific port, and sporadically records high latency. In this article, we look at how we tracked down the cause of this problem.

Eliminating unnecessary complexity in the chain leading to the failure

By reproducing the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially, there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, you need to rule some of them out.


The client (Vegeta) creates a TCP connection to any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP, that is, it encapsulates the overlay network's IP packets inside the data center's IP packets. When connecting to the first node, stateful network address translation (NAT) is performed to translate the IP address and port of the Kubernetes node into an IP address and port in the overlay network (specifically, the pod running the application). For incoming packets, the reverse sequence of actions is performed. It is a complex system with a lot of state and many elements that are constantly updated and changed as services are deployed and moved around.
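As a rough illustration of what "IP packets inside IP packets" means on the wire, here is a minimal Python sketch of IPIP encapsulation (IP protocol number 4): an outer IPv4 header between two data-center addresses is prepended to an unmodified inner packet addressed within the overlay network. All addresses are hypothetical placeholders, IP options are omitted, and the inner TCP segment is just zero bytes; this is not how Kubernetes or the kernel builds packets internally.

```python
import socket
import struct

IPPROTO_IPIP = 4  # "IP in IP" protocol number
IPPROTO_TCP = 6


def checksum(header: bytes) -> int:
    """RFC 1071 ones'-complement sum over 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """Build a 20-byte IPv4 header without options."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,        # version 4, header length 5 x 32-bit words
        0,                   # DSCP/ECN
        20 + payload_len,    # total length
        0,                   # identification
        0x4000,              # flags: don't fragment
        ttl,
        proto,
        0,                   # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]


def ipip_encapsulate(inner_packet: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Wrap an overlay-network packet in a data-center IPv4 header."""
    return ipv4_header(outer_src, outer_dst, IPPROTO_IPIP, len(inner_packet)) + inner_packet


# Hypothetical addresses: inner = overlay (pod) network, outer = node-to-node.
tcp_segment = bytes(20)
inner = ipv4_header("10.125.0.1", "10.125.20.64", IPPROTO_TCP, len(tcp_segment)) + tcp_segment
outer = ipip_encapsulate(inner, "172.16.0.1", "172.16.47.27")
print(len(inner), len(outer), outer[9])  # outer[9] is the outer protocol field
```

The receiving node strips the first 20 bytes and routes the inner packet as if it had arrived directly, which is why later tests can address the overlay IP from another node.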

Running tcpdump during the Vegeta test showed a delay during the TCP handshake (between SYN and SYN-ACK). To strip away this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether the response packet is delayed, then reset the connection. We can filter the data to include only packets slower than 100 ms and get an easier way to reproduce the problem than the full layer 7 network test in Vegeta. Here are Kubernetes node "pings" using TCP SYN/SYN-ACK against the service's "node port" (30927) at 10 ms intervals, filtered to show only the slowest responses:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
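The `egrep 'rtt=[0-9]{3}.'` filter keeps any reply whose RTT has at least three digits before the decimal point, i.e. 100 ms or more. A small Python equivalent that parses the `seq`/`icmp_seq` and `rtt` fields could look like this; the threshold is a parameter, and the second sample line is a made-up fast reply included only for contrast.

```python
import re

# hping3 prints e.g. "seq=1485 ... rtt=127.1 ms" (TCP) or "icmp_seq=4022 ... rtt=141.3 ms" (ICMP).
RTT_RE = re.compile(r"\brtt=(\d+(?:\.\d+)?)")
SEQ_RE = re.compile(r"\b(?:icmp_)?seq=(\d+)")


def slow_replies(lines, threshold_ms=100.0):
    """Yield (seq, rtt_ms) for hping3 reply lines at or above threshold_ms."""
    for line in lines:
        rtt = RTT_RE.search(line)
        seq = SEQ_RE.search(line)
        if rtt and seq and float(rtt.group(1)) >= threshold_ms:
            yield int(seq.group(1)), float(rtt.group(1))


sample = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1489 win=29200 rtt=0.4 ms",  # made up
    "len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms",
]
for seq, rtt in slow_replies(sample):
    print(seq, rtt)
```

In practice you would feed it `sys.stdin` from `hping3 ... | python3 filter.py`; unlike the regex, the numeric threshold can be lowered (for example to 50 ms) without rewriting the pattern.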

This immediately gives us a first observation. Judging by the sequence numbers and timings, these are not one-off stalls. The delay often accumulates and is then processed all at once.
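One way to see the accumulation is a bit of arithmetic: since hping3 sends a probe every 10 ms (`-i u10000`), request `seq` leaves roughly `seq × 10 ms` after the start, so `seq × 10 + rtt` estimates when its reply came back. The sketch below assumes that send schedule holds exactly, which real runs only approximate.

```python
def release_times(samples, interval_ms=10.0):
    """Estimate when each reply arrived, relative to the start of the run.

    Assumes probe number `seq` was sent at seq * interval_ms.
    """
    return {seq: seq * interval_ms + rtt for seq, rtt in samples}


# SYN replies from the hping3 run above.
burst = [(1485, 127.1), (1486, 117.0), (1487, 106.2), (1488, 104.1)]
times = release_times(burst)
spread = max(times.values()) - min(times.values())
for seq, t in times.items():
    print(seq, round(t, 1))
print("spread:", round(spread, 1), "ms")
```

For seq 1485 to 1487 the estimates land within about a millisecond of each other: the RTTs fall by roughly 10 ms per probe because the replies were released together, not because the network got faster.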

Next, we want to figure out which components might be involved in the stall. Could it be some of the hundreds of iptables rules in the NAT? Or is there a problem with IPIP tunneling on the network? One way to check is to test each step of the system by eliminating it. What happens if we remove the NAT and firewall logic, leaving only the IPIP part?


Fortunately, Linux makes it easy to reach the IP overlay layer directly if the machine is on the same network:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms

Judging by the results, the problem is still there! That rules out iptables and NAT. So is TCP the problem? Let's see how a regular ICMP ping behaves:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms

The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further:


Is every packet sent between these two hosts affected?

theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms

We have simplified the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is a "bad" one (some are worse than others).

Now the last question: why does the delay only occur on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Fortunately, this is also easy to figure out by sending packets from a host outside Kubernetes to the same "known bad" recipient. As you can see, the problem has not gone away:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms

Next, we run the same requests from the previous source kube-node to an external host (this rules out the source host, since a ping includes both an RX and a TX component):

theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms

Looking at packet captures of the delayed packets, we found some more information. In particular, the sender (bottom) sees this delay, but the receiver (top) does not; see the Delta column (in seconds):


In addition, if you compare the ordering of TCP and ICMP packets (by sequence number) on the receiving side, ICMP packets always arrive in the same order in which they were sent, but with irregular timing. TCP packets, on the other hand, are sometimes interleaved, and some of them stall. In particular, if you look at the source ports of the SYN packets, they are in order on the sender's side but not on the receiver's side.

There is a subtle difference in how the network cards of modern servers (like those in our data center) handle packets carrying TCP versus ICMP. When a packet arrives, the network adapter "hashes it per connection", that is, it tries to split connections across queues and deliver each queue to a separate processor core. For TCP, this hash includes both the source and destination IP addresses and ports. In other words, each connection is (potentially) hashed differently. For ICMP, only the IP addresses are hashed, since there are no ports.
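Receive-side scaling (RSS) is commonly implemented with a Toeplitz hash. The following Python sketch shows why ICMP between two hosts always lands on one queue while TCP connections spread out. Assumptions: the 40-byte key is the widely published sample key from Microsoft's RSS documentation (Linux drivers normally program a random key), the queue is picked with a plain modulo instead of the NIC's indirection table, and the queue count of 16 is arbitrary.

```python
import ipaddress
import struct

# Sample RSS key from Microsoft's RSS documentation (illustrative only).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the 32-bit key window for every set input bit."""
    key_bits = len(key) * 8
    key_int = int.from_bytes(key, "big")
    result = 0
    for bit in range(len(data) * 8):
        if data[bit // 8] & (0x80 >> (bit % 8)):
            result ^= (key_int >> (key_bits - 32 - bit)) & 0xFFFFFFFF
    return result


def rx_queue(src_ip, dst_ip, num_queues, src_port=None, dst_port=None):
    """TCP hashes addresses and ports; ICMP (no ports) hashes addresses only."""
    data = ipaddress.IPv4Address(src_ip).packed + ipaddress.IPv4Address(dst_ip).packed
    if src_port is not None and dst_port is not None:
        data += struct.pack("!HH", src_port, dst_port)
    # Simplification: real NICs index an indirection table with the low hash bits.
    return toeplitz_hash(RSS_KEY, data) % num_queues


client, node = "172.16.33.44", "172.16.47.27"
icmp_queues = {rx_queue(client, node, 16) for _ in range(64)}
tcp_queues = {rx_queue(client, node, 16, src_port=p, dst_port=9876) for p in range(40000, 40064)}
print("ICMP queues:", sorted(icmp_queues))
print("TCP queues: ", sorted(tcp_queues))
```

Every ICMP packet between the pair maps to the same queue, so if the core serving that queue stalls, all of them wait; 64 TCP source ports spread over several queues, so only some connections are affected.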

Another new observation: during this period we see ICMP delays on all communication between the two hosts, but TCP is not delayed across the board. This tells us that the cause is likely related to RX queue hashing: the stall is almost certainly in the processing of RX packets, not in sending the responses.

This eliminates sending packets from the list of possible causes. We now know that the packet processing problem is on the receive side of some kube-node servers.

Understanding packet processing in the Linux kernel

To understand why the problem occurs on the receiving side of some kube-node servers, let's look at how the Linux kernel processes packets.

Going back to the simplest, traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel to signal that there is a packet to process. The kernel stops its other work, switches context to the interrupt handler, processes the packet, and then returns to its current tasks.


This context switching is slow: the latency may not have been noticeable on 10 Mbps network cards in the 90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server can be interrupted millions of times per second.
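To put these numbers in perspective, here is the arithmetic under the simplifying assumption of one interrupt per packet, spread evenly over eight cores:

```python
PACKETS_PER_SECOND = 15_000_000  # peak rate quoted above for a 10G card
CORES = 8

per_core = PACKETS_PER_SECOND / CORES  # interrupts per core per second
budget_ns = 1e9 / per_core             # time available per packet, in nanoseconds
print(f"{per_core:,.0f} interrupts/s per core, {budget_ns:.0f} ns per packet")
```

About half a microsecond per packet leaves little room for a full context switch on every arrival, which motivates the batching described next.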

To avoid handling interrupts constantly, Linux added NAPI many years ago: a networking API that all modern drivers use to improve performance at high packet rates. At low rates, the kernel still receives interrupts from the network card the old way. Once enough packets arrive to exceed a threshold, the kernel disables interrupts and instead starts polling the network adapter, picking up packets in batches. This processing is done in softirq, that is, in the software interrupt context that runs after system calls and hardware interrupts, when the kernel (as opposed to user space) is already running.
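The effect of NAPI on interrupt counts can be sketched with a toy model. Assumptions: each burst lands in the RX ring while interrupts are enabled, the first packet raises one interrupt, and the driver then polls up to `weight` packets at a time (64 is the usual NAPI weight) until a poll returns fewer than `weight` packets, at which point interrupts are re-enabled. Real drivers and NAPI scheduling have more states than this.

```python
NAPI_WEIGHT = 64  # typical per-poll packet limit for a driver


def receive(bursts, napi=True, weight=NAPI_WEIGHT):
    """Return (interrupts, polls) needed to drain the given burst sizes."""
    interrupts = polls = 0
    for burst in bursts:
        if not napi:
            interrupts += burst  # classic model: one interrupt per packet
            continue
        pending = burst
        interrupts += 1          # first packet interrupts; further IRQs are masked
        while True:
            polls += 1
            done = min(pending, weight)
            pending -= done
            if done < weight:    # ring drained within the weight: re-enable IRQs
                break
    return interrupts, polls


print(receive([1000]))              # (1, 16)
print(receive([1000], napi=False))  # (1000, 0)
```

A burst of 1000 packets costs one interrupt and 16 polls instead of 1000 interrupts.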


This is much faster, but causes a different problem. If there are too many packets, all the time is spent processing packets from the network card, and user-space processes never get time to actually drain these queues (reading from TCP connections and so on). Eventually the queues fill up and we start dropping packets. To strike a balance, the kernel sets a budget for the maximum number of packets processed in softirq context. Once this budget is exceeded, a separate ksoftirqd thread is woken up (you will see one of them per core in ps), which handles these softirqs outside the normal syscall/interrupt path. This thread is scheduled by the standard process scheduler, which tries to share resources fairly.
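The budget logic can be sketched in the same toy style. This is a simplified model of one softirq pass, not the kernel's `net_rx_action`: 300 matches the default of the `net.core.netdev_budget` sysctl, the per-queue weight of 64 is an assumption, and the real kernel also applies a time limit and retries softirq processing before handing the remainder to ksoftirqd.

```python
NETDEV_BUDGET = 300  # default value of the net.core.netdev_budget sysctl


def net_rx_action(backlogs, budget=NETDEV_BUDGET, weight=64):
    """One simplified softirq pass over RX queue backlogs.

    backlogs maps queue name -> packets waiting. Queues are polled round-robin,
    at most `weight` packets each, until `budget` packets have been handled.
    Returns (processed, left_over, wake_ksoftirqd).
    """
    backlogs = dict(backlogs)
    processed = 0
    while processed < budget and any(backlogs.values()):
        for queue in list(backlogs):
            n = min(backlogs[queue], weight, budget - processed)
            backlogs[queue] -= n
            processed += n
            if processed >= budget:
                break
    left = sum(backlogs.values())
    return processed, left, left > 0


print(net_rx_action({"rx-11": 1000}))  # (300, 700, True)
print(net_rx_action({"rx-11": 100}))   # (100, 0, False)
```

The left-over packets are what ksoftirqd has to pick up once the scheduler gives it the CPU; if something else keeps that core busy, they wait in the RX ring.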


Now that we know how the kernel processes packets, we can see that there is room for stalls. If softirq processing runs less frequently, packets have to wait for some time in the RX queue on the network card before they are processed. This could be caused by some task hogging the processor core, or by something else preventing the core from running softirqs.

Narrowing the processing down to a core or method

Softirq delays are only a guess at this point. But the guess makes sense, and we know we are seeing something very similar. So the next step is to confirm the theory, and if it holds, to find the reason for the delays.

Let's go back to our slow packets:

len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms

As discussed earlier, these ICMP packets are hashed to a single RX queue on the NIC and processed by a single CPU core. If we want to understand what Linux is doing, it helps to know where (on which CPU core) and how (softirq or ksoftirqd) these packets are processed, so that we can follow the processing.

Now it is time to use tools that let you observe the Linux kernel in real time. Here we used bcc. This toolkit lets you write small C programs that hook arbitrary functions in the kernel and buffer events into a user-space Python program, which can process them and return the results to you. Hooking arbitrary kernel functions is tricky, but the tool is designed for maximum safety and is built to track down exactly the kind of production issues that are not easily reproduced in a test or development environment.

The plan here is simple: we know that the kernel processes these ICMP pings, so we will hook the kernel function icmp_echo, which accepts an incoming ICMP echo request packet and initiates sending the ICMP echo reply. We can identify a packet by its icmp_seq number, which hping3 shows above.

The bcc script looks complicated, but it is not as scary as it seems. The icmp_echo function is passed a struct sk_buff *skb: this is the packet containing the "echo request". We can follow it, extract echo.sequence (which corresponds to icmp_seq shown by hping3 above), and send it to user space. It is also easy to capture the current process name/id. Below are the results we see directly as the kernel processes the packets:
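The original bcc script is not reproduced in this copy of the article. The following is a hedged sketch of what such a tracer could look like, assuming a kernel where `icmp_echo` is a traceable (non-inlined) function taking `struct sk_buff *`, that bcc's Python bindings are installed, and that it runs as root. The struct layout, function names, and column format are ours, not the original script's.

```python
import os
import socket

BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/skbuff.h>
#include <linux/icmp.h>

struct event_t {
    u32 tgid;
    u32 pid;
    u16 seq;                    /* echo.sequence, still in network byte order */
    char comm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);

int trace_icmp_echo(struct pt_regs *ctx, struct sk_buff *skb)
{
    struct event_t ev = {};
    struct icmphdr hdr = {};
    u64 id = bpf_get_current_pid_tgid();

    ev.tgid = id >> 32;
    ev.pid = (u32)id;
    bpf_get_current_comm(&ev.comm, sizeof(ev.comm));

    /* bcc rewrites these skb dereferences into safe kernel reads. */
    unsigned char *head = skb->head;
    u16 thoff = skb->transport_header;
    bpf_probe_read(&hdr, sizeof(hdr), head + thoff);

    ev.seq = hdr.un.echo.sequence;
    events.perf_submit(ctx, &ev, sizeof(ev));
    return 0;
}
"""


def decode_seq(raw: int) -> int:
    """echo.sequence is big-endian on the wire; convert to host order."""
    return socket.ntohs(raw)


def format_row(tgid, pid, comm, seq) -> str:
    return f"{tgid:<8}{pid:<8}{comm:<18}{seq}"


def main():
    try:
        from bcc import BPF  # imported lazily so the helpers work without bcc
    except ImportError:
        print("bcc is not installed; skipping live tracing")
        return
    if os.geteuid() != 0:
        print("run as root to attach kprobes")
        return

    b = BPF(text=BPF_PROGRAM)
    b.attach_kprobe(event="icmp_echo", fn_name="trace_icmp_echo")

    def on_event(cpu, data, size):
        ev = b["events"].event(data)
        print(format_row(ev.tgid, ev.pid, ev.comm.decode(errors="replace"), decode_seq(ev.seq)))

    print(format_row("TGID", "PID", "PROCESS NAME", "ICMP_SEQ"))
    b["events"].open_perf_buffer(on_event)
    while True:
        try:
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            break


if __name__ == "__main__":
    main()
```

Because the hook runs on whichever task is current when `icmp_echo` executes, the reported process is whatever was on the CPU when the softirq ran, which is exactly the caveat discussed next.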

TGID    PID     PROCESS NAME      ICMP_SEQ
0       0       swapper/11        770
0       0       swapper/11        771
0       0       swapper/11        772
...     ...     spokes-report-s   ...

Note that in softirq context, processes that made system calls will show up as the "process", when in fact it is the kernel safely processing packets in kernel context.

With this tool we can associate specific processes with the specific packets that show a delay in hping3. Let's keep it simple and grep this capture for particular icmp_seq values. The packets matching the icmp_seq values above are marked together with the RTT we observed above (values in parentheses are the expected RTTs for packets we filtered out because their RTT was below 50 ms):

TGID    PID     PROCESS NAME    ICMP_SEQ ** RTT
--
10137   10436   cadvisor        1951
10137   10436   cadvisor        1952
76      76      ksoftirqd/11    1953 ** 99ms
76      76      ksoftirqd/11    1954 ** 89ms
76      76      ksoftirqd/11    1955 ** 79ms
76      76      ksoftirqd/11    1956 ** 69ms
76      76      ksoftirqd/11    1957 ** 59ms
76      76      ksoftirqd/11    1958 ** (49ms)
76      76      ksoftirqd/11    1959 ** (39ms)
76      76      ksoftirqd/11    1960 ** (29ms)
76      76      ksoftirqd/11    1961 ** (19ms)
76      76      ksoftirqd/11    1962 ** (9ms)
--
10137   10436   cadvisor        2068
10137   10436   cadvisor        2069
76      76      ksoftirqd/11    2070 ** 75ms
76      76      ksoftirqd/11    2071 ** 65ms
76      76      ksoftirqd/11    2072 ** 55ms
76      76      ksoftirqd/11    2073 ** (45ms)
76      76      ksoftirqd/11    2074 ** (35ms)
76      76      ksoftirqd/11    2075 ** (25ms)
76      76      ksoftirqd/11    2076 ** (15ms)
76      76      ksoftirqd/11    2077 ** (5ms)
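The parenthesized values follow from the 10 ms send interval: within one stall, each later packet waited 10 ms less. A hypothetical helper that joins tracer rows with the hping3 RTTs and fills in those expected values might look like this (the function and its argument shapes are ours, not part of bcc or hping3):

```python
def annotate(trace_rows, hping_rtts, interval_ms=10.0):
    """Attach an observed or expected RTT to each tracer row.

    trace_rows: (process name, icmp_seq) in the order the kernel handled them.
    hping_rtts: {icmp_seq: rtt_ms} for replies that passed the hping3 filter.
    Returns (name, seq, rtt_ms or None, is_expected) tuples.
    """
    annotated = []
    anchor = None  # (seq, rtt) of the last observed slow reply in this stall
    for name, seq in trace_rows:
        if seq in hping_rtts:
            anchor = (seq, hping_rtts[seq])
            annotated.append((name, seq, hping_rtts[seq], False))
        elif anchor is not None and name.startswith("ksoftirqd"):
            # Later packets in the same batch waited one send interval less each.
            expected = anchor[1] - (seq - anchor[0]) * interval_ms
            annotated.append((name, seq, expected, True))
        else:
            anchor = None  # not part of a stall we measured
            annotated.append((name, seq, None, False))
    return annotated


# Values from the ICMP run and tracer output above.
rows = [("cadvisor", 1951), ("cadvisor", 1952)] + [("ksoftirqd/11", s) for s in range(1953, 1963)]
rtts = {1953: 99.3, 1954: 89.3, 1955: 79.2, 1956: 69.1, 1957: 59.1}
for name, seq, rtt, expected in annotate(rows, rtts):
    shown = "" if rtt is None else (f"({rtt:.0f}ms)" if expected else f"{rtt:.0f}ms")
    print(f"{name:<14}{seq}  {shown}")
```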

The results tell us a few things. First, all these packets are processed in the context of ksoftirqd/11. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving side. We also see that every time there is a stall, some packets are processed in the context of a cadvisor system call. Then ksoftirqd takes over and works through the accumulated queue: exactly the number of packets that piled up after cadvisor.

The fact that cadvisor is always running immediately beforehand implies that it is involved in the problem. Ironically, the purpose of cadvisor is to "analyze resource usage and performance characteristics of running containers", not to cause performance problems like this one.

As with other aspects of containers, these are all highly advanced tools, and they can be expected to run into performance problems under some unforeseen circumstances.

What is cadvisor doing that stalls the packet queue?

We now have a pretty good understanding of how the stall happens, which process is causing it, and on which CPU. We see that because of hard blocking, the Linux kernel does not get a chance to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is reasonable to assume that cadvisor makes a slow syscall, after which all the packets that accumulated in the meantime are processed:


This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets exceeds the budget and ksoftirqd is invoked, and then look a little further back to see what exactly was running on that CPU core just before that point. It is like taking an X-ray of the CPU every few milliseconds. It will look something like this:


Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and can record the call stacks of the running system, including both user space and the Linux kernel. You can take this recording and process it with a small fork of Brendan Gregg's FlameGraph tooling that preserves the order of the stack traces. We can save one single-line stack trace every 1 ms, then extract the 100 milliseconds of samples just before the trace hits ksoftirqd:

# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100
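What `grep ksoftir -B 100` extracts, and the per-line reading of the stacks, can also be scripted in Python. This sketch assumes the ordered collapsed-stack format shown in this article (frames separated by semicolons, process name first, optional trailing sample count); the test stacks are shortened versions of the ones in the output, and all function names are hypothetical.

```python
from collections import Counter


def parse_sample(line):
    """Split one collapsed stack into (process name, frames).

    Lines may end with a sample count ("a;b;c 1"); it is dropped if present.
    """
    stack, _, count = line.rpartition(" ")
    if not (stack and count.isdigit()):
        stack = line
    frames = stack.split(";")
    return frames[0], frames[1:]


def samples_before(lines, marker="ksoftirqd", window=100):
    """Like `grep <marker> -B <window>`, but only for the first marker sample."""
    for i, line in enumerate(lines):
        if parse_sample(line)[0].startswith(marker):
            return lines[max(0, i - window):i]
    return []


def syscall_of(frames):
    """Return the frame right after do_syscall_64 (e.g. sys_read), if any."""
    if "do_syscall_64" in frames:
        i = frames.index("do_syscall_64")
        if i + 1 < len(frames):
            return frames[i + 1]
    return None


def summarize(lines, window=100):
    """Count (process, syscall) pairs in the window before ksoftirqd took over."""
    return Counter(
        (comm, syscall_of(frames))
        for comm, frames in map(parse_sample, samples_before(lines, window=window))
    )
```

With about 1000 samples per second, a window of 100 samples corresponds to roughly the 100 ms before ksoftirqd ran.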

Here are the results:

(hundreds of traces that look similar)

cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run

There is a lot going on here, but the main thing is that we find the same "cadvisor before ksoftirqd" pattern we saw earlier with the ICMP tracer. What does it mean?

Each line is a CPU sample at a particular moment. Each frame going down the stack within a line is separated by a semicolon. In the middle of the lines we see the syscall being made, read(): .... ;do_syscall_64;sys_read; .... So cadvisor spends a lot of time in the read() system call, inside the mem_cgroup_* functions (the top of the call stack, at the end of the line).

A call stack does not conveniently show what exactly is being read, so let's run strace to see what cadvisor is doing and find the system calls that take longer than 100 ms:

theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>
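Note that `egrep '<0.[1-9]'` only matches durations from 0.1 s up to just under 1 s; a call that blocked for more than a second would be filtered out. A Python sketch that parses the `<seconds>` suffix added by `strace -T` avoids that gap. The helper name is hypothetical, and the last two test lines are made up to exercise the fast and over-one-second cases.

```python
import re

DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")   # "-T" suffix, e.g. "<0.179438>"
CALL_RE = re.compile(r"<\.\.\. (\w+) resumed>")  # "-ff" resumption marker
PID_RE = re.compile(r"\[pid\s+(\d+)\]")


def slow_syscalls(lines, threshold_s=0.1):
    """Yield (pid, syscall, seconds) for strace -T lines at or above threshold_s."""
    for line in lines:
        dur = DURATION_RE.search(line)
        if not dur or float(dur.group(1)) < threshold_s:
            continue
        pid = PID_RE.match(line)
        call = CALL_RE.search(line)
        name = call.group(1) if call else line.split("(", 1)[0].split()[-1]
        yield (int(pid.group(1)) if pid else None, name, float(dur.group(1)))
```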

As you might expect, we see slow read() calls here. From the contents of the reads and the mem_cgroup context, it is clear that these read() calls are reading the memory.stat file, which shows memory usage and limits for a cgroup (the Linux feature behind Docker's resource isolation). cadvisor polls this file to get resource usage information for containers. Let's check whether it is the kernel or cadvisor that is doing something unexpected:

theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null

real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $
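The same check can be scripted, for example to run it periodically across nodes. This minimal sketch times a full read of a file in 4 KiB chunks, roughly as `cat` does; the path assumes the cgroup v1 layout used on these 4.9 kernels, and the function name is hypothetical.

```python
import os
import time

CGROUP_V1_MEMORY_STAT = "/sys/fs/cgroup/memory/memory.stat"


def time_read(path=CGROUP_V1_MEMORY_STAT, chunk=4096):
    """Read a whole file in fixed-size chunks; return (bytes read, seconds elapsed)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total, time.perf_counter() - start


if __name__ == "__main__" and os.path.exists(CGROUP_V1_MEMORY_STAT):
    size, seconds = time_read()
    print(f"read {size} bytes of memory.stat in {seconds * 1000:.1f} ms")
```

On a healthy node this should take well under a millisecond; the 153 ms above, spent almost entirely in `sys`, points at the kernel rather than at cadvisor.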

Now we can reproduce the bug and see that the Linux kernel itself is behaving pathologically.

Why is the read so slow?

At this point, it is much easier to find reports from other users with similar problems. As it turned out, this bug had been reported in the cadvisor tracker as a problem of excessive CPU usage; nobody had noticed that it also showed up as random latency in the network stack. We had in fact noticed that cadvisor was consuming more CPU time than expected, but we did not attach much importance to it, since our servers have plenty of CPU capacity, so the problem was never studied carefully.

The problem is that cgroups account for memory usage within a namespace (container). When all processes in a cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Even when the processes' own memory is no longer in use, it turns out that the kernel still charges cached content, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the problem description:

zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).

Having the kernel walk every page in the cache when a cgroup is released could be very slow, so it takes a lazy approach instead: it waits until those pages are reclaimed, and only then finally cleans up the cgroup, once the memory is actually needed. Until that happens, the cgroup is still counted when statistics are collected.

From a performance standpoint, this trades memory for speed: the initial cleanup is faster because some cached memory is left behind. That is fine. When the kernel uses up the last of the cached memory, the cgroup is eventually cleaned up, so this cannot really be called a "leak". Unfortunately, the specific way memory.stat performs its lookup in this kernel version (4.9), combined with the huge amount of memory on our servers, meant that it took much longer for the last cached data to be reclaimed and for the cgroup zombies to be cleaned up.

It turned out that some of our nodes had so many cgroup zombies that reads of this file, and the resulting latency, exceeded a second.
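A commonly used heuristic for spotting cgroup zombies on cgroup v1 is to compare the kernel's count of memory cgroups (the `num_cgroups` column of `/proc/cgroups`, which also includes cgroups that were removed but not yet freed) with the number of cgroup directories still visible under the memory hierarchy. The sketch below assumes the v1 layout at `/sys/fs/cgroup/memory`; the difference is an estimate rather than an exact zombie count, the helper names are hypothetical, and the `/proc/cgroups` sample in the test is made up.

```python
import os

MEMORY_ROOT = "/sys/fs/cgroup/memory"


def memory_cgroups_in_kernel(proc_cgroups_text):
    """num_cgroups for the memory controller, parsed from /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        name, _hierarchy, num_cgroups, _enabled = line.split()
        if name == "memory":
            return int(num_cgroups)
    raise ValueError("memory controller not listed")


def memory_cgroups_visible(root=MEMORY_ROOT):
    """Count cgroup directories still present in the filesystem (root included)."""
    return sum(1 for _ in os.walk(root))


if __name__ == "__main__" and os.path.isdir(MEMORY_ROOT):
    with open("/proc/cgroups") as f:
        in_kernel = memory_cgroups_in_kernel(f.read())
    visible = memory_cgroups_visible()
    print(f"{in_kernel} memory cgroups in kernel, {visible} visible, "
          f"~{max(0, in_kernel - visible)} possible zombies")
```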

A workaround for the cadvisor problem is to immediately free the dentry and inode caches system-wide, which instantly eliminates both the read latency and the network latency on the host, because dropping the cache releases the cached pages still held by the zombie cgroups, allowing those cgroups to be freed as well. This is not a fix, but it confirms the cause of the problem.
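The article does not show the exact command used. One standard way to free dentry and inode caches system-wide is to write `2` to `/proc/sys/vm/drop_caches` as root (`1` drops the page cache, `3` both); the kernel only drops clean objects, so running `sync` first frees more. A minimal sketch, with a hypothetical function name and the path as a parameter so it can be exercised against an ordinary file:

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_dentries_and_inodes(path=DROP_CACHES):
    """Ask the kernel to free reclaimable slab objects (dentries and inodes).

    Requires root when writing the real sysctl. Flushes dirty data first so
    that more objects are clean and therefore eligible to be dropped.
    """
    os.sync()
    with open(path, "w") as f:
        f.write("2\n")
```

Run as root, `drop_dentries_and_inodes()` performs the workaround once; as noted above, it only relieves the symptom until zombies accumulate again.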

It turned out that newer kernel versions (4.19+) improved the performance of memory.stat reads, so switching to such a kernel fixed the problem. In the meantime, we had tooling to detect problematic nodes in our Kubernetes clusters, drain them gracefully, and reboot them. We combed through all the clusters, found the nodes with high enough latency, and rebooted them. This bought us time to upgrade the OS on the remaining servers.

Summary

Because this bug stalled NIC RX queue processing for hundreds of milliseconds, it caused both high latency on short connections and latency in the middle of established connections, such as between MySQL requests and their response packets.

Understanding and maintaining the performance of our most fundamental systems, such as Kubernetes, is critical to the reliability and speed of every service built on top of them. Every system we run benefits from improvements to Kubernetes performance.

Source: www.habr.com
