Debugging network latency in Kubernetes

A couple of years ago, Kubernetes was already discussed on the official GitHub blog. Since then it has become the standard technology for deploying services. Kubernetes now manages a significant share of internal and public services. As our clusters grew and performance requirements became stricter, we began to notice that some services on Kubernetes sporadically experienced latency that could not be explained by the load of the application itself.

In essence, applications experienced seemingly random network latency of up to 100ms or more, resulting in timeouts or retries. Services were expected to respond to requests much faster than 100ms, but that is impossible if the connection itself takes that long. Separately, we observed very fast MySQL queries that should have taken milliseconds, and MySQL did complete them in milliseconds, but from the point of view of the requesting application the response took 100ms or more.

It quickly became clear that the problem only occurs when connecting to a Kubernetes node, even if the call comes from outside Kubernetes. The simplest way to reproduce the problem is a Vegeta test that runs from any internal host, tests a Kubernetes service on a specific port, and sporadically registers high latency. In this article we look at how we tracked down the cause of this problem.

Removing unnecessary complexity from the chain leading to the failure

By reproducing the same example, we wanted to narrow the focus of the problem and remove unnecessary layers of complexity. Initially there were too many elements in the flow between Vegeta and the Kubernetes pods. To identify a deeper network problem, some of them have to be ruled out.

The client (Vegeta) creates a TCP connection with any node in the cluster. Kubernetes operates as an overlay network (on top of the existing data center network) that uses IPIP, that is, it encapsulates the IP packets of the overlay network inside the IP packets of the data center. When connecting to the first node, stateful Network Address Translation (NAT) is performed to translate the IP address and port of the Kubernetes node into an IP address and port in the overlay network (specifically, the pod with the application). For return packets, the reverse sequence of actions is performed. It is a complex system with a lot of state and many elements that are constantly updated and changed as services are deployed and moved.
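To make the IPIP step more concrete, here is a minimal sketch of what "an IP packet inside an IP packet" looks like on the wire. It is only an illustration, not how the kernel tunnel driver is written. The addresses 10.125.20.1 and 172.16.47.26 and the 20-byte placeholder payload are made up for the example; protocol number 4 is the IANA value for IP-in-IP.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP encapsulation
IPPROTO_TCP = 6
HEADER_FMT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum over a 20-byte IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal IPv4 header for a payload of the given length."""
    fields = [
        (4 << 4) | 5,           # version 4, header length 5 words
        0,                      # type of service
        20 + payload_len,       # total length
        0,                      # identification
        0,                      # flags and fragment offset
        64,                     # TTL
        proto,                  # protocol of the payload
        0,                      # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    fields[7] = ipv4_checksum(struct.pack(HEADER_FMT, *fields))
    return struct.pack(HEADER_FMT, *fields)


def ipip_encapsulate(inner_packet: bytes, node_src: str, node_dst: str) -> bytes:
    """Wrap an overlay packet in an outer header addressed between nodes."""
    outer = ipv4_header(node_src, node_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


# Hypothetical overlay packet (pod to pod) carrying a placeholder TCP segment.
payload = b"\x00" * 20
inner = ipv4_header("10.125.20.1", "10.125.20.64", IPPROTO_TCP, len(payload)) + payload
# Hypothetical outer addresses: the two nodes on the data center network.
outer = ipip_encapsulate(inner, "172.16.47.26", "172.16.47.27")
```

The outer packet is exactly 20 bytes longer than the inner one, and the receiving node only has to strip the outer header to recover the overlay packet.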

The tcpdump utility shows that in the Vegeta test there is a delay during the TCP handshake (between SYN and SYN-ACK). To remove this unnecessary complexity, you can use hping3 for simple "pings" with SYN packets. We check whether there is a delay in the response packet and then reset the connection. We can filter the data to include only packets slower than 100ms and get a simpler way to reproduce the problem than the full network layer 7 test in Vegeta. Here are Kubernetes node "pings" using TCP SYN/SYN-ACK on the service "node port" (30927) at 10ms intervals, filtered to the slowest responses:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -S -p 30927 -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1486 win=29200 rtt=117.0 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1487 win=29200 rtt=106.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1488 win=29200 rtt=104.1 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5024 win=29200 rtt=109.2 ms
len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=5231 win=29200 rtt=109.2 ms
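If you want to post-process the hping3 output instead of filtering it with egrep, the same "keep only replies of 100ms or more" rule can be sketched in a few lines of Python. The function name and the second sample line (a fast reply with seq=1489) are made up for illustration; the first sample line is taken from the output above.

```python
import re

# hping3 ends each reply line with "rtt=<milliseconds> ms".
RTT_RE = re.compile(r"rtt=(\d+(?:\.\d+)?) ms")
# TCP replies carry "seq=", ICMP replies carry "icmp_seq=".
SEQ_RE = re.compile(r"\b(?:icmp_)?seq=(\d+)")


def slow_replies(lines, threshold_ms=100.0):
    """Return (seq, rtt_ms) for every reply at or above threshold_ms."""
    result = []
    for line in lines:
        rtt = RTT_RE.search(line)
        seq = SEQ_RE.search(line)
        if rtt and seq and float(rtt.group(1)) >= threshold_ms:
            result.append((int(seq.group(1)), float(rtt.group(1))))
    return result


sample = [
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1485 win=29200 rtt=127.1 ms",
    # Hypothetical fast reply, added only to show that it is filtered out.
    "len=46 ip=172.16.47.27 ttl=59 DF id=0 sport=30927 flags=SA seq=1489 win=29200 rtt=0.4 ms",
]
slow = slow_replies(sample)
```

Keeping the sequence number next to the RTT makes it easier to see later which slow replies belong to the same burst.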

We can immediately make a first observation. Judging by the sequence numbers and timings, it is clear that these are not one-off congestions. The delay often accumulates and is eventually processed.

Next, we want to find out which components may be involved in the congestion. Maybe it is some of the hundreds of iptables rules in NAT? Or are there problems with IPIP tunneling on the network? One way to test this is to check each step of the system by eliminating it. What happens if you remove the NAT and firewall logic and leave only the IPIP part?

Fortunately, Linux makes it easy to access the IP overlay layer directly if the machine is on the same network:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7346 win=0 rtt=127.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7347 win=0 rtt=117.3 ms
len=40 ip=10.125.20.64 ttl=64 DF id=0 sport=0 flags=RA seq=7348 win=0 rtt=107.2 ms

Judging by the results, the problem still remains! This rules out iptables and NAT. So is the problem TCP? Let's see how a regular ICMP ping goes:

theojulienne@kube-node-client ~ $ sudo hping3 10.125.20.64 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
len=28 ip=10.125.20.64 ttl=64 id=42594 icmp_seq=104 rtt=110.0 ms
len=28 ip=10.125.20.64 ttl=64 id=49448 icmp_seq=4022 rtt=141.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49449 icmp_seq=4023 rtt=131.3 ms
len=28 ip=10.125.20.64 ttl=64 id=49450 icmp_seq=4024 rtt=121.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49451 icmp_seq=4025 rtt=111.2 ms
len=28 ip=10.125.20.64 ttl=64 id=49452 icmp_seq=4026 rtt=101.1 ms
len=28 ip=10.125.20.64 ttl=64 id=50023 icmp_seq=4343 rtt=126.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50024 icmp_seq=4344 rtt=116.8 ms
len=28 ip=10.125.20.64 ttl=64 id=50025 icmp_seq=4345 rtt=106.8 ms
len=28 ip=10.125.20.64 ttl=64 id=59727 icmp_seq=9836 rtt=106.1 ms

The results show that the problem has not gone away. Perhaps it is the IPIP tunnel? Let's simplify the test further:

Could it be every packet sent between these two hosts?

theojulienne@kube-node-client ~ $ sudo hping3 172.16.47.27 --icmp -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
len=46 ip=172.16.47.27 ttl=61 id=41127 icmp_seq=12564 rtt=140.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41128 icmp_seq=12565 rtt=130.9 ms
len=46 ip=172.16.47.27 ttl=61 id=41129 icmp_seq=12566 rtt=120.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41130 icmp_seq=12567 rtt=110.8 ms
len=46 ip=172.16.47.27 ttl=61 id=41131 icmp_seq=12568 rtt=100.7 ms
len=46 ip=172.16.47.27 ttl=61 id=9062 icmp_seq=31443 rtt=134.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9063 icmp_seq=31444 rtt=124.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9064 icmp_seq=31445 rtt=114.2 ms
len=46 ip=172.16.47.27 ttl=61 id=9065 icmp_seq=31446 rtt=104.2 ms

We have simplified the situation to two Kubernetes nodes sending each other any packet, even an ICMP ping. They still see latency if the target host is "bad" (some are worse than others).

Now the last question: why does the delay only occur on kube-node servers? And does it happen when the kube-node is the sender or the receiver? Luckily, this is also quite easy to figure out by sending a packet from a host outside Kubernetes, but to the same "known bad" recipient. As you can see, the problem has not gone away:

theojulienne@shell ~ $ sudo hping3 172.16.47.27 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=312 win=0 rtt=108.5 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=5903 win=0 rtt=119.4 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=6227 win=0 rtt=139.9 ms
len=46 ip=172.16.47.27 ttl=61 DF id=0 sport=9876 flags=RA seq=7929 win=0 rtt=131.2 ms

We then run the same requests from the previous source kube-node to the external host (which rules out the source host, since a ping includes both an RX and a TX component):

theojulienne@kube-node-client ~ $ sudo hping3 172.16.33.44 -p 9876 -S -i u10000 | egrep --line-buffered 'rtt=[0-9]{3}\.'
^C
--- 172.16.33.44 hping statistic ---
22352 packets transmitted, 22350 packets received, 1% packet loss
round-trip min/avg/max = 0.2/7.6/1010.6 ms

By examining packet captures of the latency, we obtained some additional information. Specifically, the sender (bottom) sees this timeout, but the receiver (top) does not, as the Delta column (in seconds) of the captures shows.

In addition, if you look at the difference in the ordering of TCP and ICMP packets (by sequence number) on the receiver side, ICMP packets always arrive in the same order in which they were sent, but with different timing. TCP packets, meanwhile, are sometimes interleaved, and some of them get stuck. In particular, if you examine the source ports of the SYN packets, they are in order on the sender side but not on the receiver side.

There is a subtle difference in how the network cards of modern servers (like those in our data center) process packets containing TCP or ICMP. When a packet arrives, the network adapter "hashes it per connection", that is, it tries to break connections into queues and send each queue to a separate processor core. For TCP, this hash includes both the source and destination IP address and port. In other words, each connection is hashed (potentially) differently. For ICMP, only the IP addresses are hashed, since there are no ports.
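The effect of this hashing can be shown with a toy model. The sketch below is not the hash a real network card uses; it substitutes CRC32 for the hardware hash and assumes eight RX queues purely for illustration. What it does show is the consequence described above: all ICMP traffic between two hosts lands in one queue, while TCP connections with different source ports are spread over several.

```python
import zlib

NUM_RX_QUEUES = 8  # assumption: one RX queue per core on an eight-core server


def rx_queue(src_ip, dst_ip, src_port=None, dst_port=None, queues=NUM_RX_QUEUES):
    """Pick an RX queue from the fields that take part in the hash.

    CRC32 stands in for the NIC's hardware hash; only the choice of
    input fields matters for this illustration.
    """
    if src_port is None or dst_port is None:
        key = f"{src_ip}>{dst_ip}"                        # ICMP: addresses only
    else:
        key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}"  # TCP: full 4-tuple
    return zlib.crc32(key.encode()) % queues


# Every ICMP packet between the same two hosts maps to the same queue.
icmp_queues = {rx_queue("172.16.33.44", "172.16.47.27") for _ in range(1000)}

# TCP SYNs from different source ports are spread across queues.
tcp_queues = {
    rx_queue("172.16.33.44", "172.16.47.27", port, 9876)
    for port in range(1024, 2024)
}
```

This is why a stall on a single core shows up on every ICMP ping between two hosts but only on some of the TCP connections.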

Another new observation: during this period we see ICMP delays on all communication between the two hosts, but not on all TCP traffic. This tells us that the cause is probably related to RX queue hashing: the congestion is almost certainly in the processing of RX packets, not in sending responses.

This eliminates sending packets from the list of possible causes. We now know that the packet processing problem is on the receive side on some kube-node servers.

Understanding packet processing in the Linux kernel

To understand why the problem occurs at the receiver on some kube-node servers, let's look at how the Linux kernel processes packets.

Going back to the simplest traditional implementation, the network card receives a packet and sends an interrupt to the Linux kernel saying that there is a packet that needs to be processed. The kernel stops other work, switches context to the interrupt handler, processes the packet, and then returns to its current tasks.

This context switching is slow: the latency may not have been noticeable on 10Mbps network cards in the 90s, but on modern 10G cards with a maximum throughput of 15 million packets per second, each core of a small eight-core server can be interrupted millions of times per second.

To avoid constantly handling interrupts, many years ago Linux added NAPI: a network API that all modern drivers use to improve performance at high speeds. At low speeds the kernel still receives interrupts from the network card in the old way. Once enough packets arrive to exceed a threshold, the kernel disables interrupts and instead begins polling the network adapter and picking up packets in batches. Processing is done in softirq, that is, in the context of software interrupts after system calls and hardware interrupts, when the kernel (as opposed to user space) is already running.

This is much faster, but it causes a different problem. If there are too many packets, then all the time is spent processing packets from the network card, and user-space processes do not have time to actually empty these queues (reading from TCP connections, and so on). Eventually the queues fill up and we start dropping packets. In an attempt to find a balance, the kernel sets a budget for the maximum number of packets processed in the softirq context. Once this budget is exceeded, a separate thread, ksoftirqd, is woken up (you will see one of them per core in ps) that handles these softirqs outside of the normal syscall/interrupt path. This thread is scheduled using the standard process scheduler, which tries to allocate resources fairly.
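The budget mechanism can be pictured with a deliberately simplified model. The sketch below is not kernel code: the function name, the budget of 4, and the queue of 10 packets are all made up, and the real kernel also applies time limits and other conditions. It only illustrates the rule "process up to a budget inline, hand the rest to ksoftirqd".

```python
from collections import deque


def softirq_poll(rx_queue, budget):
    """Process up to `budget` packets inline.

    Returns the packets handled in this pass and a flag telling whether
    packets are left over, which in this toy model means the remaining
    work is deferred to the per-core ksoftirqd thread.
    """
    processed = []
    while rx_queue and len(processed) < budget:
        processed.append(rx_queue.popleft())
    deferred_to_ksoftirqd = len(rx_queue) > 0
    return processed, deferred_to_ksoftirqd


# Ten queued packets, but a budget of only four per pass (made-up numbers).
queue = deque(range(10))
inline, deferred = softirq_poll(queue, budget=4)
```

In this model the six packets left in the queue are only handled when the scheduler gets around to running ksoftirqd, which is exactly where extra latency can creep in.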

Having looked at how the kernel processes packets, you can see that there is a certain probability of congestion. If softirq calls are made less often, packets will have to wait for some time to be processed in the RX queue on the network card. This may be because some task is blocking the processor core, or because something else is preventing the core from running softirq.

Narrowing down the processing to a core or method

Softirq delays are just a guess for now. But it makes sense, and we know that we are seeing something very similar. So the next step is to confirm this theory and, if it is confirmed, to find the reason for the delays.

Let's return to our slow packets:

len=46 ip=172.16.53.32 ttl=61 id=29573 icmp_seq=1953 rtt=99.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29574 icmp_seq=1954 rtt=89.3 ms
len=46 ip=172.16.53.32 ttl=61 id=29575 icmp_seq=1955 rtt=79.2 ms
len=46 ip=172.16.53.32 ttl=61 id=29576 icmp_seq=1956 rtt=69.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29577 icmp_seq=1957 rtt=59.1 ms
len=46 ip=172.16.53.32 ttl=61 id=29790 icmp_seq=2070 rtt=75.7 ms
len=46 ip=172.16.53.32 ttl=61 id=29791 icmp_seq=2071 rtt=65.6 ms
len=46 ip=172.16.53.32 ttl=61 id=29792 icmp_seq=2072 rtt=55.5 ms

As discussed earlier, these ICMP packets are hashed into a single RX NIC queue and processed by a single CPU core. If we want to understand how Linux works, it is useful to know where (on which CPU core) and how (softirq, ksoftirqd) these packets are processed in order to track the process.
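One detail in the slow packets listed above is worth a small worked example: each RTT is about 10ms shorter than the previous one, which matches the 10ms send interval. A rough calculation, assuming packet number icmp_seq is sent at icmp_seq times 10ms (an idealization of the -i u10000 option), shows that all replies of one burst come back at practically the same moment.

```python
INTERVAL_MS = 10.0  # hping3 was run with -i u10000: one packet every 10 ms

# (icmp_seq, rtt in ms) for the first burst of slow replies listed above.
slow = [(1953, 99.3), (1954, 89.3), (1955, 79.2), (1956, 69.1), (1957, 59.1)]


def reply_times(samples, interval_ms=INTERVAL_MS):
    """Approximate reply time of each packet: send time plus RTT.

    Assumes packet `seq` leaves at seq * interval_ms, which ignores
    jitter on the sending side.
    """
    return [seq * interval_ms + rtt for seq, rtt in samples]


times = reply_times(slow)
spread_ms = max(times) - min(times)
print(f"replies arrive within {spread_ms:.1f} ms of each other")
```

Although the packets were sent over a span of 40ms, their replies arrive within a fraction of a millisecond of each other, which is what you would expect if they sat in a queue and were then drained in a single batch.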

Now it is time to use tools that let you monitor the Linux kernel in real time. Here we used bcc. This toolkit lets you write small C programs that hook arbitrary functions in the kernel and buffer events into a user-space Python program that can process them and return the result to you. Hooking arbitrary functions in the kernel is a tricky business, but the utility is designed for maximum safety and is meant to track down exactly the kind of production issues that are not easily reproduced in a test or development environment.

The plan here is simple: we know that the kernel processes these ICMP pings, so we put a hook on the kernel function icmp_echo, which accepts an incoming ICMP echo request packet and initiates sending an ICMP echo response. We can identify a packet by the increasing icmp_seq number that hping3 shows above.

The bcc script code looks complicated, but it is not as scary as it seems. The icmp_echo function is passed a struct sk_buff *skb: this is the packet with the "echo request". We can trace it, pull out the sequence echo.sequence (which maps to the icmp_seq from hping3 above), and send it to user space. It is also convenient to capture the current process name/id. Below are the results we see directly while the kernel processes packets:

TGID    PID     PROCESS NAME     ICMP_SEQ
0       0       swapper/11       770
0       0       swapper/11       771
0       0       swapper/11       772
0       0       swapper/11       773
0       0       swapper/11       774
20041   20086   prometheus       775
0       0       swapper/11       776
0       0       swapper/11       777
0       0       swapper/11       778
...     ...     spokes-report-s  779

It should be noted here that in the softirq context, the process that made the system call appears as the "process", when in fact it is the kernel that safely processes packets in the context of the kernel.

With this tool we can associate specific processes with specific packets that show a delay in hping3. Let's do a simple grep on this capture for particular icmp_seq values. Packets matching the above icmp_seq values are marked along with the RTT we observed above (in parentheses are the expected RTT values for packets we filtered out because their RTT was less than 50ms):

TGID    PID     PROCESS NAME     ICMP_SEQ ** RTT
--
10137   10436   cadvisor         1951
10137   10436   cadvisor         1952
76      76      ksoftirqd/11     1953 ** 99ms
76      76      ksoftirqd/11     1954 ** 89ms
76      76      ksoftirqd/11     1955 ** 79ms
76      76      ksoftirqd/11     1956 ** 69ms
76      76      ksoftirqd/11     1957 ** 59ms
76      76      ksoftirqd/11     1958 ** (49ms)
76      76      ksoftirqd/11     1959 ** (39ms)
76      76      ksoftirqd/11     1960 ** (29ms)
76      76      ksoftirqd/11     1961 ** (19ms)
76      76      ksoftirqd/11     1962 ** (9ms)
--
10137   10436   cadvisor         2068
10137   10436   cadvisor         2069
76      76      ksoftirqd/11     2070 ** 75ms
76      76      ksoftirqd/11     2071 ** 65ms
76      76      ksoftirqd/11     2072 ** 55ms
76      76      ksoftirqd/11     2073 ** (45ms)
76      76      ksoftirqd/11     2074 ** (35ms)
76      76      ksoftirqd/11     2075 ** (25ms)
76      76      ksoftirqd/11     2076 ** (15ms)
76      76      ksoftirqd/11     2077 ** (5ms)
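The values in parentheses in the table follow from simple arithmetic. As a sketch, assuming one ping every 10ms and that the whole backlog is processed at about the same instant, each later packet in a burst has waited 10ms less than the one before it. The function name below is made up for illustration.

```python
INTERVAL_MS = 10  # one ping every 10 ms (hping3 -i u10000)


def expected_rtt(first_seq, first_rtt_ms, seq, interval_ms=INTERVAL_MS):
    """Expected RTT of packet `seq` in a burst that is drained all at once.

    A packet sent k intervals after the first one waited k * interval_ms less.
    """
    return first_rtt_ms - (seq - first_seq) * interval_ms


# First burst in the table: icmp_seq 1953 was observed at about 99 ms.
first_burst = {seq: expected_rtt(1953, 99, seq) for seq in range(1953, 1963)}

# Second burst: icmp_seq 2070 was observed at about 75 ms.
second_burst = {seq: expected_rtt(2070, 75, seq) for seq in range(2070, 2078)}
```

The extrapolated values for 1958 to 1962 and 2073 to 2077 match the parenthesized entries in the table, which is why those packets can be attributed to the same stall even though the 50ms filter hid them.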

The results tell us several things. First, all of these packets are processed in the ksoftirqd/11 context. This means that for this particular pair of machines, ICMP packets were hashed to core 11 on the receiving end. We also see that whenever there is a stall, there are packets that are processed in the context of the cadvisor system call. Then ksoftirqd takes over the task and works through the accumulated queue: exactly the number of packets that piled up after cadvisor.

The fact that cadvisor is always running immediately beforehand implies its involvement in the problem. Ironically, the purpose of cadvisor is to "analyze resource usage and performance characteristics of running containers", not to cause this performance problem.

As with other aspects of containers, these are all highly advanced tools that can be expected to run into performance problems under some unforeseen circumstances.

What does cadvisor do that slows down the packet queue?

We now have a pretty good understanding of how the stall occurs, which process causes it, and on which CPU. We see that due to hard blocking, the Linux kernel does not have time to schedule ksoftirqd. And we see that packets are processed in the context of cadvisor. It is logical to assume that cadvisor starts a slow syscall, after which all packets accumulated during that time are processed.

This is a theory, but how do we test it? What we can do is trace the CPU core throughout this process, find the point where the number of packets goes over budget and ksoftirqd is called, and then look a little further back to see what exactly was running on the CPU core just before that point. It is like X-raying the CPU every few milliseconds.

Conveniently, all of this can be done with existing tools. For example, perf record samples a given CPU core at a specified frequency and can generate a timeline of calls into the running system, including both user space and the Linux kernel. You can take this record and process it using a small fork of the FlameGraph program from Brendan Gregg that preserves the order of the stack traces. We can save single-line stack traces every 1 ms, and then pick out and save the 100 milliseconds of samples before a trace hits ksoftirqd:

# record 999 times a second, or every 1ms with some offset so not to align exactly with timers
sudo perf record -C 11 -g -F 999
# take that recording and make a simpler stack trace.
sudo perf script 2>/dev/null | ./FlameGraph/stackcollapse-perf-ordered.pl | grep ksoftir -B 100

Here are the results:

(hundreds of traces that look similar)

cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_iter
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
cadvisor;[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;sys_read;vfs_read;seq_read;memcg_stat_show;mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;ixgbe_poll;ixgbe_clean_rx_irq;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;bond_handle_frame;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;ipip_tunnel_xmit;ip_tunnel_xmit;iptunnel_xmit;ip_local_out;dst_output;__ip_local_out;nf_hook_slow;nf_iterate;nf_conntrack_in;generic_packet;ipt_do_table;set_match_v4;ip_set_test;hash_net4_kadt;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;hash_net4_test
ksoftirqd/11;ret_from_fork;kthread;kthread;smpboot_thread_fn;smpboot_thread_fn;run_ksoftirqd;__do_softirq;net_rx_action;gro_cell_poll;napi_gro_receive;netif_receive_skb_internal;inet_gro_receive;__netif_receive_skb_core;ip_rcv_finish;ip_rcv;ip_forward_finish;ip_forward;ip_finish_output;nf_iterate;ip_output;ip_finish_output2;__dev_queue_xmit;dev_hard_start_xmit;dev_queue_xmit_nit;packet_rcv;tpacket_rcv;sch_direct_xmit;validate_xmit_skb_list;validate_xmit_skb;netif_skb_features;ixgbe_xmit_frame_ring;swiotlb_dma_mapping_error;__dev_queue_xmit;dev_hard_start_xmit;__bpf_prog_run;__bpf_prog_run

There is a lot here, but the main thing is that we find the "cadvisor before ksoftirqd" pattern that we saw earlier in the ICMP tracer. What does it mean?

Each line is a CPU trace at a specific point in time. Each call down the stack on a line is separated by a semicolon. In the middle of the lines we see the syscall being called: read(): .... ;do_syscall_64;sys_read; .... So cadvisor spends a lot of time in the read() system call related to the mem_cgroup_* functions (top of the call stack / end of the line).
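Reading these collapsed stacks by eye is tedious, so a few lines of Python can pull out the interesting parts. The sketch below uses hypothetical helper names and a sample line shortened to a single [cadvisor] user-space frame. It splits a line on semicolons and reports the process, the first sys_* frame, and the leaf function.

```python
def parse_collapsed(line):
    """Split one collapsed stack line into its process name and frames."""
    frames = line.strip().split(";")
    return {"process": frames[0], "frames": frames[1:], "leaf": frames[-1]}


def syscall_of(frames):
    """Return the first sys_* frame in the stack, or None if there is none."""
    return next((f for f in frames if f.startswith("sys_")), None)


# One of the cadvisor traces above, with the repeated user frames shortened.
line = (
    "cadvisor;[cadvisor];entry_SYSCALL_64_after_swapgs;do_syscall_64;"
    "sys_read;vfs_read;seq_read;memcg_stat_show;"
    "mem_cgroup_nr_lru_pages;mem_cgroup_node_nr_lru_pages"
)
stack = parse_collapsed(line)
syscall = syscall_of(stack["frames"])
```

For the cadvisor lines this yields sys_read as the system call and a mem_cgroup_* function as the leaf, which is the pattern the text describes.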

It is inconvenient to see in a call trace what exactly is being read, so let's run strace, see what cadvisor does, and find system calls longer than 100ms:

theojulienne@kube-node-bad ~ $ sudo strace -p 10137 -T -ff 2>&1 | egrep '<0\.[1-9]'
[pid 10436] <... futex resumed> ) = 0 <0.156784>
[pid 10432] <... futex resumed> ) = 0 <0.258285>
[pid 10137] <... futex resumed> ) = 0 <0.678382>
[pid 10384] <... futex resumed> ) = 0 <0.762328>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>
[pid 10384] <... futex resumed> ) = 0 <0.104614>
[pid 10436] <... futex resumed> ) = 0 <0.175936>
[pid 10436] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.228091>
[pid 10427] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 577 <0.207334>
[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>
[pid 10382] <... pselect6 resumed> ) = 0 (Timeout) <0.117717>
[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 660 <0.159891>
[pid 10417] <... futex resumed> ) = 0 <0.917495>
[pid 10436] <... futex resumed> ) = 0 <0.208172>
[pid 10417] <... futex resumed> ) = 0 <0.190763>
[pid 10417] <... read resumed> "cache 0\nrss 0\nrss_huge 0\nmapped_"..., 4096) = 576 <0.154442>
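When the strace output gets long, it can help to group the slow calls by syscall name. The following is a small sketch with a hypothetical helper; it only understands the "<... name resumed>" lines with a trailing duration that strace -T prints in the output above, and the sample lines are taken from that output.

```python
import re
from collections import defaultdict

# "<... read resumed> ... <0.179438>": syscall name and elapsed seconds.
LINE_RE = re.compile(r"<\.\.\. (\w+) resumed>.*<(\d+\.\d+)>$")


def slow_syscalls(lines, threshold_s=0.1):
    """Group durations of resumed syscalls that took at least threshold_s."""
    durations = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and float(match.group(2)) >= threshold_s:
            durations[match.group(1)].append(float(match.group(2)))
    return dict(durations)


sample = [
    r'[pid 10436] <... futex resumed> ) = 0 <0.156784>',
    r'[pid 10436] <... read resumed> "cache 154234880\nrss 507904\nrss_h"..., 4096) = 658 <0.179438>',
    r'[pid 10411] <... epoll_ctl resumed> ) = 0 <0.118113>',
]
by_syscall = slow_syscalls(sample)
```

Slow futex and epoll calls are expected for threads that are simply waiting, so the read entries are the ones worth a closer look.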

As you might expect, we see slow read() calls here. From the contents of the read operations and the mem_cgroup context, it is clear that these read() calls refer to the memory.stat file, which shows memory usage and cgroup limits (cgroups are Docker's resource isolation technology). The cadvisor tool queries this file to obtain resource usage information for containers. Let's check whether it is the kernel or cadvisor doing something unexpected:

theojulienne@kube-node-bad ~ $ time cat /sys/fs/cgroup/memory/memory.stat >/dev/null

real 0m0.153s
user 0m0.000s
sys 0m0.152s
theojulienne@kube-node-bad ~ $
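The same check is easy to script if you want to run it across many nodes. This is only a sketch: the helper names and the 0.1 second threshold are assumptions, and the path is the one from the command above, which may not exist on hosts with a different cgroup layout.

```python
import os
import time


def time_read(path):
    """Return (elapsed seconds, bytes read) for one full read of `path`."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    return time.perf_counter() - start, len(data)


def is_slow(seconds, threshold_s=0.1):
    """Flag reads that take at least threshold_s (assumed threshold: 100 ms)."""
    return seconds >= threshold_s


if __name__ == "__main__":
    # Path taken from the command above; skipped when it is not present.
    path = "/sys/fs/cgroup/memory/memory.stat"
    if os.path.exists(path):
        seconds, size = time_read(path)
        print(f"{path}: {size} bytes in {seconds:.3f}s, slow={is_slow(seconds)}")
```

On the bad node from the article, a read that takes 0.153s would be flagged as slow by this threshold.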

Now we can reproduce the bug and see that the Linux kernel is facing a pathology.

Why is the read operation so slow?

At this stage, it is much easier to find reports from other users about similar problems. As it turned out, in the cadvisor tracker this bug had been reported as a problem of excessive CPU usage; it is just that nobody had noticed that the latency also shows up randomly in the network stack. It had indeed been noticed that cadvisor was consuming more CPU time than expected, but this was not given much importance, since our servers have plenty of CPU resources, so the problem was not studied carefully.

The problem is that cgroups account for memory usage within the namespace (container). When all processes in a cgroup exit, Docker releases the memory cgroup. However, "memory" is not just process memory. Although the process memory itself is no longer used, it turns out that the kernel also charges cached contents, such as dentries and inodes (directory and file metadata), to the memory cgroup. From the problem description:

zombie cgroups: cgroups that have no processes and have been deleted, but still have memory charged to them (in my case, from the dentry cache, but it can also be charged from the page cache or tmpfs).

Having the kernel check all the pages in the cache when freeing a cgroup can be very slow, so a lazy process is chosen: wait until these pages are requested again, and only then finally clear the cgroup, when the memory is actually needed. Until then, the cgroup is still taken into account when collecting statistics.

From a performance standpoint, they traded memory for performance: speeding up the initial cleanup by leaving some cached memory behind. This is fine. When the kernel uses the last of the cached memory, the cgroup is eventually cleared, so it cannot be called a "leak". Unfortunately, the specific implementation of the memory.stat lookup mechanism in this kernel version (4.9), combined with the huge amount of memory on our servers, means that it takes much longer to reclaim the latest cached data and clear cgroup zombies.

It turned out that some of our nodes had so many cgroup zombies that the read, and with it the latency, exceeded one second.

The workaround for the cadvisor problem is to immediately free the dentries/inodes caches throughout the system, which immediately eliminates the read latency as well as the network latency on the host, since clearing the cache touches the cached cgroup zombie pages and they are freed too. This is not a solution, but it confirms the cause of the problem.

It turned out that newer kernel versions (4.19+) improved the performance of the memory.stat call, so switching to such a kernel fixed the problem. In the meantime, we had tools to detect problematic nodes in Kubernetes clusters, gracefully drain them, and reboot them. We combed through all the clusters, found the nodes with sufficiently high latency, and rebooted them. This gave us time to update the OS on the remaining servers.

Summing up

Because this bug stopped RX NIC queue processing for hundreds of milliseconds, it simultaneously caused both high latency on short connections and mid-connection latency, such as between MySQL requests and response packets.

Understanding and maintaining the performance of the most fundamental systems, such as Kubernetes, is critical to the reliability and speed of all the services built on them. Every system you run benefits from Kubernetes performance improvements.

Source: www.hab.com