CPU limits and aggressive throttling in Kubernetes

Translator's note: This eye-opening story from Omio, a European travel aggregator, takes readers from basic theory to the practical intricacies of Kubernetes configuration. Knowing about cases like this not only broadens your horizons but also helps prevent non-trivial problems.

Have you ever had an application that got stuck, stopped responding to health checks, and you could not work out why? One possible explanation is related to CPU resource quota limits. That is what this article is about.

TL;DR:
We strongly recommend disabling CPU limits in Kubernetes (or disabling CFS quotas in the Kubelet) if you are running a Linux kernel version that has the CFS quota bug. The kernel contains a serious and well-known bug that causes excessive throttling and delays.

At Omio, the entire infrastructure is managed by Kubernetes. All of our stateful and stateless workloads run exclusively on Kubernetes (we use Google Kubernetes Engine). Over the last six months we started observing random slowdowns. Applications froze or stopped responding to health checks, lost their network connections, and so on. This behavior puzzled us for a long time, and eventually we decided to tackle the problem in earnest.

Contents of the article:

  • A few words about containers and Kubernetes;
  • How CPU requests and limits are implemented;
  • How CPU limits work in multi-core environments;
  • How to track CPU throttling;
  • The solution and its nuances.

A few words about containers and Kubernetes

Kubernetes is essentially the modern standard in the infrastructure world. Its main job is container orchestration.

Containers

In the past, we had to build artifacts such as Java JARs/WARs, Python Eggs, or executables to run on servers. To make them work, however, extra effort was required: installing the runtime environment (Java/Python), placing the necessary files in the right locations, ensuring compatibility with a specific version of the operating system, and so on. In other words, close attention had to be paid to configuration management (which was often a source of friction between developers and system administrators).

Containers changed everything. Now the artifact is a container image. It can be thought of as a kind of extended executable that contains not only the program but also a complete runtime environment (Java/Python/...), along with the necessary files and packages, pre-installed and ready to run. Containers can be deployed and run on different servers without any additional steps.

In addition, containers run in their own sandboxed environment. They have their own virtual network adapter, their own filesystem with restricted access, their own process hierarchy, their own CPU and memory limits, and so on. All of this is implemented by dedicated subsystems of the Linux kernel: namespaces provide the isolation, while control groups (cgroups) enforce the resource limits.

Kubernetes

As mentioned earlier, Kubernetes is a container orchestrator. It works like this: you give it a pool of machines and then say: "Hey, Kubernetes, launch ten instances of my container with 2 processors and 3 GB of memory each, and keep them running!" Kubernetes takes care of the rest. It finds free capacity, launches the containers and restarts them if necessary, rolls out updates when versions change, and so on. In essence, Kubernetes lets you abstract away the hardware and makes a wide variety of systems suitable for deploying and running applications.

Figure: Kubernetes from a layman's point of view

What requests and limits are in Kubernetes

Okay, we have covered containers and Kubernetes. We also know that several containers can live on the same machine.

An analogy can be drawn with a communal apartment. A spacious property (the machine or node) is rented out to several tenants (the containers). Kubernetes acts as the realtor. The question arises: how do you keep the tenants from getting in each other's way? What if one of them, say, decides to occupy the bathroom for half the day?

This is where requests and limits come into play. A CPU request is needed purely for scheduling. It is something like the container's "wish list", and it is used to select the most suitable node. A CPU limit, on the other hand, can be compared to a rental agreement: once a node has been picked for the container, the container cannot go beyond the agreed limits. And this is where the problem arises...

How requests and limits are implemented in Kubernetes

Kubernetes uses the throttling mechanism built into the kernel (skipping CPU cycles) to implement CPU limits. If an application exceeds its limit, throttling kicks in (that is, the application receives fewer CPU cycles). Memory requests and limits are organized differently, so violations are easier to detect: just check the last restart status of the pod and see whether it is "OOMKilled". CPU throttling is not that simple, because K8s only exposes usage metrics, not cgroup metrics.
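To make the memory-side check concrete, here is a minimal Python sketch that inspects a pod object (as returned by `kubectl get pod <name> -o json`) for containers whose last termination reason was "OOMKilled". The function name and the sample pod are hypothetical and exist only for illustration; the sample is heavily trimmed compared with a real pod object.

```python
import json


def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState is empty when the container has never been restarted.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical, trimmed-down pod object used only for illustration.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "app", "restartCount": 3,
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0, "lastState": {}}
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))
```

No equally direct status field exists for CPU throttling, which is why the rest of the article looks at the cgroup files themselves.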

CPU request

Figure: How the CPU request is implemented

For simplicity, let's walk through the process using a machine with a 4-core CPU as an example.

K8s uses the control group mechanism (cgroups) to manage resource allocation (memory and CPU). It follows a hierarchical model: a child inherits the limits of its parent group. The allocation details are stored in a virtual filesystem (/sys/fs/cgroup). For the CPU, this is /sys/fs/cgroup/cpu,cpuacct/*.

K8s uses the cpu.shares file to allocate CPU resources. In our case, the root cgroup gets 4096 shares of CPU resources, that is, 100% of the available CPU power (1 core = 1024; this is a fixed value). The root group distributes resources proportionally, according to the shares its descendants declare in cpu.shares, and they in turn do the same with their own descendants, and so on. On a typical Kubernetes node, the root cgroup has three children: system.slice, user.slice and kubepods. The first two subgroups are used to distribute resources between critical system workloads and user programs outside K8s. The last one, kubepods, is created by Kubernetes to distribute resources among pods.
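The "1 core = 1024" rule can be sketched as a small conversion from a CPU request in millicores to a cpu.shares value. This is an illustrative sketch, not Kubernetes source code: the function name is made up here, and the lower bound of 2 shares is an assumption about the smallest value the kernel accepts.

```python
SHARES_PER_CPU = 1024  # one full core, as stated in the text
MIN_SHARES = 2         # assumption: smallest cpu.shares value the kernel accepts


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores (1000 = one core) to cpu.shares."""
    if milli_cpu <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


for request in (4000, 1000, 500, 250):
    print(f"{request}m -> {milli_cpu_to_shares(request)} shares")
```

With these assumptions, a 4-core request maps to the 4096 shares mentioned above, and a request of 500m maps to 512 shares.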

The figure above shows that the first and second subgroups each received 1024 shares, while the kubepods subgroup was allocated 4096 shares. How is this possible? After all, the root group only has 4096 shares available, and the sum of its descendants' shares is well above that number (6144). The point is that the value is a relative weight, which the Linux scheduler (CFS) uses to allocate CPU resources proportionally. In our case, the first two groups each receive roughly 680 real shares (16.6% of 4096), and kubepods receives the remaining 2736 shares. When idle, the first two groups will not use the resources allocated to them.
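The proportional split can be checked with a few lines of Python. This is only a sketch of the arithmetic (the function name is illustrative); exact division gives values slightly different from the rounded figures quoted in the text.

```python
def effective_shares(weights: dict[str, int], capacity: int) -> dict[str, float]:
    """Split `capacity` among groups in proportion to their cpu.shares weights."""
    total = sum(weights.values())
    return {name: capacity * weight / total for name, weight in weights.items()}


groups = {"system.slice": 1024, "user.slice": 1024, "kubepods": 4096}
result = effective_shares(groups, capacity=4096)

for name, share in result.items():
    print(f"{name}: {share:.0f} shares ({share / 4096:.1%})")
```

Each of the first two groups ends up with about one sixth of the node, and kubepods with about two thirds, which matches the proportions described above.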

Fortunately, the scheduler has a mechanism to avoid wasting unused CPU resources. It transfers "idle" capacity to a global pool, from which it is handed out to groups that need extra CPU power (the transfer happens in batches to avoid rounding losses). The same approach is applied to all descendants of descendants.

This mechanism ensures a fair distribution of CPU power and makes sure that no process "steals" resources from others.
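The effect of handing idle capacity to groups that need it can be illustrated with a toy model. The sketch below is not how CFS is implemented; it is a simplified weighted redistribution under stated assumptions: every group has a weight (its shares) and a demand in cores, and capacity that a group does not need is re-offered to the remaining groups in proportion to their weights. All names and demand figures are hypothetical.

```python
def distribute(capacity: float, weights: dict[str, int],
               demand: dict[str, float]) -> dict[str, float]:
    """Toy model: weighted split of CPU capacity, re-offering unused capacity."""
    alloc = {group: 0.0 for group in weights}
    active = set(weights)
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[g] for g in active)
        satisfied = set()
        handed_out = 0.0
        for g in active:
            offer = remaining * weights[g] / total_weight
            need = demand[g] - alloc[g]
            give = min(offer, need)
            alloc[g] += give
            handed_out += give
            if need <= offer:
                satisfied.add(g)  # this group needs nothing more
        remaining -= handed_out
        if not satisfied:
            break  # every active group took its full offer
        active -= satisfied
    return alloc


weights = {"system.slice": 1024, "user.slice": 1024, "kubepods": 4096}
# Hypothetical demand in cores on a 4-core node.
demand = {"system.slice": 0.1, "user.slice": 0.0, "kubepods": 4.0}
alloc = distribute(4.0, weights, demand)
print(alloc)
```

In this example kubepods ends up with far more than its proportional two thirds of the node, because the other two groups leave most of their allocation unused.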

CPU limit

Although limits and requests are configured in a similar way in K8s, their implementations are radically different: this is the most misleading and least documented part.

K8s uses the CFS quota mechanism to implement limits. The settings are specified in the files cfs_period_us and cfs_quota_us in the cgroup directory (the cpu.shares file lives there too).

Unlike cpu.shares, the quota is based on a time period rather than on the available CPU power. cfs_period_us specifies the length of the period (epoch); by default it is 100000 μs (100 ms). K8s has an option to change this value, but for now it is only available in alpha. The scheduler uses the epoch to reset the used quota. The second file, cfs_quota_us, specifies the time available (the quota) in each epoch. Note that it is also expressed in microseconds. The quota can exceed the epoch length; in other words, it can be greater than 100 ms.
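The relationship between a CPU limit and the two CFS files can be sketched as a simple conversion. This is a hedged illustration rather than Kubernetes source code: the function name is invented here, and the 1 ms lower bound on the quota is an assumption.

```python
CFS_PERIOD_US = 100_000  # default epoch length from the text (100 ms)
MIN_QUOTA_US = 1_000     # assumption: quotas below 1 ms are rounded up


def milli_cpu_to_quota_us(milli_cpu: int, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU limit in millicores to a cfs_quota_us value."""
    if milli_cpu <= 0:
        raise ValueError("this sketch expects a positive CPU limit")
    return max(MIN_QUOTA_US, milli_cpu * period_us // 1000)


for limit in (500, 1000, 2000):
    quota = milli_cpu_to_quota_us(limit)
    print(f"limit {limit}m -> cfs_quota_us={quota} ({quota / 1000:.0f} ms per epoch)")
```

A limit of 2 cores therefore corresponds to 200 ms of CPU time per 100 ms epoch, which is how a quota can be longer than the epoch itself.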

Let's look at two scenarios on 16-core machines (the most common machine type we have at Omio):

Figure: Scenario 1: 2 threads and a 200 ms limit. No throttling.

Figure: Scenario 2: 10 threads and a 200 ms limit. Throttling starts after 20 ms, and access to CPU resources resumes after another 80 ms.

Let's say you set the CPU limit to 2 cores; Kubernetes translates this value into 200 ms. This means the container can use at most 200 ms of CPU time per epoch without being throttled.

And this is where the fun begins. As mentioned above, the available quota is 200 ms. If you run ten threads in parallel on a 12-core machine (see the illustration for scenario 2) while all other pods are idle, the quota is exhausted after just 20 ms (since 10 * 20 ms = 200 ms), and all threads of this pod hang (are throttled) for the next 80 ms. The scheduler bug mentioned earlier makes things worse: it causes excessive throttling, so the container cannot even use up the quota it has.
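The arithmetic behind both scenarios can be reproduced with a short Python sketch. It assumes that every thread is CPU-bound and runs on its own core for the whole epoch, and it ignores the kernel bug; the function and parameter names are illustrative only.

```python
def throttle_timeline(threads: int, quota_ms: float,
                      period_ms: float = 100.0, cores: int = 12) -> tuple[float, float]:
    """Return (running_ms, throttled_ms) within one CFS epoch.

    Assumes all threads are CPU-bound and each gets its own core.
    """
    running_threads = min(threads, cores)
    # The quota is consumed by all threads at once.
    running_ms = min(period_ms, quota_ms / running_threads)
    return running_ms, period_ms - running_ms


for threads in (2, 10):
    run, stalled = throttle_timeline(threads, quota_ms=200.0)
    print(f"{threads} threads: run {run:.0f} ms, throttled {stalled:.0f} ms per epoch")
```

With two threads the quota lasts the entire epoch, while with ten threads it is gone after 20 ms and the pod is stalled for the remaining 80 ms.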

How do you measure throttling in pods?

Just log in to the pod and run cat /sys/fs/cgroup/cpu/cpu.stat.

  • nr_periods - the total number of scheduler periods;
  • nr_throttled - the number of throttled periods out of nr_periods;
  • throttled_time - the cumulative throttled time in nanoseconds.
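The three counters above are easier to interpret as a ratio. The following Python sketch parses the contents of cpu.stat and reports the share of throttled periods; the sample values are hypothetical and chosen only to keep the arithmetic obvious.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: dict[str, int]) -> float:
    """Fraction of scheduler periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Hypothetical sample; inside a pod, read /sys/fs/cgroup/cpu/cpu.stat instead.
sample = """nr_periods 1000
nr_throttled 250
throttled_time 18000000000
"""

stats = parse_cpu_stat(sample)
print(f"throttled periods: {throttled_ratio(stats):.1%}")
print(f"throttled time: {stats['throttled_time'] / 1e9:.1f} s")
```

Because the counters are cumulative, comparing two readings taken some time apart gives a better picture of current throttling than a single snapshot.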

What is really going on?

As a result, we saw heavy throttling in all applications. Sometimes it was one and a half times higher than calculated!

This leads to a variety of errors: failed readiness checks, frozen containers, dropped network connections, and timeouts in service calls. Ultimately, it shows up as increased latency and higher error rates.

Solution and consequences

Everything is simple here. We dropped CPU limits and started updating the OS kernel in our clusters to the latest version, in which the bug is fixed. The number of errors (HTTP 5xx) in our services immediately dropped significantly:

HTTP 5xx errors

Figure: HTTP 5xx errors for one critical service

p95 response time

Figure: Request latency of a critical service, 95th percentile

Operating costs

Figure: Number of instance hours spent

What's the catch?

As stated at the beginning of the article:

An analogy can be drawn with a communal apartment... Kubernetes acts as the realtor. But how do you keep the tenants from getting in each other's way? What if one of them, say, decides to occupy the bathroom for half the day?

Here is the catch. One careless container can eat up all the CPU resources available on a machine. If you have a well-behaved application stack (for example, the JVM, Go, or the Node VM are configured properly), this is not a problem: you can run under these conditions for a long time. But if applications are poorly optimized or not optimized at all (FROM java:latest), the situation can get out of control. At Omio we have automated base Dockerfiles with sensible default settings for the main language stacks, so this problem did not arise.

We recommend monitoring the USE metrics (utilization, saturation and errors), API latency and error rates. Make sure the results meet your expectations.

References

That is our story. The following materials helped a great deal in understanding what was going on:

Kubernetes bug reports:

Have you run into similar problems in your own practice, or do you have experience with throttling in containerized production environments? Share your story in the comments!

Source: www.hab.com
