CPU limits and aggressive throttling in Kubernetes

Translator's note: This eye-opening story from Omio, a European travel aggregator, takes readers from basic theory to the fascinating practical intricacies of Kubernetes configuration. Knowing about cases like this does more than broaden your horizons. It also helps you avoid non-trivial problems.


Have you ever had an application freeze, stop responding to health checks, and been unable to figure out why? One possible explanation is CPU quota limits, which are the subject of this article.

TL;DR:
We strongly recommend disabling CPU limits in Kubernetes (or disabling the CFS quota in the kubelet) if your Linux kernel version has the CFS quota bug. This serious, well-known kernel bug causes excessive throttling and latency.

At Omio, Kubernetes manages our entire infrastructure. All our stateful and stateless workloads run exclusively on Kubernetes (we use Google Kubernetes Engine). About six months ago we began noticing random slowdowns. Applications would freeze or stop responding to health checks, lose network connectivity, and so on. This behavior puzzled us for a long time, and we finally decided to treat the problem seriously.

Article summary:

  • A few words about containers and Kubernetes;
  • How CPU requests and limits are implemented;
  • How the CPU limit works in multi-core environments;
  • How to track CPU throttling;
  • The solution to the problem and its nuances.

A few words about containers and Kubernetes

Kubernetes is essentially the modern standard in the infrastructure world. Its main job is container orchestration.

Containers

In the past, we had to build artifacts such as Java JARs/WARs, Python Eggs, or executables to run on servers. Getting them to work, however, took extra effort. We had to install the runtime environment (Java/Python), put the right files in the right places, make sure everything was compatible with a specific operating system version, and so on. In other words, configuration management needed a lot of attention, and it was often a source of friction between developers and system administrators.

Containers changed everything. Now the artifact is a container image. You can think of it as a kind of extended executable. It contains the program along with a complete runtime environment (Java/Python/...) and the required files and packages, all pre-installed and ready to run. Containers can be deployed and run on different servers without any extra steps.

Containers also run in their own sandboxed environment. Each one has its own virtual network adapter, its own file system with restricted access, its own process hierarchy, its own CPU and memory limits, and so on. Linux kernel mechanisms make all of this possible: namespaces provide the isolation, and control groups (cgroups) enforce the CPU and memory limits.

Kubernetes

As mentioned earlier, Kubernetes is a container orchestrator. It works like this: you give it a pool of machines and say, "Hey, Kubernetes, launch ten instances of my container with 2 processors and 3 GB of memory each, and keep them running!" Kubernetes takes care of the rest. It finds free capacity, launches the containers, restarts them when necessary, and rolls out updates when you change versions. In short, Kubernetes lets you abstract away the hardware and makes a wide variety of systems suitable for deploying and running applications.
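As an illustration only, here is the request from the paragraph above written as a Deployment. The sketch builds a minimal apps/v1 Deployment manifest as a Python dict. The name my-app and the image are hypothetical placeholders, and 3 GB is expressed as the Kubernetes quantity 3Gi (gibibytes). JSON is a valid manifest format, so the printed output could be saved to a file and passed to kubectl apply -f.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int, cpu: str, memory: str) -> dict:
    """Build a minimal apps/v1 Deployment asking for `replicas` copies of one container."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        # Requests tell the scheduler how much capacity each replica needs.
        "resources": {"requests": {"cpu": cpu, "memory": memory}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# "Ten instances with 2 processors and 3 GB of memory each" (name and image are hypothetical).
manifest = deployment_manifest("my-app", "example.com/my-app:1.0", replicas=10, cpu="2", memory="3Gi")
print(json.dumps(manifest, indent=2))
```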

Figure: Kubernetes from a layman's point of view

What are requests and limits in Kubernetes

So far we have covered containers and Kubernetes. We also know that several containers can run on the same machine.

A communal apartment makes a good analogy. A large property (machines/nodes) is rented out to several tenants (containers), and Kubernetes acts as the realtor. This raises a question: how do you keep the tenants from getting in each other's way? What if one of them decides to borrow the bathroom for half a day?

This is where requests and limits come into play. The CPU request is used mainly for scheduling. It is something like the container's "wish list", and it is used to pick the most suitable node. The CPU limit, on the other hand, is more like a rental agreement: once we have chosen a node for the container, the container cannot exceed the agreed limits. And this is where the problem arises...
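To make the "wish list" idea concrete, here is a toy version of the scheduling decision. It is a simplified sketch, not kube-scheduler's real filtering and scoring plugins. A node fits if its allocatable CPU, minus the requests already placed on it, can hold the new request. Among the nodes that fit, the sketch picks the one with the most CPU left. Node names and numbers are hypothetical. Actual CPU usage never enters the calculation; only requests do.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    allocatable_millicores: int
    # CPU requests of pods already scheduled on this node, in millicores.
    requested: list[int] = field(default_factory=list)

    def unrequested(self) -> int:
        return self.allocatable_millicores - sum(self.requested)


def pick_node(nodes: list[Node], request_millicores: int) -> Node | None:
    # Filter: keep nodes whose unrequested CPU can hold the new request.
    candidates = [n for n in nodes if n.unrequested() >= request_millicores]
    if not candidates:
        return None  # no node fits, so the pod would stay Pending
    # Score: prefer the node with the most unrequested CPU left.
    return max(candidates, key=Node.unrequested)


nodes = [Node("node-a", 4000, [3000]), Node("node-b", 4000, [1000, 500])]
chosen = pick_node(nodes, 2000)
print(chosen.name if chosen else "Pending")
```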

How requests and limits are implemented in Kubernetes

Kubernetes enforces CPU limits with the throttling mechanism (skipping CPU cycles) built into the kernel. If an application exceeds its limit, it gets throttled, which means it receives fewer CPU cycles. Memory requests and limits work differently, and problems with them are easy to spot. Just check the last restart status of the pod and see whether it says "OOMKilled". CPU throttling is harder to detect, because K8s only exposes usage metrics, not cgroup-level metrics.
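The memory check can be scripted. The sketch below takes the pod object returned by kubectl get pod <name> -o json (already parsed into a dict). It reports the containers whose previous run ended with the reason OOMKilled, which it reads from status.containerStatuses[].lastState.terminated.reason. The sample pod is a trimmed, made-up example. There is no comparable status field for CPU throttling, and that is exactly the problem described above.

```python
from __future__ import annotations

import json


def oom_killed_containers(pod: dict) -> list[str]:
    """Names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState stays empty until the container has been restarted at least once.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical, trimmed output of `kubectl get pod <name> -o json`.
sample_pod = json.loads("""
{"status": {"containerStatuses": [
  {"name": "app", "restartCount": 3,
   "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
  {"name": "sidecar", "restartCount": 0, "lastState": {}}
]}}
""")
print(oom_killed_containers(sample_pod))
```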

CPU request

Figure: How the CPU request is implemented

For simplicity, let's examine the mechanism using a machine with a 4-core CPU as an example.

K8s uses the control group (cgroup) mechanism to manage resource allocation (memory and CPU). Cgroups form a hierarchy in which a child group inherits the limits of its parent. Allocation settings are stored in a virtual file system (/sys/fs/cgroup). For the CPU, the relevant path is /sys/fs/cgroup/cpu,cpuacct/*.

K8s uses the cpu.shares file to divide up CPU resources. In our case, the root cgroup gets 4096 CPU shares, which is 100% of the available processing power (1 core = 1024; this is a fixed value). The root group distributes resources among its children in proportion to the values in their cpu.shares files. Each child does the same for its own children, and so on. On a typical Kubernetes node, the root cgroup has three children: system.slice, user.slice and kubepods. The first two divide resources between critical system workloads and user programs running outside K8s. The last one, kubepods, is created by Kubernetes to divide resources among pods.
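The same 1024-shares-per-core convention applies to individual pods. As far as we know, the kubelet converts each CPU request into cpu.shares for the corresponding cgroup at 1024 shares per core, rounding down. The sketch below reproduces that arithmetic under that assumption. The clamp to 2..262144 is the range the kernel accepts for cpu.shares on cgroup v1. The helper names are our own, and the parser is deliberately minimal: it only understands plain core counts ("2", "0.5") and the millicore form ("250m").

```python
from decimal import ROUND_CEILING, Decimal

SHARES_PER_CPU = 1024
MIN_SHARES, MAX_SHARES = 2, 262_144  # cgroup v1 cpu.shares range


def cpu_to_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "250m", "2" or "0.5" into millicores (rounding up)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int((Decimal(quantity) * 1000).to_integral_value(rounding=ROUND_CEILING))


def request_to_shares(request: str) -> int:
    """cpu.shares for a CPU request: 1024 shares per core, clamped to the kernel's range."""
    shares = cpu_to_millicores(request) * SHARES_PER_CPU // 1000
    return max(MIN_SHARES, min(MAX_SHARES, shares))


for request in ("4", "2", "250m", "1m"):
    print(request, "->", request_to_shares(request))
```

A request of "4" maps to 4096 shares, the same value the whole 4-core machine gets at the root of the hierarchy.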

The figure above shows that the first and second groups each received 1024 shares, while kubepods was allocated 4096. How is this possible? The root group only has 4096 shares to give, yet its children's shares add up to considerably more (6144). The answer is that the value is a relative weight: the Linux scheduler (CFS) uses it to divide CPU resources proportionally. In our case, the first two groups each receive about 683 effective shares (1024/6144, roughly 16.7% of 4096), and kubepods receives the remaining ~2731. If the first two groups are idle, they do not use the resources allocated to them.
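The proportional rule can be checked with a few lines of arithmetic. The sketch below splits a parent's CPU among its children according to cpu.shares and recurses down the tree, assuming every group is busy. The node is the 4-core example from the text. The two pods under kubepods, and their share values, are hypothetical; they only show how the split continues one level down. Real nodes also add QoS-class cgroups under kubepods.

```python
def effective_cores(tree: dict, parent_cores: float, prefix: str = "") -> dict:
    """Split parent_cores among children in proportion to cpu.shares, recursively.

    tree maps a cgroup name to (shares, subtree). Every group is assumed busy.
    """
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, subtree) in tree.items():
        path = prefix + name
        cores = parent_cores * shares / total
        result[path] = cores
        result.update(effective_cores(subtree, cores, path + "/"))
    return result


node = {
    "system.slice": (1024, {}),
    "user.slice": (1024, {}),
    # pod-a and pod-b are hypothetical children used only to show the recursion.
    "kubepods": (4096, {"pod-a": (2048, {}), "pod-b": (1024, {})}),
}
for path, cores in effective_cores(node, parent_cores=4).items():
    # 1 core = 1024 shares, so the whole 4-core machine is 4096 root shares.
    print(f"{path:<16} {cores:.2f} cores ~ {cores * 1024:.0f} of 4096 root shares")
```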

Fortunately, the scheduler has a way to avoid wasting unused CPU. It moves "idle" capacity into a global pool and hands it out to groups that need more processing power. The transfer happens in batches to avoid losses. The same mechanism applies at every level of the hierarchy.

This mechanism distributes processing power fairly and ensures that no process "steals" resources from the others.
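The redistribution described above can be modelled as "water-filling". A group that wants less than its proportional slice keeps only what it uses, and the leftover is split again, by shares, among the groups that still want more. The sketch below is a fluid model of the outcome CFS aims for over time, not a description of how the kernel implements it. Demands are given in cores, and the values are hypothetical.

```python
def work_conserving_split(capacity: float, groups: dict) -> dict:
    """Share capacity by cpu.shares without giving any group more than it demands.

    groups maps a name to (shares, demand_in_cores).
    """
    allocation = {name: 0.0 for name in groups}
    active = {name for name, (_, demand) in groups.items() if demand > 0}
    remaining = capacity
    while active:
        total_shares = sum(groups[n][0] for n in active)
        fair = {n: remaining * groups[n][0] / total_shares for n in active}
        # Groups wanting no more than their fair slice get exactly their demand.
        satisfied = {n for n in active if groups[n][1] <= fair[n]}
        if not satisfied:
            # Everyone left wants more than their slice: split what remains by shares.
            for n in active:
                allocation[n] = fair[n]
            break
        for n in satisfied:
            allocation[n] = groups[n][1]
            remaining -= groups[n][1]
        active -= satisfied
    return allocation


groups = {
    "system.slice": (1024, 0.1),  # nearly idle
    "user.slice": (1024, 0.0),    # idle
    "kubepods": (4096, 10.0),     # wants far more than the machine has
}
print(work_conserving_split(4.0, groups))
```

In this example kubepods ends up with 3.9 cores. That is well above the roughly 2.67 cores its shares alone would give it if every group were busy.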

CPU limit

Limits and requests are specified in similar ways in K8s, but their implementations are very different. Limits are the most misleading and least documented part.

K8s enforces limits with the CFS quota mechanism. Its settings live in the files cpu.cfs_period_us and cpu.cfs_quota_us in the cgroup directory (the same directory that holds cpu.shares).

Unlike cpu.shares, the quota is based on time, not on available processing power. cpu.cfs_period_us sets the length of the period (epoch), which in K8s is always 100000 µs (100 ms). K8s has an option to change this value, but for now it is only available in alpha. The scheduler uses each period to reset the used quota. The second file, cpu.cfs_quota_us, sets the CPU time available (the quota) in each period. Note that it is also expressed in microseconds. The quota counts CPU time used across all cores, so it can be longer than the period, that is, more than 100 ms.
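To see what a given pair of values means, the sketch below reads the two cgroup v1 files and converts them into a cap expressed in cores. A quota of -1 means no limit. The file names are the ones discussed above. The directory is whichever cpu cgroup you want to inspect, for example /sys/fs/cgroup/cpu inside a cgroup v1 container. On cgroup v2 the same pair lives in a single file, cpu.max, written as "<quota> <period>" or "max <period>". This sketch does not handle that format.

```python
from pathlib import Path
from typing import Optional


def cores_from_cfs(quota_us: int, period_us: int) -> Optional[float]:
    """CPU cap in cores (quota of CPU time per period); None when unlimited (-1)."""
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cpu_cap(cgroup_dir: str) -> Optional[float]:
    """Read cpu.cfs_quota_us and cpu.cfs_period_us from a cgroup v1 cpu directory."""
    base = Path(cgroup_dir)
    quota_us = int((base / "cpu.cfs_quota_us").read_text())
    period_us = int((base / "cpu.cfs_period_us").read_text())
    return cores_from_cfs(quota_us, period_us)


# A quota larger than the period: 250 ms per 100 ms allows up to 2.5 cores in parallel.
print(cores_from_cfs(250_000, 100_000))
```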

Let's look at two scenarios on a 16-core machine (the most common machine type at Omio):

Figure: Scenario 1: 2 threads and a 200 ms limit. No throttling

Figure: Scenario 2: 10 threads and a 200 ms limit. Throttling starts after 20 ms, and access to the CPU resumes after another 80 ms

Suppose you set the CPU limit to 2 cores. Kubernetes translates this value into 200 ms, which means the container can use at most 200 ms of CPU time per 100 ms period without being throttled.
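Converting a limit into a quota is a simple proportion. The sketch below assumes the limit is given in millicores (2 cores = 2000m) and uses the default 100 ms period. As far as we know, the kubelet does the same arithmetic. The 1 ms floor reflects the smallest quota the kernel accepts. The function name is our own.

```python
CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us (100 ms)
MIN_QUOTA_US = 1_000     # the kernel rejects quotas below 1 ms


def limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (in µs) a container may use per period for a limit given in millicores."""
    quota = limit_millicores * period_us // 1000
    return max(quota, MIN_QUOTA_US)


for limit in (2000, 500, 250, 1):
    q = limit_to_quota_us(limit)
    print(f"{limit}m -> cfs_quota_us={q} ({q / 1000:g} ms per {CFS_PERIOD_US // 1000} ms period)")
```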

This is where the fun begins. As mentioned above, the available quota is 200 ms. Suppose ten threads run in parallel on a 12-core machine (see the illustration for scenario 2) while all other pods are idle. The quota is then used up in just 20 ms (10 * 20 ms = 200 ms), and every thread in the pod "freezes" (is throttled) for the next 80 ms. The scheduler bug mentioned earlier makes things worse: it causes excessive throttling, so the container is throttled even before it has used up its available quota.
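The arithmetic of both scenarios fits in one function. The sketch below is an idealised model of a single 100 ms period. It assumes every thread is CPU-bound and has a free core to itself, and it ignores the kernel bug, so real throttling can only be worse. Under these assumptions, N busy threads drain the quota N times faster than wall-clock time passes.

```python
from __future__ import annotations


def one_period(threads: int, quota_ms: float, period_ms: float = 100.0) -> tuple[float, float]:
    """Return (running_ms, throttled_ms) of wall-clock time within one CFS period."""
    # N busy threads consume N ms of quota per 1 ms of wall-clock time.
    running_ms = min(period_ms, quota_ms / threads)
    return running_ms, period_ms - running_ms


for threads in (2, 10):
    running, throttled = one_period(threads, quota_ms=200)
    print(f"{threads} threads: run {running:g} ms, throttled {throttled:g} ms")
```

With two threads the container runs for the full 100 ms and is never throttled. With ten it runs for 20 ms and is then throttled for 80 ms, so any work still pending at that point waits out the whole pause.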

How do you assess throttling in a pod?

Just exec into the pod and run cat /sys/fs/cgroup/cpu/cpu.stat. The file contains these counters:

  • nr_periods: total number of scheduler periods;
  • nr_throttled: number of throttled periods out of nr_periods;
  • throttled_time: total throttled time in nanoseconds.
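The counters are plain "key value" lines, so the share of throttled periods is easy to compute. The sketch below parses cpu.stat and summarises it; the sample numbers are made up. It also accepts the cgroup v2 layout, where the file is /sys/fs/cgroup/cpu.stat and throttled time is reported in microseconds as throttled_usec. The counters are cumulative since the cgroup was created. To see current behaviour, compare two readings taken some time apart.

```python
def parse_cpu_stat(text: str) -> dict:
    """Turn the "key value" lines of a cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def throttling_summary(stats: dict) -> dict:
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    # cgroup v1 reports nanoseconds (throttled_time); cgroup v2 reports microseconds (throttled_usec).
    if "throttled_time" in stats:
        seconds = stats["throttled_time"] / 1e9
    else:
        seconds = stats.get("throttled_usec", 0) / 1e6
    return {
        "throttled_ratio": throttled / periods if periods else 0.0,
        "throttled_seconds": seconds,
    }


# Made-up contents of /sys/fs/cgroup/cpu/cpu.stat (cgroup v1).
sample = """nr_periods 1000
nr_throttled 250
throttled_time 12500000000
"""
print(throttling_summary(parse_cpu_stat(sample)))
```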


What is really going on?

The result is heavy throttling across all applications. Sometimes it is up to one and a half times higher than the calculations predict!

This leads to all kinds of errors: failed readiness checks, frozen containers, broken network connections, and timeouts between service calls. Ultimately, it means higher latency and a higher error rate.

Decision and results

The fix was simple. We dropped CPU limits and started upgrading the OS kernel in our clusters to the latest version, which contains the bug fix. The number of errors (HTTP 5xx) in our services dropped sharply right away:

HTTP 5xx errors

Figure: HTTP 5xx errors for one critical service

p95 response time

Figure: Request latency of a critical service, 95th percentile

Operating costs

Figure: Instance hours spent

What's the catch?

As we said at the beginning of the article:

A communal apartment makes a good analogy... Kubernetes acts as the realtor. But how do you keep the tenants from getting in each other's way? What if one of them decides to borrow the bathroom for half a day?

Here's the catch: a single rogue container can consume all the available CPU on a machine. If your application stack is smart (for example, a properly configured JVM, Go, or Node VM), this is not a problem, and you can run like this for a long time. But if your applications are poorly optimized or not optimized at all (FROM java:latest), things can get out of hand. At Omio we have automated base Dockerfiles with sensible default settings for the major language stacks, so this problem did not affect us.

We recommend monitoring USE metrics (utilization, saturation, and errors), API latency, and error rates, and checking that the results match your expectations.

That's our story.

Have you run into similar problems in your work, or do you have experience with throttling in production environments? Share your story in the comments!


Source: www.habr.com
