Over the years of running Kubernetes in production, we have accumulated many fascinating stories of how bugs in various system components led to unpleasant and/or baffling consequences affecting the operation of containers and pods. In this article we have selected some of the most common or most interesting ones. Even if you are never lucky enough to encounter such situations yourself, reading short detective stories like these (especially first-hand ones) is always entertaining, isn't it?
Story 1. Supercronic and a hanging Docker
On one of our clusters, Docker would periodically freeze, interfering with the normal operation of the cluster. At the same time, the following appeared in the Docker logs:
level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0
goroutine 0 [idle]:
goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0
goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1
…
What interested us most about this error is the message: pthread_create failed: No space left on device. A quick look at the documentation explained that Docker could not fork a process, which is why it periodically froze.
In monitoring, the following picture corresponds to what was happening:
It turned out that this behavior matches what happens when a pod runs with supercronic (a Go utility we use to run cron jobs in pods):
The problem is this: when a task is run in supercronic, the process it spawns cannot terminate correctly and turns into a zombie.
Note: to be precise, the processes are spawned by the cron tasks, but supercronic is not an init system and cannot "reap" the processes its children spawn. When a SIGHUP or SIGTERM signal is raised, it is not passed on to the child processes, so the child process does not terminate and remains a zombie. You can read more about all of this, for example, in an article like this one.
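The accumulation of zombies described in the note is easy to spot from a shell inside the affected container. A minimal sketch, assuming procps's ps is available:

```shell
# Count zombie ("defunct") processes: unreaped children show up
# with process state "Z" in the ps STAT column
zombies=$(ps -eo stat= | grep -c '^Z' || true)
echo "zombies: ${zombies}"
```

A steadily growing count here is the symptom that eventually exhausts the PID space.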
There are two ways to solve the problem:
As a temporary workaround, increase the number of PIDs available in the system at any one time:
/proc/sys/kernel/pid_max (since Linux 2.5.34)
This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels.
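Checking and applying this workaround looks roughly like the following sketch; the raised pid_max value is only an example, not the value used in the article:

```shell
# Compare the number of PIDs currently in use against the system-wide limit
pid_max=$(cat /proc/sys/kernel/pid_max)
in_use=$(ls /proc | grep -c '^[0-9][0-9]*$')
echo "pids in use: ${in_use} / ${pid_max}"
# Temporary workaround (requires root; the value is an example):
#   sysctl -w kernel.pid_max=4194303
```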
Or launch tasks in supercronic not directly, but via the same tini, which can terminate processes correctly and does not spawn zombies.
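The second fix might look like the following Dockerfile fragment. This is a sketch: the tini, supercronic, and crontab paths are assumptions, not taken from the article.

```dockerfile
# Sketch: tini becomes PID 1, reaps orphaned children, and forwards
# SIGTERM/SIGHUP to supercronic. All paths below are assumptions.
ENTRYPOINT ["/usr/bin/tini", "--", "/usr/local/bin/supercronic", "/etc/crontab"]
```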
Story 2. "Zombies" when deleting a cgroup
Kubelet started consuming a lot of CPU:
Nobody likes that, so we armed ourselves and set about tackling the problem. The results of the investigation were as follows.
Kubelet spends more than a third of its CPU time pulling memory data from all the cgroups:
In the kernel developers' mailing list you can find a discussion of the problem. In short, it comes down to this: various tmpfs files and other similar things are not completely removed from the system when a cgroup is deleted, leaving behind so-called memcg zombies. Sooner or later they will be evicted from the page cache, but there is a lot of memory on the server and the kernel sees no point in wasting time deleting them. That is why they keep piling up. Why does this even happen? This is a server with cron jobs that constantly spawn new jobs, and with them new pods. New cgroups are created for the containers inside them, and those cgroups are deleted soon afterwards.
Why does cAdvisor in kubelet waste so much time? This is easy to see with the simplest run of time cat /sys/fs/cgroup/memory/memory.stat. If on a healthy machine the operation takes 0.01 seconds, on the problematic cron02 it takes 1.2 seconds. The thing is that cAdvisor, which reads data from sysfs very slowly, tries to account for the memory used in the zombie cgroups as well.
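The timing experiment above can be reproduced with a small shell sketch (it falls back to /proc/meminfo on machines without a cgroup v1 memory hierarchy):

```shell
# Time a single read of the cgroup's memory.stat, as in the article's experiment
f=/sys/fs/cgroup/memory/memory.stat
[ -r "$f" ] || f=/proc/meminfo   # fallback where cgroup v1 memory is absent
t0=$(date +%s%N)
cat "$f" > /dev/null
t1=$(date +%s%N)
elapsed_ms=$(( (t1 - t0) / 1000000 ))
echo "reading ${f} took ${elapsed_ms} ms"
```

On an affected node this single read takes orders of magnitude longer than on a healthy one.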
To forcibly remove the zombies, we tried clearing the caches as recommended in LKML: sync; echo 3 > /proc/sys/vm/drop_caches, but the kernel turned out to be more complicated and hung the machine.
What can be done? The problem is fixed (commit; for a description, see the release message) by updating the Linux kernel to version 4.16.
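A quick way to check whether a node already runs a kernel with the fix is to compare the release reported by uname. A sketch (the version parsing here is deliberately simplistic):

```shell
# Check whether the running kernel is at least 4.16 (the version with the fix)
release=$(uname -r)
major=$(echo "$release" | cut -d. -f1)
minor=$(echo "$release" | cut -d. -f2)
if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 16 ]; }; then
  verdict="has the memcg fix"
else
  verdict="needs an upgrade"
fi
echo "kernel ${release}: ${verdict}"
```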
Story 3. Systemd and its mounts
Once again, kubelet was consuming too many resources on some nodes, but this time it was eating too much memory:
It turned out that there is a problem in the systemd used in Ubuntu 16.04, and it occurs when managing the mounts created for subPath volumes from ConfigMaps or Secrets. After a pod has finished its work, the systemd service and its service mount remain in the system. Over time, a huge number of them accumulate. There are even issues on this subject:
... the last of which refers to a PR in systemd: #7811 (the corresponding systemd issue is #7798).
The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround for this issue useful.
#!/bin/bash
# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f ${hostrelease} && grep xenial ${hostrelease} > /dev/null || exit 0
# sleeping max 30 minutes to dispense load on kube-nodes
sleep $((RANDOM % 1800))
stoppedCount=0
# counting actual subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# let's go check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
# finding description file for unit (to find out which docker container spawned this unit)
DropFile=$(systemctl status ${unit} | grep Drop | awk -F': ' '{print $2}')
# reading uuid for docker container from description file
DockerContainerId=$(cat ${DropFile}/50-Description.conf | awk '{print $5}' | cut -d/ -f6)
# checking container status (running or not)
checkFlag=$(docker ps | grep -c ${DockerContainerId})
# if the container is not running, stop the unit
if [[ ${checkFlag} -eq 0 ]]; then
echo "Stopping unit ${unit}"
# stopping unit in action
systemctl stop $unit
# just counter for logs
((stoppedCount++))
# logging current progress
echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
fi
done
... and it runs every 5 minutes using the supercronic mentioned earlier. Its Dockerfile looks like this:
Story 4. Concurrency when pulling images

We noticed that if a pod is placed on a node and its image is being pulled for a long time, then another pod that "lands" on the same node simply does not start pulling the new pod's image. Instead, it waits until the previous pod's image has been pulled. As a result, a pod that has already been scheduled and whose image could be downloaded in just a minute ends up stuck in the ContainerCreating status.
The events will look something like this:
Normal Pulling 8m kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal pulling image "registry.example.com/infra/openvpn/openvpn:master"
It turns out that a single image from a slow registry can block pod deployment to a node for some time.
Unfortunately, there are not many ways out of this situation:
Try using your Docker Registry directly in the cluster or together with the cluster (for example, GitLab Registry, Nexus, etc.);
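One knob worth knowing about here is the kubelet's --serialize-image-pulls flag: by default pulls are serialized, which is exactly the queueing behavior described above. The article does not state that its authors changed this flag; treat the fragment below as a sketch of an alternative trade-off:

```shell
# Sketch: allow the kubelet to pull images in parallel so one slow pull
# cannot queue all the others. Note: parallel pulls are discouraged with
# the aufs storage driver.
kubelet --serialize-image-pulls=false
```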
Story 5. Nodes hanging due to lack of memory
While running various applications, we also encountered a situation where a node completely ceased to be accessible: SSH does not respond, all the monitoring daemons fall off, and then there is nothing (or almost nothing) anomalous in the logs.
I will show you in pictures, using the example of one node where MongoDB was running.
This is what atop looked like before the crash:
And like this after the crash:
In monitoring there is also a sharp spike, at which point the node ceased to be available:
So, from the screenshots it is clear that:
The RAM on the machine is close to running out;
There is a sharp spike in RAM consumption, after which access to the entire machine is abruptly lost;
A large workload arrives at Mongo, forcing the DBMS process to use more memory and read actively from disk.
It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act can arise between evicting pages from the page cache and writing them back to disk. This is done by kswapd, which valiantly frees up as many memory pages as possible for subsequent allocation.
Unfortunately, with a large I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the entire system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes no longer want to use memory but are stuck at the very edge of the OOM-killer abyss.
The natural question is: why does the OOM killer arrive so late? In its current incarnation, the OOM killer is extremely naive: it will kill a process only when an attempt to allocate a memory page fails, i.e. when a page fault fails. This does not happen for a long time, because kswapd valiantly keeps freeing memory pages, flushing the page cache (essentially all the disk I/O in the system) to disk. You can read in more detail about the steps needed to eliminate such problems in the kernel here.
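The memory-pressure symptoms described above can be observed through /proc. A small sketch reading the counters involved (free memory, swap, and kswapd's page-scan counter):

```shell
# Free memory and swap availability, as in the story (no swap = danger zone)
grep -E '^(MemFree|MemAvailable|SwapTotal):' /proc/meminfo
# How many pages kswapd has scanned; a rapidly growing number under low
# MemAvailable is the "kswapd as bottleneck" situation described above
scans=$(awk '/^pgscan_kswapd/ {s += $2} END {print s+0}' /proc/vmstat)
echo "pages scanned by kswapd: ${scans}"
```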
Story 6. Pods stuck in Pending

In some clusters with a large number of pods running, we began to notice that most of them "hang" for a very long time in the Pending state, even though the Docker containers themselves were already running on the nodes and could be worked with manually.
Moreover, in describe there is nothing wrong:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned sphinx-0 to ss-dev-kub07
Normal SuccessfulAttachVolume 1m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "sphinx-config"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "default-token-fzcsf"
Normal SuccessfulMountVolume 49s (x2 over 51s) kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
Normal Created 43s kubelet, ss-dev-kub07 Created container
Normal Started 43s kubelet, ss-dev-kub07 Started container
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
Normal Created 42s kubelet, ss-dev-kub07 Created container
Normal Started 42s kubelet, ss-dev-kub07 Started container
After some digging, we assumed that the kubelet simply does not have time to send all the information about the state of the pods and the liveness/readiness probes to the API server.
And after studying the help, we found the following parameters:
--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst - Burst to use while talking with kubernetes apiserver (default 10)
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10)
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)
As you can see, the default values are quite small, and in 90% of cases they cover all needs... In our case, however, this was not enough. Therefore, we set the following values:
... and restarted the kubelets, after which we saw the following picture in the graphs of calls to the API server:
... and yes, everything started flying!
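Raising those limits is done with the same flags listed above; the numbers below are purely illustrative, since the values actually used are not shown here:

```shell
# Sketch with example values only - not the values used in the article
kubelet \
  --kube-api-qps=50 --kube-api-burst=100 \
  --event-qps=50 --event-burst=100
```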
PS
For their help in collecting these bugs and preparing this article, I express my deep gratitude to the many engineers of our company, and especially to our colleague from the R&D team, Andrey Klimentyev.