Over the years of running Kubernetes in production, we have collected many interesting stories of how bugs in various system components led to unpleasant and/or puzzling consequences affecting the operation of containers and pods. In this article we have put together a selection of the most common or most interesting ones. Even if you are never lucky enough to run into such situations, reading short detective stories like these, especially first-hand ones, is always entertaining, isn't it?
Story 1. Supercronic and a hanging Docker
In one of the clusters, we periodically ended up with a frozen Docker, which interfered with the normal operation of the cluster. At the same time, the following was observed in the Docker logs:
level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0
goroutine 0 [idle]:
goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0
goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1
…
What interests us most in this error is the message: pthread_create failed: No space left on device. A quick study …
In monitoring, the following picture corresponds to what was happening:
A similar situation is observed on other nodes:
On the same nodes we see:
root@kube-node-1 ~ # ps auxfww | grep curl -c
19782
root@kube-node-1 ~ # ps auxfww | grep curl | head
root 16688 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 17398 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 16852 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 9473 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 4664 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 30571 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 24113 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 16475 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 7176 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 1090 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
It turned out that this behavior is a consequence of a pod running supercronic:
_ docker-containerd-shim 833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 /var/run/docker/libcontainerd/833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 docker-runc
| _ /usr/local/bin/supercronic -json /crontabs/cron
| _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true
| | _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true -no-pidfile
| _ [newrelic-daemon] <defunct>
| _ [curl] <defunct>
| _ [curl] <defunct>
| _ [curl] <defunct>
…
The problem is this: when a task is run in supercronic, the process it spawns cannot terminate correctly and turns into a zombie.
Note: To be more precise, the processes are spawned by cron tasks, but supercronic is not an init system and cannot "adopt" processes spawned by its children. When SIGHUP or SIGTERM signals are raised, they are not passed on to the child processes, so the children do not terminate and remain in the zombie state. You can read more about all this, for example, in …
There are a couple of ways to solve the problem:
- As a temporary workaround, increase the number of PIDs available in the system at any one time:
/proc/sys/kernel/pid_max (since Linux 2.5.34) This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels
- Or launch tasks in supercronic not directly but via tini, which is able to terminate processes correctly and does not spawn zombies.
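To see how close a node is to the PID limit described above, it can help to count zombies per command name. The following is a minimal sketch, not the tooling used in the original investigation; the function name count_zombies is hypothetical, and it only assumes the usual ps state codes, where a STAT value starting with Z marks a zombie.

```shell
#!/bin/sh
# Sketch: summarize zombie processes by command name.
# Reads lines of "<STAT> <COMMAND>" on stdin and prints "<count> <command>",
# most frequent first. A STAT value starting with "Z" marks a zombie.
count_zombies() {
    awk '$1 ~ /^Z/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# Typical use on a node (not executed here):
#   ps -e -o stat= -o comm= | count_zombies
#   cat /proc/sys/kernel/pid_max    # current system-wide PID limit
```

If the zombie count for one command keeps growing toward pid_max, raising the limit only buys time; the parent still has to reap its children.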
Story 2. "Zombies" when deleting a cgroup
Kubelet started consuming a lot of CPU:
Nobody would like this, so we armed ourselves with …
- Kubelet spends more than a third of its CPU time pulling memory data from all cgroups:
- On the kernel developers' mailing list you can find a discussion of the problem. In short, it comes down to this: various tmpfs files and other similar things are not completely removed from the system when a cgroup is deleted, leaving behind a so-called memcg zombie. Sooner or later they will be evicted from the page cache, but the server has a lot of memory and the kernel sees no point in spending time on removing them, so they keep piling up. Why does this happen at all? This is a server with cron jobs that constantly creates new jobs, and with them new pods. As a result, new cgroups are created for their containers and deleted again shortly afterwards.
- Why does cAdvisor in kubelet waste so much time? This is easy to see with the simplest command:
time cat /sys/fs/cgroup/memory/memory.stat
On a healthy machine the operation takes 0.01 seconds, while on the problematic cron02 it takes 1.2 seconds. The point is that cAdvisor, which reads data from sysfs very slowly, tries to account for the memory used by the zombie cgroups as well.
- To forcibly remove the zombies, we tried dropping the caches as recommended on LKML:
sync; echo 3 > /proc/sys/vm/drop_caches
but the kernel turned out to be more complicated than that and crashed the machine.
What should we do? The problem is being fixed (…
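One way to check whether a node is accumulating such zombie cgroups is to compare the kernel's own count of memory cgroups with the number of cgroup directories that are still visible. This is a minimal sketch under assumptions: it presumes a cgroup v1 layout with the memory hierarchy at /sys/fs/cgroup/memory, and both function names are hypothetical.

```shell
#!/bin/sh
# Kernel-side count of memory cgroups, read from /proc/cgroups-formatted input
# (columns: subsys_name hierarchy num_cgroups enabled).
memcg_kernel_count() {
    awk '$1 == "memory" { print $3 }'
}

# Number of cgroup directories visible under a memory hierarchy root.
memcg_visible_count() {
    find "$1" -type d | wc -l | tr -d ' '
}

# Typical use on a node (not executed here):
#   memcg_kernel_count < /proc/cgroups
#   memcg_visible_count /sys/fs/cgroup/memory
```

A kernel count far above the visible count suggests that deleted cgroups are still being held in memory, which would be consistent with the slow memory.stat reads described above.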
Story 3. Systemd and its mounts
Again, kubelet consumes too many resources on some nodes, but this time it is too much memory:
It turned out that there is a problem in the systemd used in Ubuntu 16.04, and it occurs when managing the mounts that are created for attaching subPath from ConfigMaps or secrets. After the pod finishes its work, the systemd service and its service mount remain in the system. Over time, a huge number of them accumulate. There are even issues on this topic:
...the last of which refers to a PR in systemd:
The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround useful.
So we made the following DaemonSet:
---
# The original used extensions/v1beta1, which no longer serves DaemonSet
# since Kubernetes 1.16; apps/v1 is the current API group.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: systemd-slices-cleaner
  name: systemd-slices-cleaner
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: systemd-slices-cleaner
  template:
    metadata:
      labels:
        app: systemd-slices-cleaner
    spec:
      containers:
      - command:
        - /usr/local/bin/supercronic
        - -json
        - /app/crontab
        image: private-registry.org/systemd-slices-cleaner/systemd-slices-cleaner:v0.1.0
        imagePullPolicy: Always
        name: systemd-slices-cleaner
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd/private
        - name: docker
          mountPath: /run/docker.sock
        - name: systemd-etc
          mountPath: /etc/systemd
        - name: systemd-run
          mountPath: /run/systemd/system/
        - name: lsb-release
          mountPath: /etc/lsb-release-host
      imagePullSecrets:
      - name: antiopa-registry
      priorityClassName: cluster-low
      tolerations:
      - operator: Exists
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd/private
      - name: docker
        hostPath:
          path: /run/docker.sock
      - name: systemd-etc
        hostPath:
          path: /etc/systemd
      - name: systemd-run
        hostPath:
          path: /run/systemd/system/
      - name: lsb-release
        hostPath:
          path: /etc/lsb-release
... and it uses the following script:
#!/bin/bash

# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f "${hostrelease}" && grep xenial "${hostrelease}" > /dev/null || exit 0

# sleeping max 30 minutes to spread the load across kube-nodes
sleep $((RANDOM % 1800))

stoppedCount=0
# counting actual subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# let's go check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
  # finding description file for unit (to find out which docker container created this unit)
  DropFile=$(systemctl status "${unit}" | grep Drop | awk -F': ' '{print $2}')
  # reading uuid of the docker container from the description file
  DockerContainerId=$(cat "${DropFile}/50-Description.conf" | awk '{print $5}' | cut -d/ -f6)
  # checking container status (running or not)
  checkFlag=$(docker ps | grep -c "${DockerContainerId}")
  # if the container is not running, we will stop the unit
  if [[ ${checkFlag} -eq 0 ]]; then
    echo "Stopping unit ${unit}"
    # stopping the unit
    systemctl stop "${unit}"
    # just a counter for logs
    ((stoppedCount++))
    # logging current progress
    echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
  fi
done
... and it runs every 5 minutes using the previously mentioned supercronic. Its Dockerfile looks like this:
FROM ubuntu:16.04
COPY rootfs /
WORKDIR /app
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y gnupg curl apt-transport-https software-properties-common wget
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y docker-ce=17.03.0*
RUN wget https://github.com/aptible/supercronic/releases/download/v0.1.6/supercronic-linux-amd64 -O \
    /usr/local/bin/supercronic && chmod +x /usr/local/bin/supercronic
ENTRYPOINT ["/bin/bash", "-c", "/usr/local/bin/supercronic -json /app/crontab"]
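Before rolling out a cleaner like this, it may help to see how many leftover units a node has actually accumulated. The following is a minimal read-only sketch that applies the same subpath and run- filters as the script above without stopping anything; the function name subpath_units is hypothetical.

```shell
#!/bin/sh
# Reads `systemctl list-units` output on stdin and prints the names of units
# whose line mentions both "subpath" and "run-" (the cleanup script's filter).
subpath_units() {
    awk '/subpath/ && /run-/ { print $1 }'
}

# Typical use on a node (not executed here):
#   systemctl list-units --no-pager | subpath_units | wc -l
```

Running it a few times over a day gives a rough idea of how fast the units pile up, and therefore whether a 5-minute schedule is really needed.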
Story 4. Competition when scheduling pods
We noticed the following: if a pod has been placed on a node and its image takes a very long time to pull, then another pod that "lands" on the same node simply does not start pulling its own image. Instead, it waits until the image of the previous pod has been pulled. As a result, a pod that has already been scheduled, and whose image could have been downloaded in just a minute, ends up stuck in the ContainerCreating status.
The events will look something like this:
Normal Pulling 8m kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal pulling image "registry.example.com/infra/openvpn/openvpn:master"
It turns out that a single image from a slow registry can block deployment on the whole node.
Unfortunately, there are not many ways out of this situation:
- Try to run your Docker Registry directly in the cluster or right next to it (for example, GitLab Registry, Nexus, etc.);
- Use utilities such as Kraken.
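To check whether a node is affected, one option is to list the pods that are waiting in ContainerCreating. This is a minimal sketch that assumes the default kubectl get pods --all-namespaces column layout (NAMESPACE, NAME, READY, STATUS, ...); the function name stuck_creating is hypothetical.

```shell
#!/bin/sh
# Reads `kubectl get pods --all-namespaces` output on stdin and prints
# "<namespace>/<name>" for every pod whose STATUS column is ContainerCreating.
stuck_creating() {
    awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Typical use, limited to one node (not executed here; NODE is a placeholder):
#   kubectl get pods --all-namespaces --field-selector spec.nodeName=NODE | stuck_creating
```

Several pods waiting on the same node while only one Pulling event is in progress would match the behavior described above.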
Story 5. Nodes hang due to lack of memory
While operating various applications, we also ran into a situation where a node becomes completely unreachable: SSH does not respond, all monitoring daemons drop off, and afterwards there is nothing (or almost nothing) anomalous in the logs.
I'll tell the story in pictures, using the example of one node where MongoDB was running.
This is what atop looks like before the crash:
And this is what it looks like after the crash:
In monitoring, there is also a sharp jump at the point where the node becomes unavailable:
So, from the screenshots it is clear that:
- RAM on the machine is close to exhaustion;
- There is a sharp jump in RAM consumption, after which access to the entire machine is abruptly lost;
- A large task arrives at Mongo, which forces the DBMS process to use more memory and to read actively from disk.
It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act may arise between evicting pages from the page cache and writing them back to disk. This is done by kswapd, which bravely frees as many memory pages as possible for subsequent allocation.
Unfortunately, with a heavy I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the entire system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes no longer want to use more memory but are poised at the very edge of the OOM-killer abyss.
The natural question is: why does the OOM killer arrive so late? In its current iteration, the OOM killer is extremely stupid: it will kill a process only when an attempt to allocate a memory page fails, i.e. when a page fault fails. This does not happen for quite a long time, because kswapd bravely frees memory pages by flushing the page cache (in effect, all of the disk I/O in the system) back to disk. For more detail, including a description of the steps required to eliminate such problems in the kernel, you can read …
This behavior …
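Since the OOM killer may arrive late, it can be useful to watch how much memory is still available and react before the node reaches the edge. The sketch below only reads the MemTotal and MemAvailable fields of /proc/meminfo; the function name mem_available_pct is hypothetical, and any alerting threshold built on top of it would be an assumption to tune per node.

```shell
#!/bin/sh
# Reads /proc/meminfo-formatted input on stdin and prints the share of
# MemAvailable relative to MemTotal as an integer percentage.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Typical use on a node (not executed here):
#   mem_available_pct < /proc/meminfo
```

This does not change the kernel behavior described above; it only makes the approach of memory pressure visible earlier than a lost SSH session does.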
Story 6. Pods stuck in the Pending state
In some clusters with a really large number of running pods, we began to notice that most of them "hang" for a very long time in the Pending state, although the Docker containers themselves are already running on the nodes and can be worked with manually.
At the same time, describe shows nothing wrong:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned sphinx-0 to ss-dev-kub07
Normal SuccessfulAttachVolume 1m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "sphinx-config"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "default-token-fzcsf"
Normal SuccessfulMountVolume 49s (x2 over 51s) kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
Normal Created 43s kubelet, ss-dev-kub07 Created container
Normal Started 43s kubelet, ss-dev-kub07 Started container
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
Normal Created 42s kubelet, ss-dev-kub07 Created container
Normal Started 42s kubelet, ss-dev-kub07 Started container
After some digging, we came to the assumption that kubelet simply does not have time to send all the information about the state of the pods and their liveness/readiness probes to the API server.
And after studying the help output, we found the following parameters:
--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst - Burst to use while talking with kubernetes apiserver (default 10)
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10)
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)
As you can see, the default values are quite small, and in 90% of cases they cover all needs... In our case, however, this was not enough. So we set the following values:
--event-qps=30 --event-burst=40 --kube-api-burst=40 --kube-api-qps=30 --registry-qps=30 --registry-burst=40
... and restarted the kubelets, after which we saw the following picture on the graphs of calls to the API server:
... and yes, everything started to fly!
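To get a feel for why the defaults can become a bottleneck, here is a rough back-of-the-envelope sketch. It models the client-side limit as a simple token bucket (the first burst requests go out immediately, the rest at qps per second), which is a simplification, and the function name drain_seconds and the figure of 300 pending requests are hypothetical.

```shell
#!/bin/sh
# Rough token-bucket estimate: seconds needed to send N requests when the
# first <burst> go out at once and the rest are limited to <qps> per second.
drain_seconds() {
    n=$1 qps=$2 burst=$3
    if [ "$n" -le "$burst" ]; then
        echo 0
        return
    fi
    # ceiling of (n - burst) / qps using integer arithmetic
    echo $(( (n - burst + qps - 1) / qps ))
}

# Example calls (hypothetical backlog of 300 requests):
#   drain_seconds 300 5 10     # defaults: --kube-api-qps=5 --kube-api-burst=10
#   drain_seconds 300 30 40    # the values chosen above
```

Under this model a backlog of 300 requests takes about 58 seconds to drain with the defaults and about 9 seconds with the raised values, which is in line with pods appearing to hang in Pending on busy nodes.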
PS
For their help in collecting the bugs and preparing this article, I express my deep gratitude to the many engineers of our company, and especially to my colleague from our R&D team, Andrey Klimentyev.
Source: www.habr.com