![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/bed059552ed86580939aa18fbdf1553e.jpg)
Over the years of running Kubernetes in production, we have accumulated many interesting stories about how bugs in various system components led to unpleasant and/or incomprehensible consequences affecting the operation of containers and pods. In this article we have put together a selection of the most common or interesting ones. Even if you are never unlucky enough to run into such situations, reading short detective stories like these, especially first-hand ones, is always entertaining, isn't it?
Story 1. Supercronic and Docker hanging
In one of the clusters, Docker periodically froze, which disrupted the normal operation of the cluster. At the same time, the following was observed in the Docker logs:
level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0
goroutine 0 [idle]:
goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0
goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1
…
What interested us most in this error was the message: pthread_create failed: No space left on device. A quick investigation showed that Docker could not create a new process, which is why it periodically froze.
In monitoring, the following picture corresponds to what was happening:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/bd778052c87b338493bae54b26830ef3.jpg)
A similar situation was observed on other nodes:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/ef512532a95ca982e4342071115dbe9f.jpg)
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/43c32ebca78755dde348ed5e7ac75c79.jpg)
On the same nodes we see:
root@kube-node-1 ~ # ps auxfww | grep curl -c
19782
root@kube-node-1 ~ # ps auxfww | grep curl | head
root 16688 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 17398 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 16852 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 9473 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 4664 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 30571 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 24113 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 16475 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 7176 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
root 1090 0.0 0.0 0 0 ? Z Feb06 0:00 | _ [curl] <defunct>
It turned out that this behavior is a consequence of a pod running supercronic (a Go utility that we use to run cron jobs in pods):
_ docker-containerd-shim 833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 /var/run/docker/libcontainerd/833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 docker-runc
| _ /usr/local/bin/supercronic -json /crontabs/cron
| _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true
| | _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true -no-pidfile
| _ [newrelic-daemon] <defunct>
| _ [curl] <defunct>
| _ [curl] <defunct>
| _ [curl] <defunct>
…
The problem is this: when a task is run in supercronic, the process it spawns cannot terminate correctly and turns into a zombie.
Note: To be more precise, the processes are spawned by cron tasks, but supercronic is not an init system and cannot "adopt" the processes that its children have spawned. When a SIGHUP or SIGTERM signal is raised, it is not passed on to the child processes, so the child processes do not terminate and remain in zombie status.
There are a couple of ways to solve the problem:
- As a temporary workaround, increase the maximum number of PIDs that can exist in the system at the same time:
/proc/sys/kernel/pid_max (since Linux 2.5.34)
This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels
- Or launch the tasks in supercronic not directly but through a small init-style wrapper that terminates processes correctly and does not leave zombies behind.
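To see which parent is accumulating zombies and how close a node is to the PID limit, a check along the following lines can help. This is a minimal sketch rather than the tooling used in the story: the function names zombies_by_parent and pid_usage_percent are hypothetical, and the ps invocations in the comments assume the procps version of ps.

```shell
# Count zombie (state Z) processes per parent PID.
# Input on stdin: one "PID PPID STAT COMMAND" line per process, e.g. from
#   ps -eo pid=,ppid=,stat=,comm= | zombies_by_parent | head
# Output: "<count> <ppid>" lines, largest count first.
zombies_by_parent() {
  awk '$3 ~ /^Z/ { n[$2]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Share of the PID space in use, as a whole percentage.
# Arguments: number of processes and threads, value of pid_max, e.g.
#   pid_usage_percent "$(ps -eL -o lwp= | wc -l)" "$(cat /proc/sys/kernel/pid_max)"
pid_usage_percent() {
  local used=$1 max=$2
  echo $(( used * 100 / max ))
}
```

With the numbers from this story (19782 defunct curl processes against the default limit of 32768), the zombies alone occupy more than half of the PID space.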
Story 2. "Zombies" when deleting a cgroup
Kubelet started consuming a lot of CPU:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/6140058330faaa3785b089dcba857056.jpg)
Nobody likes that, so we set about investigating the problem. The results of the investigation were as follows:
- Kubelet spends more than a third of its CPU time pulling memory data from all cgroups.
- On the kernel developers' mailing list you can find a discussion of the problem. In short, it comes down to this: various tmpfs files and other similar objects are not completely removed from the system when a cgroup is deleted, and a so-called zombie cgroup remains. Sooner or later they will be evicted from the page cache, but the server has a lot of memory and the kernel sees no point in spending time on deleting them, so they keep piling up. Why does this happen at all? This is a server with cron jobs that constantly create new jobs, and with them new pods. New cgroups are therefore created for the containers in those pods, and they are deleted soon afterwards.
- Why does cAdvisor in kubelet waste so much time? This is easy to see by simply running
time cat /sys/fs/cgroup/memory/memory.stat
If the operation takes 0.01 seconds on a healthy machine, it takes 1.2 seconds on the problematic cron02. The point is that cAdvisor reads data from sysfs very slowly, because the read has to account for the memory used by zombie cgroups.
- To forcibly remove the zombies, we tried clearing the caches as recommended on LKML:
sync; echo 3 > /proc/sys/vm/drop_caches
but the kernel turned out to be more complicated than that and crashed the machine.
What to do? The problem is fixed by updating the Linux kernel to version 4.16.
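One rough way to spot lingering cgroups on a cgroup v1 node is to compare the number of memory cgroups the kernel still tracks with the number of directories visible in the hierarchy. The sketch below is an illustration under stated assumptions, not part of the original investigation: it assumes that num_cgroups in /proc/cgroups also counts cgroups that were deleted but not yet freed, and both function names are hypothetical.

```shell
# Print num_cgroups for the memory controller.
# Input on stdin: the contents of /proc/cgroups
# (columns: subsys_name, hierarchy, num_cgroups, enabled).
memory_cgroups_in_kernel() {
  awk '$1 == "memory" { print $3 }'
}

# Estimate how many memory cgroups exist in the kernel but are no longer
# visible as directories. Arguments: kernel count, visible directory count, e.g.
#   zombie_cgroup_estimate \
#     "$(memory_cgroups_in_kernel < /proc/cgroups)" \
#     "$(find /sys/fs/cgroup/memory -type d | wc -l)"
zombie_cgroup_estimate() {
  local in_kernel=$1 visible=$2
  echo $(( in_kernel - visible ))
}
```

A gap of a few entries is normal churn; a gap that keeps growing on a node with many short-lived pods would match the symptoms described above.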
Story 3. Systemd and its mounts
Once again, kubelet consumes too many resources on some nodes, but this time it is memory:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/044c4e23a772c61a6206b9b20aa67c1d.jpg)
It turned out that there is a problem in the systemd used in Ubuntu 16.04, and it occurs when managing the mounts that are created for subPath connections from ConfigMaps or secrets. After a pod has finished its work, the systemd service and its service mount remain in the system. Over time, a huge number of them accumulate. There are even issues on this topic, the last of which refers to a PR in systemd.
The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround useful.
So, we made the following DaemonSet:
---
# extensions/v1beta1 was removed for DaemonSet in Kubernetes 1.16; newer clusters need apps/v1
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: systemd-slices-cleaner
  name: systemd-slices-cleaner
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: systemd-slices-cleaner
  template:
    metadata:
      labels:
        app: systemd-slices-cleaner
    spec:
      containers:
      - command:
        - /usr/local/bin/supercronic
        - -json
        - /app/crontab
        image: private-registry.org/systemd-slices-cleaner/systemd-slices-cleaner:v0.1.0
        imagePullPolicy: Always
        name: systemd-slices-cleaner
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd/private
        - name: docker
          mountPath: /run/docker.sock
        - name: systemd-etc
          mountPath: /etc/systemd
        - name: systemd-run
          mountPath: /run/systemd/system/
        - name: lsb-release
          mountPath: /etc/lsb-release-host
      imagePullSecrets:
      - name: antiopa-registry
      priorityClassName: cluster-low
      tolerations:
      - operator: Exists
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd/private
      - name: docker
        hostPath:
          path: /run/docker.sock
      - name: systemd-etc
        hostPath:
          path: /etc/systemd
      - name: systemd-run
        hostPath:
          path: /run/systemd/system/
      - name: lsb-release
        hostPath:
          path: /etc/lsb-release
... and it uses the following script:
#!/bin/bash
# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f "${hostrelease}" && grep xenial "${hostrelease}" > /dev/null || exit 0
# sleep for up to 30 minutes to spread the load across kube nodes
sleep $((RANDOM % 1800))
stoppedCount=0
# count the current subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
  # find the description file for the unit (to find out which docker container created it)
  DropFile=$(systemctl status "${unit}" | grep Drop | awk -F': ' '{print $2}')
  # read the uuid for the docker container from the description file
  DockerContainerId=$(cat "${DropFile}/50-Description.conf" | awk '{print $5}' | cut -d/ -f6)
  # check the container status (running or not)
  checkFlag=$(docker ps | grep -c "${DockerContainerId}")
  # if the container is not running, stop the unit
  if [[ ${checkFlag} -eq 0 ]]; then
    echo "Stopping unit ${unit}"
    # stop the unit
    systemctl stop "${unit}"
    # counter for the logs
    ((stoppedCount++))
    # log the current progress
    echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
  fi
done
... and it runs every 5 minutes using the previously mentioned supercronic. Its Dockerfile looks like this:
FROM ubuntu:16.04
COPY rootfs /
WORKDIR /app
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y gnupg curl apt-transport-https software-properties-common wget
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y docker-ce=17.03.0*
RUN wget https://github.com/aptible/supercronic/releases/download/v0.1.6/supercronic-linux-amd64 -O \
    /usr/local/bin/supercronic && chmod +x /usr/local/bin/supercronic
ENTRYPOINT ["/bin/bash", "-c", "/usr/local/bin/supercronic -json /app/crontab"]
Story 4. Concurrency when scheduling pods
We noticed the following: if a pod is placed on a node and its image takes a very long time to pull, then another pod that lands on the same node simply does not start pulling its own image. Instead, it waits until the image of the previous pod has been pulled. As a result, a pod that has already been scheduled, and whose image could have been downloaded in just a minute, ends up in ContainerCreating status for a long time.
The events look something like this:
Normal Pulling 8m kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal pulling image "registry.example.com/infra/openvpn/openvpn:master"
It turns out that a single image from a slow registry can block deployment to the whole node.
Unfortunately, there are not many ways out of this situation:
- Try to use your Docker Registry directly in the cluster or directly alongside the cluster (for example, GitLab Registry, Nexus, etc.);
- Use a utility that pulls images onto the nodes in advance.
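To check whether a node is affected, it can help to count the pods waiting in ContainerCreating on each node. A minimal sketch, assuming the input is one "node reason" pair per line; the function name creating_by_node and the kubectl column expression shown in the comment are illustrative assumptions, not something taken from the story.

```shell
# Count pods waiting in ContainerCreating per node.
# Input on stdin: "<node> <waiting reason(s)>" lines, which could be produced with
#   kubectl get pods --all-namespaces --no-headers \
#     -o 'custom-columns=NODE:.spec.nodeName,REASON:.status.containerStatuses[*].state.waiting.reason'
# Output: "<count> <node>" lines, largest count first.
creating_by_node() {
  awk '$2 ~ /ContainerCreating/ { n[$1]++ } END { for (node in n) print n[node], node }' | sort -rn
}
```

A node that stays at the top of this list while one of its pods has been in the Pulling state for many minutes is a likely candidate for the blocking behavior described above.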
Story 5. Nodes hang due to lack of memory
While running various applications, we also encountered a situation where a node becomes completely inaccessible: SSH does not respond, all monitoring daemons drop off, and afterwards there is nothing (or almost nothing) anomalous in the logs.
I will show it in pictures, using the example of one node where MongoDB was running.
This is what atop looks like before the crash:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/5de916d270a862cbcbb5ed23c31f698e.jpg)
And this is what it looks like after the crash:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/0f32bf1113204cf19f4639a297e40348.jpg)
In monitoring there is also a sharp jump, at which point the node stops being available:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/31e770cac5be32bb7f95cfbbc6b9f1ae.jpg)
So, from the screenshots it is clear that:
- RAM on the machine is close to running out;
- There is a sharp jump in RAM consumption, after which access to the whole machine is abruptly lost;
- A large task arrives at Mongo, which forces the DBMS process to use more memory and actively read from disk.
It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act can arise between throwing pages into the page cache and writing them back to disk. This is done by kswapd, which bravely frees as many memory pages as possible for subsequent allocation.
Unfortunately, with a heavy I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the entire system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes do not want to use any more memory but stay at the very edge of the OOM-killer abyss.
The natural question is: why does the OOM killer arrive so late? In its current iteration, the OOM killer is extremely stupid: it kills a process only when an attempt to allocate a memory page fails, that is, when a page fault fails. This does not happen for quite a long time, because kswapd bravely frees memory pages, flushing the page cache (in fact, all disk I/O in the system) back to disk.
This behavior is observed with Linux kernel 4.6+.
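Since the node gives almost no warning in its logs, one option is to watch how much memory is still available before kswapd becomes the bottleneck. The sketch below is only an illustration and not a fix described in this story: it assumes a kernel that exposes MemAvailable in /proc/meminfo (3.14 or later), and the function name and the 5% threshold in the comment are hypothetical.

```shell
# Print available memory as a whole percentage of total memory.
# Input on stdin: the contents of /proc/meminfo, e.g.
#   pct=$(mem_available_percent < /proc/meminfo)
#   [ "$pct" -lt 5 ] && echo "memory pressure: only ${pct}% available"
mem_available_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}
```

An alert on this value will not prevent the stall, but it can point to the cause when a node suddenly stops responding.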
Story 6. Pods stuck in the Pending state
In some clusters where a really large number of pods are running, we began to notice that most of them "hang" for a very long time in the Pending state, even though the Docker containers themselves are already running on the nodes and can be worked with manually.
Moreover, there is nothing wrong in the describe output:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned sphinx-0 to ss-dev-kub07
Normal SuccessfulAttachVolume 1m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "sphinx-config"
Normal SuccessfulMountVolume 1m kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "default-token-fzcsf"
Normal SuccessfulMountVolume 49s (x2 over 51s) kubelet, ss-dev-kub07 MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
Normal Created 43s kubelet, ss-dev-kub07 Created container
Normal Started 43s kubelet, ss-dev-kub07 Started container
Normal Pulled 43s kubelet, ss-dev-kub07 Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
Normal Created 42s kubelet, ss-dev-kub07 Created container
Normal Started 42s kubelet, ss-dev-kub07 Started container
After some digging, we assumed that kubelet simply does not have time to send all the information about the state of the pods and the liveness/readiness probes to the API server.
After studying the help output, we found the following parameters:
--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst - Burst to use while talking with kubernetes apiserver (default 10)
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10)
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)
As you can see, the default values are quite small, and in 90% of cases they cover all needs. In our case, however, this was not enough. Therefore, we set the following values:
--event-qps=30 --event-burst=40 --kube-api-burst=40 --kube-api-qps=30 --registry-qps=30 --registry-burst=40
... and restarted the kubelets, after which we saw the following picture in the graphs of calls to the API server:
![6 entertaining system bugs in the operation of Kubernetes [and their solutions]](/wp-content/uploads/2019/03/b2ae099729e55a686f6bec3012b96195.jpg)
... and yes, everything started to fly!
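To confirm which limits a running kubelet was actually started with, its command line can be inspected. A small sketch, assuming the limits are passed as --flag=value arguments (a kubelet configured through a configuration file will not show them here); flag_value is a hypothetical helper name.

```shell
# Print the value of --<flag>=<value> found in a command line read from stdin.
# Prints nothing when the flag is absent. Example:
#   tr '\0' ' ' < "/proc/$(pidof kubelet)/cmdline" | flag_value kube-api-qps
flag_value() {
  local flag=$1
  # One argument per line, then keep only the value of the matching flag.
  tr ' ' '\n' | sed -n "s/^--${flag}=//p" | tail -n 1
}
```

An empty result for a flag means it was not given on the command line, in which case the default listed in the help output above would normally apply.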
P.S.
For help in collecting the bugs and preparing this article, I am deeply grateful to the many engineers of our company, and especially to Andrey Klimentyev, my colleague from our R&D team.
Source: www.habr.com
