6 entertaining system bugs in Kubernetes operations [and their solutions]

Over the years of running Kubernetes in production, we have accumulated many entertaining stories about how bugs in various system components led to unpleasant and/or incomprehensible consequences that affected the operation of containers and pods. In this article we have put together a selection of some of the most common or most interesting ones. Even if you are never lucky enough to run into such situations yourself, reading short detective stories like these, especially "first-hand" ones, is always interesting, isn't it?..

Story 1. Supercronic and a hanging Docker

On one of the clusters we periodically got a frozen Docker, which interfered with the normal functioning of the cluster. At the same time, the following was observed in the Docker logs:

level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0

goroutine 0 [idle]:

goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0

goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

…

What interests us most about this error is the message pthread_create failed: No space left on device. A quick look at the documentation explained that Docker could not fork a process, which is why it periodically froze.
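A quick diagnostic sketch (not from the article; it assumes standard procps tools on the node) to confirm that it really is the kernel's PID/thread limit that is exhausted:

cat /proc/sys/kernel/pid_max   # system-wide limit on PIDs, and therefore threads (default 32768)
ps -eLf | wc -l                # rough count of all threads (LWPs) currently alive on the node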

In the monitoring, the following picture corresponds to what was happening:

[monitoring graph]

A similar situation was observed on other nodes:

[graphs from other nodes]

On those same nodes we see:

root@kube-node-1 ~ # ps auxfww | grep curl -c
19782
root@kube-node-1 ~ # ps auxfww | grep curl | head
root     16688  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     17398  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     16852  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      9473  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      4664  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     30571  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     24113  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     16475  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      7176  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      1090  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>

It turned out that this behaviour comes from a pod running supercronic (a Go utility that we use to run cron jobs in pods):

 _ docker-containerd-shim 833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 /var/run/docker/libcontainerd/833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 docker-runc
|   _ /usr/local/bin/supercronic -json /crontabs/cron
|       _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true
|       |   _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true -no-pidfile
|       _ [newrelic-daemon] <defunct>
|       _ [curl] <defunct>
|       _ [curl] <defunct>
|       _ [curl] <defunct>
…

The problem is this: when a job is run in supercronic, the process it spawns cannot terminate correctly and turns into a zombie.

Note: to be more precise, the processes are spawned by the cron jobs, but supercronic is not an init system and cannot "adopt" the processes that its children spawned. When SIGHUP or SIGTERM signals are raised, they are not passed on to the child processes; as a result, the child processes do not terminate and remain in the zombie state. You can read more about all this, for example, in this article.

There are a couple of ways to solve the problem:

  1. As a temporary workaround, increase the maximum number of PIDs that can exist in the system at any one time:
           /proc/sys/kernel/pid_max (since Linux 2.5.34)
                  This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels.
  2. Or run the jobs in supercronic not directly, but via the same tini, which is able to terminate processes correctly and does not spawn zombies (see the sketch below).
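For the first option the temporary bump is a single sysctl call; for the second it is enough to make tini PID 1 of the container. A minimal sketch, not from the article: the tini version is pinned only for illustration, and the crontab path matches the process tree shown above.

# option 1, on the node: raise the system-wide PID/thread limit
sysctl -w kernel.pid_max=4194304

# option 2, in the image (Dockerfile fragment): tini becomes PID 1
# and reaps the zombies left behind by the cron jobs
RUN wget -O /usr/local/bin/tini \
      https://github.com/krallin/tini/releases/download/v0.18.0/tini-amd64 && \
    chmod +x /usr/local/bin/tini
ENTRYPOINT ["/usr/local/bin/tini", "--", "/usr/local/bin/supercronic", "-json", "/crontabs/cron"]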

Story 2. "Zombies" when a cgroup is deleted

The kubelet started consuming a lot of CPU:

[graph: kubelet CPU consumption]

Nobody likes that, so we armed ourselves with perf and started to dig into the problem. The results of the investigation were as follows:

  • The kubelet spends more than a third of its CPU time pulling memory data from all of the cgroups:

    [profiling graph]

  • In the kernel developers' mailing list you can find a discussion of the problem. In short, it comes down to this: various tmpfs files and other similar things are not completely removed from the system when a cgroup is deleted; these are the so-called memcg zombies. Sooner or later they do get evicted from the page cache, but there is a lot of memory on the server and the kernel sees no point in wasting time on deleting them, so they keep piling up. Why does this happen at all? This is a server with cron jobs that constantly creates new jobs, and with them new pods. New cgroups are therefore created for the containers in them, and these cgroups are soon deleted again.
  • Why does cAdvisor inside the kubelet waste so much time on this? It is easy to see with the simplest measurement, time cat /sys/fs/cgroup/memory/memory.stat. If on a healthy machine the operation takes 0.01 seconds, on the problematic cron02 it takes 1.2 seconds. The thing is that cAdvisor, which reads this data from sysfs, very slowly tries to account for the memory used in the zombie cgroups as well.
  • To forcibly remove the zombies, we tried clearing the caches as recommended on LKML: sync; echo 3 > /proc/sys/vm/drop_caches, but the kernel turned out to be more complicated than that and hung the machine (both commands are collected in the sketch after this list).
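Both of those commands, collected into one runnable sketch (run as root on the affected node):

# how long a cAdvisor-style read of the root memory stats takes
# (healthy node: ~0.01 s; the problematic cron02: ~1.2 s)
time cat /sys/fs/cgroup/memory/memory.stat > /dev/null

# the forced cache drop recommended on LKML; careful: in our case it hung the machine
sync; echo 3 > /proc/sys/vm/drop_caches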

What can be done about it? The problem is fixed (see the commit, and the description in the release announcement) by updating the Linux kernel to version 4.16.

Story 3. Systemd and its mounts

Once again the kubelet is consuming too many resources on some nodes, but this time it is eating too much memory:

[graph: kubelet memory consumption]

It turned out that there is a problem in the systemd used in Ubuntu 16.04, and it occurs when managing the mounts that are created for subPath from ConfigMaps or Secrets. After the pod finishes its work, the systemd service and its service mount remain in the system. Over time a huge number of them accumulate. There are even issues on this topic:

  1. #5916;
  2. kubernetes #57345.

... the last of which refers to a PR in systemd: #7811 (the issue in systemd is #7798).

The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround on this topic useful.
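A quick way to check whether a particular Ubuntu 16.04 node is already affected (the same filter that the cleanup script below uses):

systemctl list-units | grep subpath | grep -c "run-"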

So we made the following DaemonSet:

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: systemd-slices-cleaner
  name: systemd-slices-cleaner
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: systemd-slices-cleaner
  template:
    metadata:
      labels:
        app: systemd-slices-cleaner
    spec:
      containers:
      - command:
        - /usr/local/bin/supercronic
        - -json
        - /app/crontab
        image: private-registry.org/systemd-slices-cleaner/systemd-slices-cleaner:v0.1.0
        imagePullPolicy: Always
        name: systemd-slices-cleaner
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd/private
        - name: docker
          mountPath: /run/docker.sock
        - name: systemd-etc
          mountPath: /etc/systemd
        - name: systemd-run
          mountPath: /run/systemd/system/
        - name: lsb-release
          mountPath: /etc/lsb-release-host
      imagePullSecrets:
      - name: antiopa-registry
      priorityClassName: cluster-low
      tolerations:
      - operator: Exists
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd/private
      - name: docker
        hostPath:
          path: /run/docker.sock
      - name: systemd-etc
        hostPath:
          path: /etc/systemd
      - name: systemd-run
        hostPath:
          path: /run/systemd/system/
      - name: lsb-release
        hostPath:
          path: /etc/lsb-release

... and it uses this script:

#!/bin/bash

# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f ${hostrelease} && grep xenial ${hostrelease} > /dev/null || exit 0

# sleep up to 30 minutes to spread the load across the kube nodes
sleep $((RANDOM % 1800))

stoppedCount=0
# counting actual subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# let's go check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
  # find the drop-in description file for the unit (to identify the Docker container that spawned it)
  DropFile=$(systemctl status ${unit} | grep Drop | awk -F': ' '{print $2}')
  # reading uuid for docker container from description file
  DockerContainerId=$(cat ${DropFile}/50-Description.conf | awk '{print $5}' | cut -d/ -f6)
  # checking container status (running or not)
  checkFlag=$(docker ps | grep -c ${DockerContainerId})
  # if container not running, we will stop unit
  if [[ ${checkFlag} -eq 0 ]]; then
    echo "Stopping unit ${unit}"
    # stopping the unit
    systemctl stop $unit
    # just counter for logs
    ((stoppedCount++))
    # logging current progress
    echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
  fi
done

... and it runs every 5 minutes via the supercronic mentioned earlier. Its Dockerfile looks like this:

FROM ubuntu:16.04
COPY rootfs /
WORKDIR /app
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y gnupg curl apt-transport-https software-properties-common wget
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y docker-ce=17.03.0*
RUN wget https://github.com/aptible/supercronic/releases/download/v0.1.6/supercronic-linux-amd64 -O \
    /usr/local/bin/supercronic && chmod +x /usr/local/bin/supercronic
ENTRYPOINT ["/bin/bash", "-c", "/usr/local/bin/supercronic -json /app/crontab"]
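The /app/crontab referenced in the ENTRYPOINT is not shown in the article. A minimal sketch of what it might contain, assuming the cleanup script above is shipped via rootfs under the hypothetical path /app/clean-subpath-units.sh:

# run the subPath-unit cleanup every five minutes
*/5 * * * * /app/clean-subpath-units.sh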

Story 4. A race when scheduling pods

We noticed that if a pod is placed on a node and its image is being pulled for a very long time, then another pod that lands on the same node simply does not start pulling its own image. Instead, it waits until the previous pod's image has been pulled. As a result, a pod that has already been scheduled and whose image could have been downloaded in just a minute ends up sitting in the ContainerCreating status for a long time.

The events will look something like this:

Normal  Pulling    8m    kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal  pulling image "registry.example.com/infra/openvpn/openvpn:master"

It turns out that a single image from a slow registry can block deployments onto the node.

Unfortunately, there are not many ways out of this situation:

  1. Try to use your own Docker Registry directly in the cluster or directly alongside the cluster (for example, GitLab Registry, Nexus, etc.);
  2. Use utilities such as kraken.
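As an aside that is not from the original article: the serialization itself is controlled by a kubelet flag, and by default images are pulled one at a time. The Kubernetes documentation advises keeping the default on nodes whose Docker uses the aufs storage driver, so treat this as a knob to evaluate rather than a recommendation:

# allow the kubelet to pull several images in parallel,
# so that one slow registry does not block every other pod on the node
--serialize-image-pulls=false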

Story 5. Nodes hanging due to lack of memory

While operating various applications, we have also run into a situation where a node completely stops being accessible: SSH does not respond, all the monitoring daemons fall off, and then there is nothing (or almost nothing) anomalous in the logs.

I will walk you through it in pictures, using the example of one node on which MongoDB was running.

This is what atop looks like before the crash:

[atop output before the crash]

And this is what it looks like after the crash:

[atop output after the crash]

In the monitoring there is also a sharp spike, at which the node stops being available:

[monitoring graph of the node becoming unavailable]

So, from the screenshots it is clear that:

  1. The RAM on the machine is close to running out;
  2. There is a sharp jump in RAM consumption, after which access to the entire machine is abruptly lost;
  3. A large task arrives at Mongo, which forces the DBMS process to use more memory and to read actively from disk.

It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act may set in between throwing pages into the page cache and writing them back to disk. This is done by kswapd, which valiantly frees up as many memory pages as possible for subsequent allocation.

Unfortunately, under a heavy I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the entire system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes no longer want to use memory but are frozen at the very edge of the OOM-killer abyss.

The natural question is: why does the OOM killer arrive so late? In its current iteration the OOM killer is extremely stupid: it will kill a process only when an attempt to allocate a memory page fails, that is, when the page fault fails. This does not happen for a long time, because kswapd valiantly frees up memory pages, dumping the page cache (essentially all of the system's disk I/O) back to disk. You can read more about this, with a description of the steps needed to eliminate such problems in the kernel, here.
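A small observation sketch (standard procfs counters, not from the article): while the node is still responsive, rapidly growing kswapd scan/steal counters combined with near-zero free memory indicate exactly the reclaim loop described above:

grep kswapd /proc/vmstat   # pgscan_kswapd*/pgsteal_kswapd* growing quickly means heavy reclaim
vmstat 1 5                 # watch "free" and "cache" shrink while blocked processes and iowait climb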

This behaviour should improve with Linux kernel 4.6+.

Story 6. Pods stuck in the Pending state

In some clusters with a really large number of running pods, we began to notice that most of them "hang" for a very long time in the Pending state, even though the Docker containers themselves are already running on the nodes and can be worked with manually.

Moreover, there is nothing wrong in describe:

  Type    Reason                  Age                From                     Message
  ----    ------                  ----               ----                     -------
  Normal  Scheduled               1m                 default-scheduler        Successfully assigned sphinx-0 to ss-dev-kub07
  Normal  SuccessfulAttachVolume  1m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "sphinx-config"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "default-token-fzcsf"
  Normal  SuccessfulMountVolume   49s (x2 over 51s)  kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
  Normal  Created                 43s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 43s                kubelet, ss-dev-kub07    Started container
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
  Normal  Created                 42s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 42s                kubelet, ss-dev-kub07    Started container

After a bit of digging, we assumed that the kubelet simply does not have time to send all the information about the state of the pods and the liveness/readiness probes to the API server.

And after studying the kubelet help, we found the following parameters:

--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst  - Burst to use while talking with kubernetes apiserver (default 10) 
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10) 
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)

As you can see, the default values are quite small, and in 90% of cases they cover all needs... In our case, however, this was not enough. Therefore we set the following values:

--event-qps=30 --event-burst=40 --kube-api-burst=40 --kube-api-qps=30 --registry-qps=30 --registry-burst=40

... and restarted the kubelets, after which we saw the following picture on the graphs of calls to the API server:

[graph of API server calls after the change]

... and yes, everything started to fly!
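On newer clusters the same values can also be set through the kubelet configuration file instead of command-line flags. A sketch of the equivalent KubeletConfiguration, with field names as in kubelet.config.k8s.io/v1beta1 (verify them against your kubelet version):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
eventRecordQPS: 30
eventBurst: 40
kubeAPIQPS: 30
kubeAPIBurst: 40
registryPullQPS: 30
registryBurst: 40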

PS

For their help in collecting these bugs and preparing this article, I express my deep gratitude to the numerous engineers of our company, and especially to my colleague from our R&D team, Andrey Klimentyev (zuzzas).

PPS

Read also on our blog:

Source: www.habr.com
