6 entertaining system bugs in the operation of Kubernetes [and their solutions]


Over the years of running Kubernetes in production, we have accumulated many entertaining stories of how bugs in various system components led to unpleasant and/or baffling consequences affecting the operation of containers and pods. In this article we have made a selection of some of the most common or interesting ones. Even if you are never lucky enough to encounter such situations yourself, reading short detective stories like these - especially "first-hand" ones - is always interesting, isn't it?

Story 1. Supercronic and a hanging Docker

On one of the clusters, we periodically got a frozen Docker that interfered with the normal functioning of the cluster. At the same time, the following was observed in the Docker logs:

level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0

goroutine 0 [idle]:

goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0

goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

…

What interested us most about this error is the message: pthread_create failed: No space left on device. A quick study of the documentation explained that Docker could not fork a process, which is why it periodically froze.
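A quick way to gauge whether a node is approaching this limit is to compare the system-wide PID ceiling with the number of live threads (a generic diagnostic sketch, not taken from the original investigation):

# system-wide ceiling for PIDs; processes *and* threads count against it
cat /proc/sys/kernel/pid_max
# total number of threads currently alive on the node
ps -eLf --no-headers | wc -l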

In monitoring, the following picture corresponds to what is happening:

[monitoring graph]

A similar situation is observed on other nodes:

[monitoring graphs from other nodes]

On the same nodes we see:

root@kube-node-1 ~ # ps auxfww | grep curl -c
19782
root@kube-node-1 ~ # ps auxfww | grep curl | head
root     16688  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     17398  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     16852  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      9473  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      4664  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     30571  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     24113  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root     16475  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      7176  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>
root      1090  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       _ [curl] <defunct>

It turned out that this behavior is a consequence of the pod working with supercronic (a Go utility we use to run cron jobs in pods):

 _ docker-containerd-shim 833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 /var/run/docker/libcontainerd/833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 docker-runc
|   _ /usr/local/bin/supercronic -json /crontabs/cron
|       _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true
|       |   _ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true -no-pidfile
|       _ [newrelic-daemon] <defunct>
|       _ [curl] <defunct>
|       _ [curl] <defunct>
|       _ [curl] <defunct>
…

The problem is this: when a task is run in supercronic, the process it spawns cannot terminate correctly and turns into a zombie.

Note: to be more precise, the processes are spawned by the cron tasks, but supercronic is not an init system and cannot "adopt" the processes its children spawn. When SIGHUP or SIGTERM signals are raised, they are not passed on to the child processes, so the child processes do not terminate and remain in zombie status. You can read more about all this, for example, in this article.

There are two ways to solve the problem:

  1. As a temporary workaround, increase the number of PIDs the system can have at any one time:
           /proc/sys/kernel/pid_max (since Linux 2.5.34)
                  This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels.
  2. Or launch the tasks in supercronic not directly, but via the same tini, which can terminate processes correctly and does not spawn zombies (see the sketch after this list).
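A minimal sketch of both fixes (the values, version, and paths here are illustrative assumptions to adapt). The temporary PID bump:

# raise the system-wide PID/thread ceiling (up to 4194304 on 64-bit)
sysctl -w kernel.pid_max=4194303

And a Dockerfile fragment that puts tini in front of supercronic as PID 1, so that signals are forwarded and orphaned children get reaped:

ADD https://github.com/krallin/tini/releases/download/v0.18.0/tini /tini
RUN chmod +x /tini
# tini runs as PID 1 and reaps the zombies left behind by supercronic's jobs
ENTRYPOINT ["/tini", "--", "/usr/local/bin/supercronic", "-json", "/crontabs/cron"]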

Story 2. "Zombies" when deleting a cgroup

Kubelet started to consume a lot of CPU:

[kubelet CPU usage graph]

Nobody is going to like that, so we armed ourselves and set about dealing with the problem. The results of the investigation were as follows:

  • Kubelet spends more than a third of its CPU time pulling memory data from all cgroups:

    [profiling screenshot]

  • In the kernel developers' mailing list you can find a discussion of the problem. In short, it comes down to this: various tmpfs files and other similar things are not completely removed from the system when a cgroup is deleted - the so-called memcg zombies remain. Sooner or later they will be evicted from the page cache, but there is plenty of memory on the server and the kernel sees no point in wasting time purging them. That is why they keep piling up. Why is this even happening? This is a server with cron jobs that constantly creates new jobs, and with them new pods. New cgroups are thus created for the containers in them, and these cgroups are soon deleted.
  • Why does cAdvisor in kubelet waste so much time? This is easy to see with the simplest run of time cat /sys/fs/cgroup/memory/memory.stat. If on a healthy machine the operation takes 0.01 seconds, then on the problematic cron02 it takes 1.2 seconds. The thing is that cAdvisor, which reads data from sysfs very slowly, also tries to account for the memory used in the zombie cgroups.
  • To forcibly remove the zombies, we tried clearing the caches as recommended in LKML: sync; echo 3 > /proc/sys/vm/drop_caches, - but the kernel turned out to be more complicated and hung the machine.
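To estimate how many memcg zombies have accumulated on a node, one trick (our own diagnostic sketch, not from the LKML thread) is to compare the kernel's cgroup counter with the number of cgroup directories actually visible:

# number of memory cgroups the kernel still tracks, including dying ones
grep memory /proc/cgroups
# number of memory cgroup directories actually present in sysfs
find /sys/fs/cgroup/memory -type d | wc -l
# a large gap between the two numbers points to zombie memcgs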

What to do? The problem is fixed (commit; for a description, see the release message) by updating the Linux kernel to version 4.16.

Story 3. Systemd and its mounts

Once again, kubelet was consuming too many resources on some nodes, but this time it was eating too much memory:

[kubelet memory usage graph]

It turned out that there is a problem in the systemd shipped with Ubuntu 16.04, and it occurs when managing the mounts that are created for subPath connections from ConfigMaps or Secrets. After a pod has finished its work, the systemd service and its service mount remain in the system. Over time, a huge number of them accumulate. There are even issues on this topic:

  1. #5916;
  2. kubernetes #57345.

... the last of which refers to a PR in systemd: #7811 (the issue in systemd is #7798).

The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround on this topic useful.
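A quick way to check whether a node is affected is to count the leftover subPath mount units, using the same filter as the cleanup script below:

systemctl list-units | grep subpath | grep -c "run-"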

So we rolled out the following DaemonSet:

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: systemd-slices-cleaner
  name: systemd-slices-cleaner
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: systemd-slices-cleaner
  template:
    metadata:
      labels:
        app: systemd-slices-cleaner
    spec:
      containers:
      - command:
        - /usr/local/bin/supercronic
        - -json
        - /app/crontab
        image: private-registry.org/systemd-slices-cleaner/systemd-slices-cleaner:v0.1.0
        imagePullPolicy: Always
        name: systemd-slices-cleaner
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd/private
        - name: docker
          mountPath: /run/docker.sock
        - name: systemd-etc
          mountPath: /etc/systemd
        - name: systemd-run
          mountPath: /run/systemd/system/
        - name: lsb-release
          mountPath: /etc/lsb-release-host
      imagePullSecrets:
      - name: antiopa-registry
      priorityClassName: cluster-low
      tolerations:
      - operator: Exists
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd/private
      - name: docker
        hostPath:
          path: /run/docker.sock
      - name: systemd-etc
        hostPath:
          path: /etc/systemd
      - name: systemd-run
        hostPath:
          path: /run/systemd/system/
      - name: lsb-release
        hostPath:
          path: /etc/lsb-release

... and it uses the following script:

#!/bin/bash

# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f ${hostrelease} && grep xenial ${hostrelease} > /dev/null || exit 0

# sleep for up to 30 minutes to spread the load across kube-nodes
sleep $((RANDOM % 1800))

stoppedCount=0
# counting actual subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# let's go check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
  # find the drop-in description file for the unit (to identify the Docker container that spawned it)
  DropFile=$(systemctl status ${unit} | grep Drop | awk -F': ' '{print $2}')
  # reading uuid for docker container from description file
  DockerContainerId=$(cat ${DropFile}/50-Description.conf | awk '{print $5}' | cut -d/ -f6)
  # checking container status (running or not)
  checkFlag=$(docker ps | grep -c ${DockerContainerId})
  # if the container is not running, we stop the unit
  if [[ ${checkFlag} -eq 0 ]]; then
    echo "Stopping unit ${unit}"
    # stopping the unit
    systemctl stop $unit
    # just counter for logs
    ((stoppedCount++))
    # logging current progress
    echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
  fi
done

... and it runs every 5 minutes using the previously mentioned supercronic. Its Dockerfile looks like this:

FROM ubuntu:16.04
COPY rootfs /
WORKDIR /app
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y gnupg curl apt-transport-https software-properties-common wget
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y docker-ce=17.03.0*
RUN wget https://github.com/aptible/supercronic/releases/download/v0.1.6/supercronic-linux-amd64 -O \
    /usr/local/bin/supercronic && chmod +x /usr/local/bin/supercronic
ENTRYPOINT ["/bin/bash", "-c", "/usr/local/bin/supercronic -json /app/crontab"]

Story 4. Concurrency when scheduling pods

We noticed the following: if a pod is placed on a node and its image is being pulled for a very long time, then another pod that "lands" on the same node simply does not start pulling the new pod's image. Instead, it waits until the previous pod's image has been pulled. As a result, a pod that has already been scheduled and whose image could have been downloaded in just a minute ends up stuck in the containerCreating status for a long time.

The events look something like this:

Normal  Pulling    8m    kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal  pulling image "registry.example.com/infra/openvpn/openvpn:master"

It turns out that a single image from a slow registry can block all deployments onto a node.

Unfortunately, there are not many ways out of this situation:

  1. Try to use your own Docker Registry directly in the cluster or directly alongside the cluster (for example, GitLab Registry, Nexus, etc.);
  2. Use tools like kraken.
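It is also worth knowing where the serialization itself comes from: by default, kubelet pulls images one at a time. There is a flag for this behavior, but whether disabling it is safe depends on your Docker version and storage backend (the kubelet docs warn against parallel pulls with aufs or Docker < 1.9), so treat it as something to verify for your setup rather than a drop-in fix:

# kubelet: pull images in parallel instead of one at a time
--serialize-image-pulls=false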

Story 5. Nodes hang due to lack of memory

During the operation of various applications, we also encountered a situation where a node completely ceases to be accessible: SSH does not respond, all monitoring daemons fall off, and then there is nothing (or almost nothing) anomalous in the logs.

I'll walk you through it in pictures, using the example of one node where MongoDB was running.

This is what atop looks like before the crash:

[atop screenshot: before the crash]

And like this after the crash:

[atop screenshot: after the crash]

In monitoring there is also a sharp spike, at which the node stops being available:

[monitoring graph]

Thus, from the screenshots it is clear that:

  1. RAM on the machine is close to running out;
  2. There is a sharp jump in RAM consumption, after which access to the whole machine is abruptly lost;
  3. A large task arrives on Mongo, forcing the DBMS process to use more memory and read actively from disk.

It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act can arise between throwing pages out of the page cache and writing them back to disk. This is handled by kswapd, which valiantly frees up as many memory pages as possible for subsequent allocations.

Unfortunately, under a heavy I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the entire system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes no longer want to use more memory but stay fixed at the very edge of the OOM-killer abyss.

The natural question is: why does the OOM killer come so late? In its current iteration, the OOM killer is extremely simple-minded: it kills a process only when an attempt to allocate a memory page fails, i.e., when a page fault fails. This does not happen for quite a long time, because kswapd valiantly keeps freeing memory pages, dumping the page cache (essentially all of the disk I/O in the system) back to disk. You can read about this in more detail, along with a description of the steps needed to eliminate such problems in the kernel, here.

This behavior should improve with Linux kernel 4.6+.
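If upgrading the kernel right away is not an option, a crude watchdog can at least warn before a node slides into this state. A minimal sketch, with an illustrative threshold and logging in place of real alerting:

#!/bin/bash
# warn when available memory gets low while kswapd is scanning heavily --
# the combination that preceded our hangs
THRESHOLD_KB=524288   # 512 MiB, adjust to the machine
while sleep 15; do
  avail=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
  scans=$(awk '/^pgscan_kswapd/ {s+=$2} END {print s}' /proc/vmstat)
  if [[ ${avail} -lt ${THRESHOLD_KB} ]]; then
    echo "WARN: MemAvailable=${avail}kB pgscan_kswapd=${scans}" | logger -t mem-watchdog
  fi
done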

Story 6. Pods stuck in the Pending state

In some clusters with a lot of running pods, we began to notice that most of them "hang" for a very long time in the Pending state, even though the Docker containers themselves are already running on the nodes and can be worked with manually.

Moreover, in describe there is nothing wrong:

  Type    Reason                  Age                From                     Message
  ----    ------                  ----               ----                     -------
  Normal  Scheduled               1m                 default-scheduler        Successfully assigned sphinx-0 to ss-dev-kub07
  Normal  SuccessfulAttachVolume  1m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "sphinx-config"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "default-token-fzcsf"
  Normal  SuccessfulMountVolume   49s (x2 over 51s)  kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
  Normal  Created                 43s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 43s                kubelet, ss-dev-kub07    Started container
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
  Normal  Created                 42s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 42s                kubelet, ss-dev-kub07    Started container

After some digging, we made the assumption that the kubelet simply does not have time to send all the information about the pods' state and the liveness/readiness probes to the API server.

And after studying the kubelet's help, we found the following parameters:

--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst  - Burst to use while talking with kubernetes apiserver (default 10) 
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10) 
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)

As you can see, the default values are quite small, and in 90% of cases they cover all needs... In our case, however, this was not enough. Therefore, we set the following values:

--event-qps=30 --event-burst=40 --kube-api-burst=40 --kube-api-qps=30 --registry-qps=30 --registry-burst=40

... and restarted the kubelets, after which we saw the following picture in the graphs of calls to the API server:

[API server call rate graph]

... and yes, everything took off!
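For reference, one way to roll such flags out is a systemd drop-in for the kubelet unit. The unit name and the environment variable it reads depend on how kubelet was installed (KUBELET_EXTRA_ARGS is a kubeadm convention), so treat this as a sketch to adapt:

mkdir -p /etc/systemd/system/kubelet.service.d
cat > /etc/systemd/system/kubelet.service.d/90-api-qps.conf <<'EOF'
[Service]
Environment="KUBELET_EXTRA_ARGS=--event-qps=30 --event-burst=40 --kube-api-qps=30 --kube-api-burst=40 --registry-qps=30 --registry-burst=40"
EOF
systemctl daemon-reload && systemctl restart kubelet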

PS

For their help in collecting these bugs and preparing this article, I express my deep gratitude to the numerous engineers of our company, and especially to our colleague from the R&D team, Andrey Klimentyev.

PPS

Read also on our blog:

source: www.habr.com
