6 entertaining system bugs in the operation of Kubernetes [and their solutions]

Over the years of running Kubernetes in production, we have accumulated many interesting stories of how bugs in various system components led to unpleasant and/or incomprehensible consequences affecting the operation of containers and pods. In this article we have put together a selection of the most common or most interesting ones. Even if you are never lucky enough to run into such situations yourself, reading short detective stories like these, especially "first-hand" ones, is always interesting, isn't it?

Story 1. Supercronic and hanging Docker

On one of the clusters, Docker froze from time to time, which interfered with the normal operation of the cluster. At the same time, the following was observed in the Docker logs:

level=error msg="containerd: start init process" error="exit status 2: "runtime/cgo: pthread_create failed: No space left on device
SIGABRT: abort
PC=0x7f31b811a428 m=0

goroutine 0 [idle]:

goroutine 1 [running]:
runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420026768 sp=0xc420026760
runtime.main() /usr/local/go/src/runtime/proc.go:127 +0x6c fp=0xc4200267c0 sp=0xc420026768
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4200267c8 sp=0xc4200267c0

goroutine 17 [syscall, locked to thread]:
runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:2086 +0x1

…

What interests us most about this error is the message: pthread_create failed: No space left on device. A quick look at the documentation explained that Docker could not fork a process, which is why it froze from time to time.

In monitoring, the following picture corresponds to what was happening:

[Image: monitoring graph]

A similar situation was observed in other environments:

[Images: monitoring graphs from other environments]

On the same nodes we see:

root@kube-node-1 ~ # ps auxfww | grep curl -c
19782
root@kube-node-1 ~ # ps auxfww | grep curl | head
root     16688  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root     17398  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root     16852  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root      9473  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root      4664  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root     30571  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root     24113  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root     16475  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root      7176  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>
root      1090  0.0  0.0      0     0 ?        Z    Feb06   0:00      |       \_ [curl] <defunct>

It turned out that this behavior is a consequence of a pod running supercronic (a Go utility that we use to run cron jobs in pods):

 \_ docker-containerd-shim 833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 /var/run/docker/libcontainerd/833b60bb9ff4c669bb413b898a5fd142a57a21695e5dc42684235df907825567 docker-runc
|   \_ /usr/local/bin/supercronic -json /crontabs/cron
|       \_ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true
|       |   \_ /usr/bin/newrelic-daemon --agent --pidfile /var/run/newrelic-daemon.pid --logfile /dev/stderr --port /run/newrelic.sock --tls --define utilization.detect_aws=true --define utilization.detect_azure=true --define utilization.detect_gcp=true --define utilization.detect_pcf=true --define utilization.detect_docker=true -no-pidfile
|       \_ [newrelic-daemon] <defunct>
|       \_ [curl] <defunct>
|       \_ [curl] <defunct>
|       \_ [curl] <defunct>
…

The problem is this: when a task runs under supercronic, the process it spawns cannot terminate correctly and turns into a zombie.

Note: To be more precise, the processes are spawned by the cron tasks, but supercronic is not an init system and cannot "adopt" the processes that its children have spawned. When SIGHUP or SIGTERM signals are raised, they are not passed on to the child processes, so the child processes do not terminate and remain in the zombie state. You can read more about all of this in dedicated articles on the subject.
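To see which parent is accumulating zombies, the ps output can be grouped by parent PID. The following is a minimal sketch rather than the tooling used in the story; the function name count_zombies_by_parent is our own, and it assumes input in the format printed by ps -eo stat=,ppid=,comm=.

```shell
#!/bin/bash
# Count zombie processes (state starting with Z) per parent PID.
# Input on stdin: one process per line as "STAT PPID COMMAND".
# Output: "<zombie count> <parent pid>", largest count first.
count_zombies_by_parent() {
  awk '$1 ~ /^Z/ { zombies[$2]++ }
       END { for (ppid in zombies) print zombies[ppid], ppid }' | sort -rn
}
```

On a node it could be run as ps -eo stat=,ppid=,comm= | count_zombies_by_parent | head, after which the top parent PID can be inspected with ps to find the owning container.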

There are two ways to solve the problem:

  1. As a temporary workaround, increase the number of PIDs available in the system at any one time:
           /proc/sys/kernel/pid_max (since Linux 2.5.34)
                  This file specifies the value at which PIDs wrap around (i.e., the value in this file is one greater than the maximum PID). PIDs greater than this value are not allocated; thus, the value in this file also acts as a system-wide limit on the total number of processes and threads. The default value for this file, 32768, results in the same range of PIDs as on earlier kernels
  2. Or launch tasks in supercronic not directly, but via tini, which terminates processes correctly and does not spawn zombies.
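Before raising pid_max it helps to know how close a node is to the limit. The helper below is a small sketch under our own naming (pid_usage_percent); it only does the arithmetic, so the task count and the limit have to be supplied by the caller.

```shell
#!/bin/bash
# Print the share of the PID space in use, as a whole percentage.
# $1: current number of tasks (processes and threads)
# $2: the value of /proc/sys/kernel/pid_max
pid_usage_percent() {
  local used=$1 max=$2
  echo $(( used * 100 / max ))
}
```

One possible invocation on a node is pid_usage_percent "$(ps -eL -o lwp= | wc -l)" "$(cat /proc/sys/kernel/pid_max)". With the 19782 zombie curl processes counted above and the default limit of 32768, the zombies alone already occupy about 60% of the PID space.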

Story 2. "Zombies" when deleting a cgroup

Kubelet started consuming a lot of CPU:

[Image: kubelet CPU usage graph]

Nobody likes that, so we armed ourselves with perf and started looking into the problem. The results of the investigation were as follows:

  • Kubelet spends more than a third of its CPU time pulling memory data from all cgroups:

    [Image: kubelet CPU profile]

  • A discussion of the problem can be found on the kernel developers' mailing list. In short, it comes down to this: various tmpfs files and other similar objects are not completely removed from the system when a cgroup is deleted, and a so-called memcg zombie remains. Sooner or later they will be evicted from the page cache, but the server has plenty of memory and the kernel sees no point in spending time on deleting them. That is why they keep piling up. Why does this happen at all? This is a server with cron jobs that constantly creates new jobs, and with them new pods. New cgroups are therefore created for the containers in those pods, and they are deleted shortly afterwards.
  • Why does cAdvisor in kubelet waste so much time? This is easy to see with the simplest possible check: time cat /sys/fs/cgroup/memory/memory.stat. On a healthy machine the operation takes 0.01 seconds, while on the problematic cron02 it takes 1.2 seconds. The point is that cAdvisor, which reads data from sysfs very slowly, tries to account for the memory used in zombie cgroups.
  • To forcibly remove the zombies, we tried clearing the caches as recommended on LKML: sync; echo 3 > /proc/sys/vm/drop_caches, but the kernel turned out to be more complicated than that and hung the machine.
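One way to estimate how many zombie memory cgroups have piled up is to compare the kernel's own counter with the number of cgroup directories that are still visible. The sketch below assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory, and assumes that the num_cgroups column of /proc/cgroups also counts cgroups that have been deleted but not yet freed. Both function names are hypothetical.

```shell
#!/bin/bash
# Print the kernel's count of memory cgroups.
# Input on stdin: text in /proc/cgroups format
# (columns: subsys_name hierarchy num_cgroups enabled).
memcg_kernel_count() {
  awk '$1 == "memory" { print $3 }'
}

# Print the number of cgroup directories under the given mount point,
# including the mount point itself.
memcg_visible_count() {
  find "$1" -type d | wc -l
}
```

On a node, memcg_kernel_count < /proc/cgroups and memcg_visible_count /sys/fs/cgroup/memory can be printed side by side. A kernel count far above the visible count would be consistent with the accumulation described above.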

What to do? The problem is already being fixed (see the commit and, for a description, the release message), so the solution is to update the Linux kernel to version 4.16.

Story 3. Systemd and its mount

Again, kubelet consumes too many resources on some nodes, but this time it is memory:

[Image: kubelet memory usage graph]

It turned out that there is a problem in the systemd shipped with Ubuntu 16.04, and it occurs when managing the mounts that are created to attach a subPath from a ConfigMap or a secret. After the pod has finished its work, the systemd service and its service mount remain in the system. Over time, a huge number of them accumulate. There are even issues on this topic:

  1. #5916;
  2. kubernetes #57345.

... the latter of which refers to a PR in systemd: #7811 (the systemd issue is #7798).

The problem no longer exists in Ubuntu 18.04, but if you want to keep using Ubuntu 16.04, you may find our workaround useful.

So we made the following DaemonSet:

---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: systemd-slices-cleaner
  name: systemd-slices-cleaner
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: systemd-slices-cleaner
  template:
    metadata:
      labels:
        app: systemd-slices-cleaner
    spec:
      containers:
      - command:
        - /usr/local/bin/supercronic
        - -json
        - /app/crontab
        image: private-registry.org/systemd-slices-cleaner/systemd-slices-cleaner:v0.1.0
        imagePullPolicy: Always
        name: systemd-slices-cleaner
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd/private
        - name: docker
          mountPath: /run/docker.sock
        - name: systemd-etc
          mountPath: /etc/systemd
        - name: systemd-run
          mountPath: /run/systemd/system/
        - name: lsb-release
          mountPath: /etc/lsb-release-host
      imagePullSecrets:
      - name: antiopa-registry
      priorityClassName: cluster-low
      tolerations:
      - operator: Exists
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd/private
      - name: docker
        hostPath:
          path: /run/docker.sock
      - name: systemd-etc
        hostPath:
          path: /etc/systemd
      - name: systemd-run
        hostPath:
          path: /run/systemd/system/
      - name: lsb-release
        hostPath:
          path: /etc/lsb-release

... and it uses the following script:

#!/bin/bash

# we will work only on xenial
hostrelease="/etc/lsb-release-host"
test -f ${hostrelease} && grep xenial ${hostrelease} > /dev/null || exit 0

# sleeping max 30 minutes to dispense load on kube-nodes
sleep $((RANDOM % 1800))

stoppedCount=0
# counting actual subpath units in systemd
countBefore=$(systemctl list-units | grep subpath | grep "run-" | wc -l)
# let's go check each unit
for unit in $(systemctl list-units | grep subpath | grep "run-" | awk '{print $1}'); do
  # finding description file for unit (to find out docker container, who born this unit)
  DropFile=$(systemctl status ${unit} | grep Drop | awk -F': ' '{print $2}')
  # reading uuid for docker container from description file
  DockerContainerId=$(cat ${DropFile}/50-Description.conf | awk '{print $5}' | cut -d/ -f6)
  # checking container status (running or not)
  checkFlag=$(docker ps | grep -c ${DockerContainerId})
  # if container not running, we will stop unit
  if [[ ${checkFlag} -eq 0 ]]; then
    echo "Stopping unit ${unit}"
    # stopping the unit
    systemctl stop $unit
    # just counter for logs
    ((stoppedCount++))
    # logging current progress
    echo "Stopped ${stoppedCount} systemd units out of ${countBefore}"
  fi
done

... and it runs every 5 minutes using the previously mentioned supercronic. Its Dockerfile looks like this:

FROM ubuntu:16.04
COPY rootfs /
WORKDIR /app
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y gnupg curl apt-transport-https software-properties-common wget
RUN add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y docker-ce=17.03.0*
RUN wget https://github.com/aptible/supercronic/releases/download/v0.1.6/supercronic-linux-amd64 -O \
    /usr/local/bin/supercronic && chmod +x /usr/local/bin/supercronic
ENTRYPOINT ["/bin/bash", "-c", "/usr/local/bin/supercronic -json /app/crontab"]
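Before letting a cleaner like this stop anything, it can be useful to see which units it would select. The sketch below isolates only the selection step of the script above (the same subpath and run- filters) as a function that reads systemctl list-units output from stdin. The function name list_subpath_units is our own, and the unit names used to try it out are made up.

```shell
#!/bin/bash
# Print the names of systemd units that look like leftover subPath mounts.
# Input on stdin: text in the format of `systemctl list-units`.
# A line is selected when it mentions "subpath" and contains "run-",
# mirroring the grep filters of the cleanup script.
list_subpath_units() {
  awk '/subpath/ && /run-/ { print $1 }'
}
```

Running systemctl list-units | list_subpath_units | wc -l on a node gives a dry-run count of candidates without stopping any unit.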

Story 4. Contention when scheduling pods

We noticed that if a pod is placed on a node and its image takes a very long time to pull, then another pod that "lands" on the same node simply does not start pulling its own image. Instead, it waits until the image of the previous pod has been pulled. As a result, a pod that has already been scheduled, and whose image could have been downloaded in just a minute, ends up in the ContainerCreating status for a long time.

The events will look something like this:

Normal  Pulling    8m    kubelet, ip-10-241-44-128.ap-northeast-1.compute.internal  pulling image "registry.example.com/infra/openvpn/openvpn:master"

It turns out that a single image from a slow registry can block deployment to a node.

Unfortunately, there are not many ways out of this situation:

  1. Try to use your own Docker Registry directly in the cluster or right next to the cluster (for example, GitLab Registry, Nexus, etc.);
  2. Use utilities such as kraken.
Story 5. Nodes hang due to lack of memory

While operating various applications, we also ran into a situation where a node becomes completely unreachable: SSH does not respond, all monitoring daemons drop off, and afterwards there is nothing (or almost nothing) anomalous in the logs.

I will illustrate this with pictures, using the example of one node where MongoDB was running.

This is what atop looks like before the crash:

[Image: atop output before the crash]

And like this after the crash:

[Image: atop output after the crash]

In monitoring there is also a sharp jump, at which point the node stops being available:

[Image: monitoring graph]

So, from the screenshots it is clear that:

  1. The RAM on the machine is close to exhausted;
  2. There is a sharp jump in RAM consumption, after which access to the whole machine is abruptly lost;
  3. A large task arrives at Mongo, which forces the DBMS process to use more memory and to read actively from disk.

It turns out that if Linux runs out of free memory (memory pressure sets in) and there is no swap, then before the OOM killer arrives, a balancing act can arise between bringing pages into the page cache and writing them back to disk. This is handled by kswapd, which bravely frees as many memory pages as possible for subsequent allocation.

Unfortunately, under a heavy I/O load combined with a small amount of free memory, kswapd becomes the bottleneck of the whole system, because all allocations (page faults) of memory pages in the system are tied to it. This can go on for a very long time if the processes no longer want to use more memory but stay stuck at the very edge of the OOM-killer abyss.

The natural question is: why does the OOM killer arrive so late? In its current iteration, the OOM killer is extremely stupid: it kills a process only when an attempt to allocate a memory page fails, i.e. when a page fault fails. That does not happen for quite a long time, because kswapd bravely frees memory pages by flushing the page cache (in effect, all disk I/O in the system) back to disk. A more detailed account, including a description of the steps needed to eliminate such problems in the kernel, is available separately.

This behavior should improve with Linux kernel 4.6+.
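Since the node becomes unreachable before the OOM killer acts, it can help to alert on shrinking available memory well before that point. The function below is a minimal sketch with a hypothetical name (mem_available_percent). It assumes a kernel that exposes MemAvailable in /proc/meminfo, and any alert threshold built on top of it is for the reader to choose.

```shell
#!/bin/bash
# Print MemAvailable as a whole percentage of MemTotal.
# Input on stdin: text in /proc/meminfo format.
mem_available_percent() {
  awk '$1 == "MemTotal:"     { total = $2 }
       $1 == "MemAvailable:" { avail = $2 }
       END { if (total > 0) printf "%d\n", avail * 100 / total }'
}
```

On a node it would be called as mem_available_percent < /proc/meminfo, for example from a periodic check that raises an alert when the value drops to single digits.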

Story 6. Pods get stuck in Pending

On some clusters where a really large number of pods are running, we began to notice that most of them "hang" for a very long time in the Pending state, even though the Docker containers themselves are already running on the nodes and can be worked with manually.

Moreover, describe shows nothing wrong:

  Type    Reason                  Age                From                     Message
  ----    ------                  ----               ----                     -------
  Normal  Scheduled               1m                 default-scheduler        Successfully assigned sphinx-0 to ss-dev-kub07
  Normal  SuccessfulAttachVolume  1m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "sphinx-config"
  Normal  SuccessfulMountVolume   1m                 kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "default-token-fzcsf"
  Normal  SuccessfulMountVolume   49s (x2 over 51s)  kubelet, ss-dev-kub07    MountVolume.SetUp succeeded for volume "pvc-6aaad34f-ad10-11e8-a44c-52540035a73b"
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx-exporter/sphinx-indexer:v1" already present on machine
  Normal  Created                 43s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 43s                kubelet, ss-dev-kub07    Started container
  Normal  Pulled                  43s                kubelet, ss-dev-kub07    Container image "registry.example.com/infra/sphinx/sphinx:v1" already present on machine
  Normal  Created                 42s                kubelet, ss-dev-kub07    Created container
  Normal  Started                 42s                kubelet, ss-dev-kub07    Started container

After some digging, we came to the assumption that kubelet simply does not have time to send all the information about the state of the pods and the liveness/readiness probes to the API server.

And after studying the help, we found the following parameters:

--kube-api-qps - QPS to use while talking with kubernetes apiserver (default 5)
--kube-api-burst  - Burst to use while talking with kubernetes apiserver (default 10) 
--event-qps - If > 0, limit event creations per second to this value. If 0, unlimited. (default 5)
--event-burst - Maximum size of a bursty event records, temporarily allows event records to burst to this number, while still not exceeding event-qps. Only used if --event-qps > 0 (default 10) 
--registry-qps - If > 0, limit registry pull QPS to this value.
--registry-burst - Maximum size of bursty pulls, temporarily allows pulls to burst to this number, while still not exceeding registry-qps. Only used if --registry-qps > 0 (default 10)

As we can see, the default values are quite small, and in 90% of cases they cover all needs... In our case, however, this was not enough. Therefore, we set the following values:

--event-qps=30 --event-burst=40 --kube-api-burst=40 --kube-api-qps=30 --registry-qps=30 --registry-burst=40

... and restarted the kubelets, after which we saw the following picture on the graphs of calls to the API server:

[Image: graph of calls to the API server]

... and yes, everything started to fly!
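After a restart it is worth confirming which limits a kubelet is actually running with. The helper below is a sketch under our own naming (flag_value). It only understands the --flag=value form shown above, and it will report nothing if the settings come from a kubelet configuration file rather than from the command line.

```shell
#!/bin/bash
# Print the value of --<flag>=<value> from a command line string.
# $1: flag name without the leading dashes, $2: the full command line.
# Returns 1 and prints nothing if the flag is not present.
flag_value() {
  local flag=$1 cmdline=$2 arg
  # Word splitting of $cmdline is intentional: one argument per word.
  for arg in $cmdline; do
    case $arg in
      --"$flag"=*) echo "${arg#*=}"; return 0 ;;
    esac
  done
  return 1
}
```

On a node, the command line can be taken from the running process, for example flag_value kube-api-qps "$(tr '\0' ' ' < /proc/$(pgrep -o kubelet)/cmdline)".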

P.S.

For their help in collecting the bugs and preparing this article, I am deeply grateful to the many engineers of our company, and especially to my colleague from our R&D team, Andrey Klimentyev (zuzzas).

Source: www.habr.com
