Saving Kubernetes cloud costs on AWS

This translation of the article was prepared ahead of the start of the course "Infrastructure platform based on Kubernetes".


How do you save cloud costs when working with Kubernetes? There is no single right answer, but this article describes tools that can help you manage your resources more effectively and reduce your cloud computing bill.

I wrote this article with Kubernetes on AWS in mind, but it applies (almost) equally to other cloud providers. I assume your cluster(s) already have autoscaling configured (cluster-autoscaler); a minimal sketch of such a setup is shown below. Removing resources and scaling down your deployments only saves money if your fleet of worker nodes (EC2 instances) also shrinks.
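If autoscaling is not set up yet, a minimal sketch of a cluster-autoscaler container configuration with AWS ASG auto-discovery could look like this (the image tag and the cluster name "my-cluster" are placeholders):

image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # discover node groups (ASGs) by tag; "my-cluster" is a placeholder cluster name
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  - --balance-similar-node-groups
  - --expander=least-waste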

This article covers:

  • cleaning up unused resources (kube-janitor)
  • scaling down during non-working hours (kube-downscaler)
  • using horizontal pod autoscaling (HPA)
  • reducing excessive resource reservation (kube-resource-report, VPA)
  • using EC2 Spot instances

Cleaning up unused resources

Working in a fast-paced environment is great. We want engineering teams to move quickly. Faster software delivery also means more PR deployments, preview environments, prototypes, and analytics solutions. Everything gets deployed on Kubernetes. Who has time to manually clean up test deployments? It is easy to forget about deleting a week-old experiment. The cloud bill grows because of things we forget to shut down:


Henning Jacobs: "So true:"

(quoting) Corey Quinn:
"Myth: your AWS bill is a function of the number of users you have.
Fact: your AWS bill is a function of the number of engineers you have."

Ivan Kurnosov (in reply):
"Fact: your AWS bill is a function of the number of things you forgot to turn off/delete."

Kubernetes Janitor (kube-janitor) helps clean up your cluster. Its configuration is flexible and works both cluster-wide and for individual resources:

  • Cluster-wide rules can define a maximum time-to-live (TTL) for PR/test deployments.
  • Individual resources can be annotated with janitor/ttl, for example to delete a spike/prototype after 7 days.

General rules are defined in a YAML file whose path is passed to kube-janitor via the --rules-file parameter. Here is an example rule that deletes all namespaces with -pr- in their name after two days:

- id: cleanup-resources-from-pull-requests
  resources:
    - namespaces
  jmespath: "contains(metadata.name, '-pr-')"
  ttl: 2d

The following example requires the application label on Deployment and StatefulSet pods for all new Deployments/StatefulSets created in 2020, but at the same time allows experiments without this label to run for one week:

- id: require-application-label
  # delete deployments and statefulsets without the "application" label
  resources:
    - deployments
    - statefulsets
  # see http://jmespath.org/specification.html
  jmespath: "!(spec.template.metadata.labels.application) && metadata.creationTimestamp > '2020-01-01'"
  ttl: 7d

To run a demo limited to 30 minutes on a cluster running kube-janitor:

kubectl run nginx-demo --image=nginx
kubectl annotate deploy nginx-demo janitor/ttl=30m

Another source of growing costs is persistent volumes (AWS EBS). Deleting a Kubernetes StatefulSet does not delete its persistent volume claims (PVC - PersistentVolumeClaim). Unused EBS volumes can easily add up to hundreds of dollars per month. Kubernetes Janitor has a feature to clean up unused PVCs. For example, this rule deletes all PVCs that are not mounted by a pod and are not referenced by a StatefulSet or CronJob:

# delete all PVCs that are not mounted and are not referenced by StatefulSets
- id: remove-unused-pvcs
  resources:
  - persistentvolumeclaims
  jmespath: "_context.pvc_is_not_mounted && _context.pvc_is_not_referenced"
  ttl: 24h

Kubernetes Janitor can help you keep your cluster clean and prevent cloud costs from slowly piling up. For deployment and configuration instructions, see the kube-janitor README.

Scaling down during non-working hours

Test and staging systems usually only need to run during business hours. Some production applications, such as back-office/admin tools, also only need to be available during working hours and can be turned off at night.

Kubernetes Downscaler (kube-downscaler) allows users and operators to scale the system down during non-working hours. Deployments and StatefulSets can be scaled to zero replicas. CronJobs can be suspended. Kubernetes Downscaler can be configured for the whole cluster, for one or more namespaces, or for individual resources. You can specify either "idle time" or, conversely, "working time". For example, to scale down as much as possible during nights and weekends:

image: hjacobs/kube-downscaler:20.4.3
args:
  - --interval=30
  # do not turn off infrastructure components
  - --exclude-namespaces=kube-system,infra
  # do not turn off kube-downscaler itself, and keep the Postgres Operator so that excluded databases can still be managed
  - --exclude-deployments=kube-downscaler,postgres-operator
  - --default-uptime=Mon-Fri 08:00-20:00 Europe/Berlin
  - --include-resources=deployments,statefulsets,stacks,cronjobs
  - --deployment-time-annotation=deployment-time

Here is a graph of the cluster's worker nodes scaling down on weekends:

[Chart: cluster worker node count dropping on weekends]

Scaling down from ~13 to 4 worker nodes makes a noticeable difference in your AWS bill.

But what if I need to work during cluster "downtime"? Certain deployments can be permanently excluded from downscaling by adding the downscaler/exclude: true annotation. Deployments can be temporarily excluded using the downscaler/exclude-until annotation with an absolute timestamp in the format YYYY-MM-DD HH:MM (UTC). If necessary, the whole cluster can be scaled back up by deploying a pod with the downscaler/force-uptime annotation, for example by launching an nginx dummy:

kubectl run scale-up --image=nginx
kubectl annotate deploy scale-up janitor/ttl=1h # delete the deployment after one hour
kubectl annotate pod $(kubectl get pod -l run=scale-up -o jsonpath="{.items[0].metadata.name}") downscaler/force-uptime=true
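The exclusion annotations mentioned above can also be set declaratively on the resource itself; a minimal sketch (the deployment name "my-app" is a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    # never downscale this deployment
    downscaler/exclude: "true"
    # or: exclude it only until a given timestamp (UTC)
    # downscaler/exclude-until: "2020-04-01 12:00"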

See the kube-downscaler README if you are interested in deployment instructions and additional options.

Use horizontal autoscaling

Many applications/services deal with a dynamic load pattern: sometimes their pods sit idle, and sometimes they run at full capacity. Running a permanent fleet of pods sized for the peak load is not economical. Kubernetes supports horizontal autoscaling via the HorizontalPodAutoscaler (HPA) resource. CPU usage is often a good signal for scaling:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 100
        type: Utilization

Zalando has built a component that makes it easy to plug in custom metrics for scaling: Kube Metrics Adapter (kube-metrics-adapter) is a generic metrics adapter for Kubernetes that can collect and serve custom and external metrics for horizontal pod autoscaling. It supports scaling based on Prometheus metrics, SQS queues, and other sources. For example, to scale your deployment on a custom metric exposed by the application itself as JSON at /metrics:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue
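For this to work, the application itself has to serve matching JSON on port 9090 at /metrics; a sketch of the expected shape (the value is illustrative):

{
  "http_server": {
    "rps": 0.5
  }
}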

Setting up horizontal autoscaling with HPA is one of the default steps for improving efficiency of stateless services. Spotify has a talk sharing their experience and recommendations for HPA: scale your deployments, not your wallet.

Reduce excessive resource reservation

Kubernetes workloads declare their CPU/memory needs via "resource requests". CPU resources are measured in virtual cores or, more commonly, in "millicores"; for example, 500m means 50% of a vCPU. Memory resources are measured in bytes, and the usual suffixes can be used, such as 500Mi, which means 500 mebibytes. Resource requests "lock" capacity on worker nodes, i.e. a pod with a 1000m CPU request on a node with 4 vCPUs leaves only 3 vCPUs available for other pods. [1]
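As a reminder, requests are declared per container; a minimal sketch:

# container resource requests: 500m = 50% of a vCPU, 500Mi = 500 mebibytes
resources:
  requests:
    cpu: 500m
    memory: 500Mi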

Slack is the difference between requested resources and actual usage. For example, a pod that requests 2 GiB of memory but uses only 200 MiB has ~1.8 GiB of "excess" memory. That is wasted money. One can roughly estimate 1 GiB of excess memory at ~$10 per month. [2]

Kubernetes Resource Report (kube-resource-report) shows the excess reservations and can help you estimate the savings potential:

[Screenshot: Kubernetes Resource Report]

Kubernetes Resource Report shows the excess aggregated by application and team. This lets you find places where resource requests can be reduced. The generated HTML report only provides a snapshot of resource usage; to determine adequate resource requests, you need to look at CPU/memory usage over time. Here is a Grafana chart for a "typical" CPU-heavy service: all pods use significantly less than the 3 CPU cores requested:

[Grafana chart: pod CPU usage far below the requested 3 cores]

Reducing the CPU request from 3000m to ~400m frees up resources for other workloads and allows the cluster to shrink.

"Hoʻohana pinepine ʻia ka hoʻohana ʻana o ka CPU i nā manawa EC2 i ka pākēneka helu helu hoʻokahi," kākau ʻo Corey Quinn. ʻOiai no EC2 ʻo ke koho ʻana i ka nui kūpono he hoʻoholo maikaʻi ʻole pahaHe maʻalahi ka hoʻololi ʻana i kekahi mau nīnau kumu kumu Kubernetes ma kahi faila YAML a hiki ke lawe i ka mālama kālā nui.

But do we really want people to change values in YAML files? No, machines can do it much better! The Kubernetes Vertical Pod Autoscaler (VPA) does exactly that: it adapts resource requests and limits to the actual workload. Here is an example graph of Prometheus CPU requests (thin blue line) adjusted by VPA over time:

[Chart: Prometheus CPU requests adjusted by VPA over time]
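For reference, a VPA object in "Auto" mode is declared roughly like this (a sketch; the deployment name "my-app" is a placeholder):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # VPA applies the recommended requests automatically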

Zalando uses VPA in all of its clusters for infrastructure components. Non-critical applications can also use VPA.

Goldilocks from Fairwinds is a tool that creates a VPA for each deployment in a namespace and then shows the VPA recommendation on its dashboard. It can help developers set the right CPU/memory requests for their applications:

[Screenshot: Goldilocks dashboard]

I wrote a small blog post about VPA in 2019, and recently the CNCF End User Community discussed the topic of VPA.

Using EC2 Spot instances

Last but not least, AWS EC2 costs can be reduced by using Spot instances as Kubernetes worker nodes [3]. Spot instances are available at discounts of up to 90% compared to On-Demand prices. Running Kubernetes on EC2 Spot is a good combination: you need to specify several different instance types for higher availability, which means you can get a larger node for the same or a lower price, and the added capacity can be used by containerized Kubernetes workloads.

How do you run Kubernetes on EC2 Spot? There are several options: use a third-party service such as SpotInst (now called "Spot", don't ask me why), or simply add a Spot AutoScalingGroup (ASG) to your cluster. For example, here is a CloudFormation snippet for a "capacity-optimized" Spot ASG with multiple instance types:

MySpotAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
   HealthCheckGracePeriod: 300
   HealthCheckType: EC2
   MixedInstancesPolicy:
     InstancesDistribution:
       OnDemandPercentageAboveBaseCapacity: 0
       SpotAllocationStrategy: capacity-optimized
     LaunchTemplate:
       LaunchTemplateSpecification:
         LaunchTemplateId: !Ref LaunchTemplate
         Version: !GetAtt LaunchTemplate.LatestVersionNumber
       Overrides:
         - InstanceType: "m4.2xlarge"
         - InstanceType: "m4.4xlarge"
         - InstanceType: "m5.2xlarge"
         - InstanceType: "m5.4xlarge"
         - InstanceType: "r4.2xlarge"
         - InstanceType: "r4.4xlarge"
   LaunchTemplate:
     LaunchTemplateId: !Ref LaunchTemplate
     Version: !GetAtt LaunchTemplate.LatestVersionNumber
   MinSize: 0
   MaxSize: 100
   Tags:
   - Key: k8s.io/cluster-autoscaler/node-template/label/aws.amazon.com/spot
     PropagateAtLaunch: true
     Value: "true"

A few notes on using Spot with Kubernetes:

  • You need to handle Spot terminations, for example by draining the node when the instance is shut down
  • Zalando uses a fork of the official cluster autoscaler with node pool priorities
  • Spot nodes can be made to require workloads to explicitly "opt in" to running on Spot (see the sketch after this list)
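For example, if Spot nodes carry a taint in addition to the label set via the ASG node-template tag above, a workload opts in to Spot roughly like this (a sketch; the taint key is an assumption):

# pod spec fragment: opt in to running on Spot nodes
spec:
  nodeSelector:
    aws.amazon.com/spot: "true"      # label from the node-template tag above
  tolerations:
    - key: "aws.amazon.com/spot"     # assumed taint key on Spot nodes
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"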

Summary

I hope you find some of the tools described here useful for reducing your cloud bill. Most of the article's content is also covered in my talk at DevOps Gathering 2019, available on YouTube along with the slides.

What are your best practices for saving cloud costs on Kubernetes? Please let me know on Twitter (@try_except_).

[1] In fact, even fewer than 3 vCPUs will remain usable, because the node's capacity is reduced by reserved system resources. Kubernetes distinguishes between the physical capacity of a node and the "allocatable" resources (Node Allocatable).

[2] A calculation example: one m5.large instance with 8 GiB of memory costs ~$84 per month (eu-central-1, On-Demand), i.e. blocking 1/8 of the node costs roughly ~$10/month.

[3] There are many other ways to reduce your EC2 bill, such as Reserved Instances, Savings Plans, etc. I won't cover those topics here, but you should definitely look into them!

Learn more about the course.

Source: www.habr.com
