Saving Cloud Costs with Kubernetes on AWS

This translation of the article was prepared ahead of the start of the course "Infrastructure Platform Based on Kubernetes".


How do you save cloud costs when running Kubernetes? There is no silver bullet, but this article describes several tools that can help you manage resources more effectively and reduce your cloud spending.

I wrote this article with Kubernetes on AWS in mind, but it applies in (nearly) exactly the same way to other cloud providers. I assume your cluster(s) already have autoscaling configured (cluster-autoscaler). Deleting resources and scaling down deployments only saves money if it also shrinks your fleet of worker nodes (EC2 instances).

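This article covers cleaning up unused resources, scaling down during off-hours, horizontal autoscaling, reducing resource slack, and EC2 Spot Instances.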

Clean up unused resources

Working in a fast-paced environment is great. We want tech organizations to accelerate. Faster software delivery also means more PR deployments, preview environments, prototypes, and spike solutions. All of it is deployed on Kubernetes. Who has time to clean up test deployments manually? It is easy to forget to delete last week's experiment. The cloud bill ends up growing because of things we forgot to shut down:


(Henning Jacobs, quoting Corey Quinn:
Myth: Your AWS bill is a function of the number of users you have.
Fact: Your AWS bill is a function of the number of engineers you have.

Ivan Kurnosov (in reply):
Real fact: Your AWS bill is a function of the number of things you forgot to disable or delete.)

Kubernetes Janitor (kube-janitor) helps clean up your cluster. The janitor configuration is flexible enough for both global and local use:

  • Cluster-wide rules can define a maximum time-to-live (TTL) for PR/test deployments.
  • Individual resources can be annotated with janitor/ttl, for example to automatically delete a spike/prototype after 7 days.

Generic rules are defined in a YAML file. Its path is passed to kube-janitor via the --rules-file option. Here is an example rule that deletes all namespaces with -pr- in their name after two days:

- id: cleanup-resources-from-pull-requests
  resources:
    - namespaces
  jmespath: "contains(metadata.name, '-pr-')"
  ttl: 2d
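
To make the rule semantics concrete, here is a minimal Python sketch of how such a rule could be evaluated: a namespace matches when its name contains -pr-, and it becomes eligible for deletion once its age exceeds the TTL. This is not kube-janitor's actual code. The helper names (parse_ttl, is_expired) and the unit table are assumptions for illustration, and the JMESPath expression is replaced by a plain substring check.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for illustration; check the kube-janitor README
# for the units it really accepts.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_ttl(ttl: str) -> timedelta:
    """Turn a TTL string such as '2d' or '30m' into a timedelta."""
    value, unit = int(ttl[:-1]), ttl[-1]
    return timedelta(seconds=value * UNIT_SECONDS[unit])


def is_expired(resource: dict, ttl: str, now: datetime) -> bool:
    """Mimic the example rule: name contains '-pr-' and age exceeds the TTL."""
    meta = resource["metadata"]
    matches_rule = "-pr-" in meta["name"]
    created = datetime.strptime(
        meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return matches_rule and now - created > parse_ttl(ttl)


now = datetime(2020, 4, 10, 12, 0, tzinfo=timezone.utc)
old_pr = {"metadata": {"name": "myapp-pr-123", "creationTimestamp": "2020-04-07T09:00:00Z"}}
fresh_pr = {"metadata": {"name": "myapp-pr-124", "creationTimestamp": "2020-04-10T09:00:00Z"}}
prod = {"metadata": {"name": "myapp-production", "creationTimestamp": "2020-01-01T00:00:00Z"}}

print(is_expired(old_pr, "2d", now))    # True: matches and is older than 2 days
print(is_expired(fresh_pr, "2d", now))  # False: matches but is only 3 hours old
print(is_expired(prod, "2d", now))      # False: name does not match the rule
```

The namespace names and timestamps are made up; only the -pr- pattern and the 2d TTL come from the rule above.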

The following example rule enforces the application label on Deployment and StatefulSet Pods for all Deployments/StatefulSets created in 2020 or later, while still allowing tests without the label to run for a week:

- id: require-application-label
  # delete deployments and statefulsets without the "application" label
  resources:
    - deployments
    - statefulsets
  # see http://jmespath.org/specification.html
  jmespath: "!(spec.template.metadata.labels.application) && metadata.creationTimestamp > '2020-01-01'"
  ttl: 7d

Running a time-limited demo for 30 minutes on a cluster that runs kube-janitor:

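# kubectl run creates a bare Pod since kubectl 1.18, so create a Deployment explicitly
kubectl create deployment nginx-demo --image=nginx
kubectl annotate deploy nginx-demo janitor/ttl=30m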

Another source of growing costs is persistent volumes (AWS EBS). Deleting a Kubernetes StatefulSet does not delete its persistent volumes (PVC, PersistentVolumeClaim). Unused EBS volumes can easily add up to hundreds of dollars per month. Kubernetes Janitor has a feature to clean up unused PVCs. For example, this rule deletes all PVCs that are not mounted by a Pod and not referenced by a StatefulSet or CronJob:

# delete all PVCs which are not mounted and not referenced by StatefulSets
- id: remove-unused-pvcs
  resources:
  - persistentvolumeclaims
  jmespath: "_context.pvc_is_not_mounted && _context.pvc_is_not_referenced"
  ttl: 24h

Kubernetes Janitor can help you keep your cluster clean and prevent cloud costs from slowly creeping up. For deployment and configuration instructions, see the kube-janitor README.

Scale down during off-hours

Test and staging systems usually only need to run during working hours. Some production applications, such as back office/admin tools, also require only limited availability and can be shut down overnight.

Kubernetes Downscaler (kube-downscaler) lets users and operators scale down systems during off-hours. Deployments and StatefulSets can be scaled to zero replicas. CronJobs can be suspended. Kubernetes Downscaler can be configured for the whole cluster, for one or more namespaces, or for individual resources. You can set either "downtime" or, conversely, "uptime". For example, to scale down as much as possible during nights and weekends:

image: hjacobs/kube-downscaler:20.4.3
args:
  - --interval=30
  # do not scale down infrastructure components
  - --exclude-namespaces=kube-system,infra
  # do not scale down kube-downscaler itself, and keep Postgres Operator so that excluded databases can still be managed
  - --exclude-deployments=kube-downscaler,postgres-operator
  - --default-uptime=Mon-Fri 08:00-20:00 Europe/Berlin
  - --include-resources=deployments,statefulsets,stacks,cronjobs
  - --deployment-time-annotation=deployment-time
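
The --default-uptime value reads as "weekday range, time range, time zone". The Python sketch below shows one way such a spec could be checked against a timestamp. It is an illustration, not kube-downscaler's implementation: in_uptime is a hypothetical helper, it handles only this single-range form, and it assumes the timestamp passed in is already local time in the named zone, so the zone name is parsed but not applied.

```python
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def in_uptime(spec: str, local_now: datetime) -> bool:
    """Check whether local_now falls into a 'Mon-Fri 08:00-20:00 Zone' window.

    Assumption: local_now is already expressed in the zone named in spec.
    """
    days, hours, _zone = spec.split()
    first_day, last_day = (WEEKDAYS.index(d) for d in days.split("-"))
    start, end = hours.split("-")
    day_ok = first_day <= local_now.weekday() <= last_day
    # Zero-padded HH:MM strings compare correctly as plain strings.
    time_ok = start <= local_now.strftime("%H:%M") < end
    return day_ok and time_ok


spec = "Mon-Fri 08:00-20:00 Europe/Berlin"
print(in_uptime(spec, datetime(2020, 4, 8, 10, 30)))   # Wednesday morning: True
print(in_uptime(spec, datetime(2020, 4, 8, 22, 0)))    # Wednesday night: False
print(in_uptime(spec, datetime(2020, 4, 11, 10, 30)))  # Saturday morning: False
```

Outside the window, workloads that are not excluded would be candidates for scaling to zero replicas.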

Here is a graph of cluster worker nodes being scaled down over the weekend:

[Figure: number of cluster worker nodes over a weekend]

Scaling down from ~13 to 4 worker nodes certainly makes a noticeable difference on the AWS bill.

But what if I need to work during cluster "downtime"? Specific deployments can be permanently excluded from scaling by adding the downscaler/exclude: true annotation. Deployments can be excluded temporarily with the downscaler/exclude-until annotation, set to an absolute timestamp in the format YYYY-MM-DD HH:MM (UTC). If necessary, the whole cluster can be scaled back up by deploying a Pod with the downscaler/force-uptime annotation, for example by launching a dummy nginx deployment:

# kubectl run creates a bare Pod since kubectl 1.18, so create a Deployment explicitly
kubectl create deployment scale-up --image=nginx
kubectl annotate deploy scale-up janitor/ttl=1h # delete the deployment after one hour
kubectl annotate pod $(kubectl get pod -l app=scale-up -o jsonpath="{.items[0].metadata.name}") downscaler/force-uptime=true

See the kube-downscaler README if you are interested in deployment instructions and additional options.

Use horizontal autoscaling

Many applications and services deal with dynamic load patterns: sometimes their Pods sit idle, and sometimes they run at full capacity. Permanently running a fleet of Pods sized for peak load is not economical. Kubernetes supports horizontal autoscaling through the HorizontalPodAutoscaler (HPA) resource. CPU usage is often a good metric to scale on:

apiVersion: autoscaling/v2  # autoscaling/v2beta2 was removed in Kubernetes 1.26
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 100
        type: Utilization
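
For intuition about what the HPA does with this target, the Kubernetes documentation gives the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below applies that rule to the manifest above. It is a simplification: the function name is made up for illustration, and the real controller also applies a tolerance band and stabilization behavior that are ignored here.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Apply the documented HPA scaling formula and clamp to the min/max bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Values from the manifest above: target 100% CPU utilization, 3 to 10 replicas.
print(desired_replicas(3, 180, 100, 3, 10))  # 6: load is 1.8x the target
print(desired_replicas(6, 40, 100, 3, 10))   # 3: 2.4 rounds up to 3, the minimum
print(desired_replicas(8, 250, 100, 3, 10))  # 10: 20 is capped at maxReplicas
```

The utilization figures in the calls are invented inputs; utilization is measured relative to the Pods' CPU requests, so the requests discussed later in this article directly affect when scaling kicks in.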

Zalando built a component that makes it easy to plug in custom metrics for scaling: Kube Metrics Adapter (kube-metrics-adapter) is a general-purpose metrics adapter for Kubernetes that can collect and serve custom and external metrics for horizontal Pod autoscaling. It supports scaling based on Prometheus metrics, SQS queues, and other sources. For example, to scale your deployment on a custom metric that the application itself exposes as JSON at /metrics, use:

apiVersion: autoscaling/v2  # autoscaling/v2beta2 was removed in Kubernetes 1.26
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  annotations:
    # metric-config.<metricType>.<metricName>.<collectorName>/<configKey>
    metric-config.pods.requests-per-second.json-path/json-key: "$.http_server.rps"
    metric-config.pods.requests-per-second.json-path/path: /metrics
    metric-config.pods.requests-per-second.json-path/port: "9090"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        averageValue: 1k
        type: AverageValue

Configuring horizontal autoscaling with HPA should be one of the default actions for improving the efficiency of stateless services. Spotify has a presentation with their experience and recommendations for HPA: Scale Your Deployments, Not Your Wallet.

Reduce resource slack

Kubernetes workloads declare their CPU/memory needs through "resource requests". CPU resources are measured in virtual cores or, more commonly, in "millicores": 500m means 50% of a vCPU. Memory resources are measured in bytes, and the usual suffixes can be used, such as 500Mi, meaning 500 mebibytes. Resource requests "block" capacity on worker nodes, so a Pod with a 1000m CPU request on a node with 4 vCPUs leaves only 3 vCPUs available for other Pods. [1]
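
The unit arithmetic in this paragraph can be sketched in a few lines of Python. The helpers below are hypothetical and handle only a small subset of Kubernetes quantity syntax (millicores or whole cores for CPU, the Ki/Mi/Gi suffixes for memory); they exist only to reproduce the numbers from the text.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '500Mi' or '2Gi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


node_cpu = cpu_millicores("4")          # node with 4 vCPUs
pod_request = cpu_millicores("1000m")   # Pod requesting 1000m
print(node_cpu - pod_request)           # 3000 millicores left for other Pods
print(cpu_millicores("500m") / 1000)    # 0.5, i.e. 50% of a vCPU
print(memory_bytes("500Mi"))            # 524288000
```

As footnote [1] points out, a real node offers less than its physical capacity, so the 3000 millicores here are an upper bound.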

Slack (excess reservation) is the difference between requested resources and actual usage. For example, a Pod that requests 2 GiB of memory but only uses 200 MiB has ~1.8 GiB of memory "slack". Slack costs money. As a rough estimate, 1 GiB of memory slack costs ~$10 per month. [2]
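
The estimate can be reproduced with a short Python calculation. The sketch uses the figures from this paragraph and from footnote [2] (an 8 GiB node at ~$84 per month); the function names are made up for illustration, and, like the footnote, it attributes the whole node price to memory.

```python
MIB_PER_GIB = 1024


def memory_slack_gib(requested_mib: float, used_mib: float) -> float:
    """Slack is the requested amount minus what is actually used."""
    return (requested_mib - used_mib) / MIB_PER_GIB


def monthly_slack_cost(slack_gib: float, node_price: float, node_memory_gib: float) -> float:
    """Price slack at the node's monthly cost per GiB of memory."""
    return slack_gib * node_price / node_memory_gib


slack = memory_slack_gib(requested_mib=2048, used_mib=200)
cost = monthly_slack_cost(slack, node_price=84.0, node_memory_gib=8)
print(f"{slack:.2f} GiB slack, about ${cost:.2f} per month")
# prints: 1.80 GiB slack, about $18.95 per month
```

A single over-provisioned Pod is cheap; the cost becomes visible once the same slack is multiplied across replicas and applications.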

Kubernetes Resource Report (kube-resource-report) shows slack and can help you identify savings potential:

[Figure: Kubernetes Resource Report showing slack]

Kubernetes Resource Report shows slack aggregated by application and team. This lets you find places where resource requests can be lowered. The generated HTML report only provides a snapshot of resource usage. You need to look at CPU/memory usage over time to determine adequate resource requests. Here is a Grafana chart for a "typical" service with a lot of CPU slack: all Pods use far less than the 3 requested CPU cores:

[Figure: Grafana chart of CPU usage versus requested CPU]

Reducing the CPU request from 3000m to ~400m frees up resources for other workloads and allows the cluster to shrink.

"The average CPU utilization of EC2 instances often hovers in the single-digit percent range," writes Corey Quinn. While right-sizing EC2 instances might be a bad decision, changing some Kubernetes resource requests in a YAML file is easy and can yield large savings.

But do we really want humans to change values in YAML files? No, machines can do this much better! The Kubernetes Vertical Pod Autoscaler (VPA) does exactly that: it adapts resource requests and limits to the workload. Here is an example graph of Prometheus CPU requests (thin blue line) being adapted by VPA over time:

[Figure: Prometheus CPU requests adjusted by VPA over time]

Zalando uses VPA in all of its clusters for infrastructure components. Non-critical applications can also use VPA.

Goldilocks by Fairwinds is a tool that creates a VPA for each deployment in a namespace and then shows the VPA recommendations on its dashboard. It can help developers set the right CPU/memory requests for their applications:

[Figure: Goldilocks dashboard with VPA recommendations]

I wrote a short blog post about VPA in 2019, and VPA was recently discussed in a CNCF End User Community call.

Use EC2 Spot Instances

Last but not least, AWS EC2 costs can be reduced by using Spot instances as Kubernetes worker nodes [3]. Spot instances are available at a discount of up to 90% compared to On-Demand prices. Running Kubernetes on EC2 Spot is a good combination: you should specify several different instance types for higher availability, which means you may get a bigger node for the same or a lower price, and the extra capacity can be used by Kubernetes' containerized workloads.

How do you run Kubernetes on EC2 Spot? There are several options: use a third-party service such as SpotInst (now called "Spot", don't ask me why), or simply add a Spot AutoScalingGroup (ASG) to your cluster. For example, here is a CloudFormation snippet for a "capacity-optimized" Spot ASG with multiple instance types:

MySpotAutoScalingGroup:
 Type: AWS::AutoScaling::AutoScalingGroup
 Properties:
   HealthCheckGracePeriod: 300
   HealthCheckType: EC2
   MixedInstancesPolicy:
     InstancesDistribution:
       OnDemandPercentageAboveBaseCapacity: 0
       SpotAllocationStrategy: capacity-optimized
     LaunchTemplate:
       LaunchTemplateSpecification:
         LaunchTemplateId: !Ref LaunchTemplate
         Version: !GetAtt LaunchTemplate.LatestVersionNumber
       Overrides:
         - InstanceType: "m4.2xlarge"
         - InstanceType: "m4.4xlarge"
         - InstanceType: "m5.2xlarge"
         - InstanceType: "m5.4xlarge"
         - InstanceType: "r4.2xlarge"
         - InstanceType: "r4.4xlarge"
   LaunchTemplate:
     LaunchTemplateId: !Ref LaunchTemplate
     Version: !GetAtt LaunchTemplate.LatestVersionNumber
   MinSize: 0
   MaxSize: 100
   Tags:
   - Key: k8s.io/cluster-autoscaler/node-template/label/aws.amazon.com/spot
     PropagateAtLaunch: true
     Value: "true"

Some notes on using Spot with Kubernetes:

  • You need to handle Spot terminations, for example by draining the node when the instance is shut down
  • Zalando uses a fork of the official cluster autoscaler with node pool priorities
  • Spot nodes can be tainted so that workloads have to explicitly "opt in" to run on Spot

Summary

I hope you find some of the presented tools useful for reducing your cloud bill. Most of the content of this article is also covered in my talk at DevOps Gathering 2019, available on YouTube and as slides.

What are your best practices for saving cloud costs on Kubernetes? Please let me know on Twitter (@try_except_).

[1] In fact, fewer than 3 vCPUs remain usable, because the node's capacity is reduced by reserved system resources. Kubernetes distinguishes between the physical capacity of a node and its "allocatable" resources (Node Allocatable).

[2] Example calculation: one m5.large instance with 8 GiB of memory costs ~$84 per month (eu-central-1, On-Demand), so blocking 1/8 of the node costs roughly $10 per month.

[3] There are many more ways to reduce your EC2 bill, such as Reserved Instances, Savings Plans, and so on.


Source: www.hab.com
