Creating an additional kube-scheduler with a custom set of scheduling policies


Kube-scheduler is an integral component of Kubernetes, responsible for scheduling pods across nodes in accordance with certain policies. Often, during the operation of a Kubernetes cluster, we do not have to think about exactly which policies are used to schedule pods, since the policy set of the default kube-scheduler is suitable for most everyday tasks. However, there are situations when it is important for us to fine-tune the pod placement process, and there are two ways to accomplish this:

  1. Create a kube-scheduler with a custom set of policies
  2. Write your own scheduler and teach it to work with API server requests

In this article, I will describe the implementation of the first option to solve the problem of uneven pod scheduling on one of our projects.

A brief introduction to how kube-scheduler works

It is worth noting in particular that kube-scheduler is not responsible for directly placing pods — it is only responsible for determining the node on which to place a pod. In other words, the result of kube-scheduler's work is the name of a node, which it returns to the API server in response to a scheduling request, and that is where its job ends.

First, kube-scheduler compiles a list of nodes on which the pod can be scheduled, according to the predicates policies. Then each node from this list receives a certain number of points, according to the priorities policies. Finally, the node with the maximum number of points is selected. If several nodes share the same maximum score, a random one is chosen. A list and description of the predicates and priorities (scoring) policies can be found in the documentation.
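A quick way to see the outcome of this process for any particular pod is to look at its scheduling event, which records the node the scheduler selected (the pod name and namespace below are just placeholders):

kubectl -n project-stage describe pod cronjob-1574828880-mn7m4 | grep Scheduled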

Description of the problem

Even though a large number of different Kubernetes clusters are maintained at Nixys, we first ran into the problem of pod scheduling only recently, when one of our projects needed to run a large number of periodic tasks (~100 CronJob entities). To simplify the description of the problem as much as possible, we will take as an example one microservice within which a cron task is launched once a minute, creating some load on the CPU. Three nodes with absolutely identical characteristics (24 vCPUs each) were allocated to run the cron tasks.

At the same time, it is impossible to say exactly how long a CronJob will take to execute, since the volume of input data is constantly changing. On average, during normal operation of kube-scheduler, each node runs 3-4 instances of the task, which together create ~20-30% of the load on each node's CPU.
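For orientation, here is a minimal sketch of what such a workload might look like as a manifest (the name, image, and resource values are hypothetical placeholders, not the project's actual CronJob):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-worker
  namespace: project-stage
spec:
  schedule: "* * * * *"            # launched once a minute
  concurrencyPolicy: Allow         # runs may overlap, since their duration varies
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: registry.example.com/project/cron-worker:latest   # hypothetical image
            resources:
              requests:
                cpu: "1"           # the task is CPU-bound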


The problem itself was that at some point the cron task pods stopped being scheduled on one of the three nodes. That is, at some moment in time not a single pod was scheduled on one of the nodes, while 6-8 instances of the task were running on each of the other two nodes, creating ~40-60% of the CPU load.


The problem recurred with a completely random frequency and sometimes correlated with the moment a new version of the code was rolled out.

Having increased the kube-scheduler logging level to 10 (-v=10), we began to record how many points each node scored during the evaluation process. During normal scheduling, the following information could be seen in the logs:

resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node03: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1387 millicores 4161694720 memory bytes, score 9
resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node02: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1347 millicores 4444810240 memory bytes, score 9
resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node03: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1387 millicores 4161694720 memory bytes, score 9
resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node01: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1687 millicores 4790840320 memory bytes, score 9
resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node02: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1347 millicores 4444810240 memory bytes, score 9
resource_allocation.go:78] cronjob-1574828880-mn7m4 -> Node01: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1687 millicores 4790840320 memory bytes, score 9
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node01: NodeAffinityPriority, Score: (0)                                                                                       
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node02: NodeAffinityPriority, Score: (0)                                                                                       
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node03: NodeAffinityPriority, Score: (0)                                                                                       
interpod_affinity.go:237] cronjob-1574828880-mn7m4 -> Node01: InterPodAffinityPriority, Score: (0)                                                                                                        
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node01: TaintTolerationPriority, Score: (10)                                                                                   
interpod_affinity.go:237] cronjob-1574828880-mn7m4 -> Node02: InterPodAffinityPriority, Score: (0)                                                                                                        
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node02: TaintTolerationPriority, Score: (10)                                                                                   
selector_spreading.go:146] cronjob-1574828880-mn7m4 -> Node01: SelectorSpreadPriority, Score: (10)                                                                                                        
interpod_affinity.go:237] cronjob-1574828880-mn7m4 -> Node03: InterPodAffinityPriority, Score: (0)                                                                                                        
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node03: TaintTolerationPriority, Score: (10)                                                                                   
selector_spreading.go:146] cronjob-1574828880-mn7m4 -> Node02: SelectorSpreadPriority, Score: (10)                                                                                                        
selector_spreading.go:146] cronjob-1574828880-mn7m4 -> Node03: SelectorSpreadPriority, Score: (10)                                                                                                        
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node01: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node02: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:726] cronjob-1574828880-mn7m4_project-stage -> Node03: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:781] Host Node01 => Score 100043                                                                                                                                                                        
generic_scheduler.go:781] Host Node02 => Score 100043                                                                                                                                                                        
generic_scheduler.go:781] Host Node03 => Score 100043

That is, judging by the information from the logs, each of the nodes scored the same number of final points, and a random one was chosen for placement. At the time of the problematic scheduling, it all looked like this:

resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node02: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1587 millicores 4581125120 memory bytes, score 9
resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node03: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1087 millicores 3532549120 memory bytes, score 9
resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node02: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1587 millicores 4581125120 memory bytes, score 9
resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node01: BalancedResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 987 millicores 3322833920 memory bytes, score 9
resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node01: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 987 millicores 3322833920 memory bytes, score 9 
resource_allocation.go:78] cronjob-1574211360-bzfkr -> Node03: LeastResourceAllocation, capacity 23900 millicores 67167186944 memory bytes, total request 1087 millicores 3532549120 memory bytes, score 9
interpod_affinity.go:237] cronjob-1574211360-bzfkr -> Node03: InterPodAffinityPriority, Score: (0)                                                                                                        
interpod_affinity.go:237] cronjob-1574211360-bzfkr -> Node02: InterPodAffinityPriority, Score: (0)                                                                                                        
interpod_affinity.go:237] cronjob-1574211360-bzfkr -> Node01: InterPodAffinityPriority, Score: (0)                                                                                                        
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node03: TaintTolerationPriority, Score: (10)                                                                                   
selector_spreading.go:146] cronjob-1574211360-bzfkr -> Node03: SelectorSpreadPriority, Score: (10)                                                                                                        
selector_spreading.go:146] cronjob-1574211360-bzfkr -> Node02: SelectorSpreadPriority, Score: (10)                                                                                                        
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node02: TaintTolerationPriority, Score: (10)                                                                                   
selector_spreading.go:146] cronjob-1574211360-bzfkr -> Node01: SelectorSpreadPriority, Score: (10)                                                                                                        
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node03: NodeAffinityPriority, Score: (0)                                                                                       
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node03: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node02: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node01: TaintTolerationPriority, Score: (10)                                                                                   
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node02: NodeAffinityPriority, Score: (0)                                                                                       
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node01: NodeAffinityPriority, Score: (0)                                                                                       
generic_scheduler.go:726] cronjob-1574211360-bzfkr_project-stage -> Node01: SelectorSpreadPriority, Score: (10)                                                                                    
generic_scheduler.go:781] Host Node03 => Score 100041                                                                                                                                                                        
generic_scheduler.go:781] Host Node02 => Score 100041                                                                                                                                                                        
generic_scheduler.go:781] Host Node01 => Score 100038

It can be seen that one of the nodes scored fewer final points than the others, and therefore scheduling was performed only onto the two nodes that scored the maximum. So we were firmly convinced that the problem lay precisely in the scheduling of the pods.

The further algorithm for solving the problem seemed obvious to us — analyze the logs, understand which priority the node did not receive points for and, if necessary, adjust the policies of the default kube-scheduler. However, here we ran into two significant difficulties:

  1. At the maximum logging level (10), only the points scored for some of the priorities are reflected. In the log excerpts above, you can see that for all the priorities that do appear in the logs, the nodes score the same number of points during normal and problematic scheduling, yet the final result in the problematic case differs. We can therefore conclude that for some priorities the scoring happens "behind the scenes", and we have no way of understanding which priority the node did not receive points for. We described this problem in detail in an issue in the Kubernetes repository on GitHub. At the time of writing, the developers responded that logging support would be added in the Kubernetes v1.15, 1.16 and 1.17 updates.
  2. There is no easy way to understand which specific set of policies the default kube-scheduler is currently working with. Yes, the documentation lists this set, but it does not contain information about which specific weights are assigned to each of the priority policies. You can see the weights, or edit the policies, of the default kube-scheduler only in its source code.

It is worth noting that once we did manage to record that a node did not receive points according to the ImageLocalityPriority policy, which awards points to a node if it already has the image required to run the application. That is, when a new version of the application was rolled out, the cron task managed to start on two nodes, pulling the new image from the docker registry onto them, and so those two nodes received a higher final score relative to the third.

As we wrote above, we do not see information about the scoring of ImageLocalityPriority in the logs, so in order to test our assumption we pulled the image with the new version of the application onto the third node, after which scheduling worked correctly. It was precisely because of the ImageLocalityPriority policy that the scheduling problem was observed rather rarely; more often it was associated with something else. Because we could not properly debug each of the policies in the priority list of the default kube-scheduler, we needed flexible management of pod scheduling policies.
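For reference, a pre-pull of this kind can be done by hand directly on the affected node (a sketch assuming docker is the container runtime; the registry and tag are hypothetical placeholders):

# run on the node that keeps losing the ImageLocalityPriority scoring
docker pull registry.example.com/project/cron-worker:v2.0.0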

Problem statement

We wanted the solution to the problem to be as targeted as possible, that is, the main entities of Kubernetes (here we mean the default kube-scheduler) should remain unchanged. We did not want to solve the problem in one place and create it in another. So we arrived at the two options for solving the problem that were announced in the introduction to the article — creating an additional scheduler or writing our own. The main requirement for scheduling cron tasks is to distribute the load evenly across the three nodes. This requirement can be satisfied by existing kube-scheduler policies, so there was no point in writing our own scheduler to solve our problem.

Instructions for creating and deploying an additional kube-scheduler are described in the documentation. However, it seemed to us that a Deployment entity was not enough to ensure fault tolerance for such a critical service as kube-scheduler, so we decided to deploy the new kube-scheduler as a Static Pod, which would be monitored directly by kubelet. Thus, we have the following requirements for the new kube-scheduler:

  1. The service must be deployed as a Static Pod on all cluster masters
  2. Fault tolerance must be provided in case the active pod with kube-scheduler becomes unavailable
  3. The main priority when scheduling should be the amount of available resources on the node (LeastRequestedPriority)

Implementing the solution

It is worth noting that we will carry out all the work in Kubernetes v1.14.7, because this is the version that was used on the project. Let's start by writing a manifest for our new kube-scheduler. We will take the default manifest (/etc/kubernetes/manifests/kube-scheduler.yaml) as a basis and bring it to the following form:

apiVersion: v1
kind: Pod
metadata:
  labels:
    component: scheduler
    tier: control-plane
  name: kube-scheduler-cron
  namespace: kube-system
spec:
      containers:
      - command:
        - /usr/local/bin/kube-scheduler
        - --address=0.0.0.0
        - --port=10151
        - --secure-port=10159
        - --config=/etc/kubernetes/scheduler-custom.conf
        - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
        - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
        - --v=2
        image: gcr.io/google-containers/kube-scheduler:v1.14.7
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 8
          httpGet:
            host: 127.0.0.1
            path: /healthz
            port: 10151
            scheme: HTTP
          initialDelaySeconds: 15
          timeoutSeconds: 15
        name: kube-scheduler-cron-container
        resources:
          requests:
            cpu: '0.1'
        volumeMounts:
        - mountPath: /etc/kubernetes/scheduler.conf
          name: kube-config
          readOnly: true
        - mountPath: /etc/localtime
          name: localtime
          readOnly: true
        - mountPath: /etc/kubernetes/scheduler-custom.conf
          name: scheduler-config
          readOnly: true
        - mountPath: /etc/kubernetes/scheduler-custom-policy-config.json
          name: policy-config
          readOnly: true
      hostNetwork: true
      priorityClassName: system-cluster-critical
      volumes:
      - hostPath:
          path: /etc/kubernetes/scheduler.conf
          type: FileOrCreate
        name: kube-config
      - hostPath:
          path: /etc/localtime
        name: localtime
      - hostPath:
          path: /etc/kubernetes/scheduler-custom.conf
          type: FileOrCreate
        name: scheduler-config
      - hostPath:
          path: /etc/kubernetes/scheduler-custom-policy-config.json
          type: FileOrCreate
        name: policy-config

Briefly about the main changes:

  1. Changed the name of the pod and container to kube-scheduler-cron
  2. Specified the use of ports 10151 and 10159, since the option hostNetwork: true is set and we cannot use the same ports as the default kube-scheduler (10251 and 10259)
  3. Using the --config parameter, specified the configuration file with which the service should be started
  4. Configured mounting of the configuration file (scheduler-custom.conf) and the scheduling policy file (scheduler-custom-policy-config.json) from the host

Do not forget that our kube-scheduler will need rights similar to those of the default one. Edit its cluster role:

kubectl edit clusterrole system:kube-scheduler

...
   resourceNames:
    - kube-scheduler
    - kube-scheduler-cron
...

Now about what should be contained in the configuration file and in the scheduling policy file:

  • Configuration file (scheduler-custom.conf)
    To obtain the default kube-scheduler configuration, you can use the --write-config-to parameter described in the documentation. We will place the resulting configuration in the file /etc/kubernetes/scheduler-custom.conf and reduce it to the following form:

apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
schedulerName: kube-scheduler-cron
bindTimeoutSeconds: 600
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubernetes/scheduler.conf
  qps: 50
disablePreemption: false
enableContentionProfiling: false
enableProfiling: false
failureDomains: kubernetes.io/hostname,failure-domain.beta.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region
hardPodAffinitySymmetricWeight: 1
healthzBindAddress: 0.0.0.0:10151
leaderElection:
  leaderElect: true
  leaseDuration: 15s
  lockObjectName: kube-scheduler-cron
  lockObjectNamespace: kube-system
  renewDeadline: 10s
  resourceLock: endpoints
  retryPeriod: 2s
metricsBindAddress: 0.0.0.0:10151
percentageOfNodesToScore: 0
algorithmSource:
   policy:
     file:
       path: "/etc/kubernetes/scheduler-custom-policy-config.json"

Briefly about the main changes:

  1. Set schedulerName to the name of our service, kube-scheduler-cron.
  2. In the lockObjectName parameter, you also need to set the name of our service and make sure that the leaderElect parameter is set to true (if you have a single master node, you can set it to false).
  3. Specified the path to the file with the description of the scheduling policies in the algorithmSource parameter.

It is worth taking a closer look at the second point, where we edit the parameters for the leaderElection key. To ensure fault tolerance, we enabled (leaderElect) the election of a leader (master) among the pods of our kube-scheduler, using a single endpoint for them (resourceLock) named kube-scheduler-cron (lockObjectName) in the kube-system namespace (lockObjectNamespace). How Kubernetes ensures the high availability of the main components (including kube-scheduler) can be found in the article.
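If you want to check which replica currently holds leadership, you can inspect the lock object itself — with resourceLock: endpoints, the holder identity is stored in the control-plane.alpha.kubernetes.io/leader annotation of the corresponding Endpoints object (a sketch, assuming the settings above):

kubectl -n kube-system get endpoints kube-scheduler-cron -o yaml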

  • Scheduling policy file (scheduler-custom-policy-config.json)
    As we wrote above, we can find out exactly which policies the default kube-scheduler works with only by analyzing its source code. That is, we cannot obtain a scheduling policy file for the default kube-scheduler in the same way as the configuration file. Let's describe the scheduling policies we are interested in in the file /etc/kubernetes/scheduler-custom-policy-config.json as follows:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {
      "name": "GeneralPredicates"
    }
  ],
  "priorities": [
    {
      "name": "ServiceSpreadingPriority",
      "weight": 1
    },
    {
      "name": "EqualPriority",
      "weight": 1
    },
    {
      "name": "LeastRequestedPriority",
      "weight": 1
    },
    {
      "name": "NodePreferAvoidPodsPriority",
      "weight": 10000
    },
    {
      "name": "NodeAffinityPriority",
      "weight": 1
    }
  ],
  "hardPodAffinitySymmetricWeight" : 10,
  "alwaysCheckAllPredicates" : false
}

So, kube-scheduler first compiles a list of nodes onto which a pod can be scheduled according to the GeneralPredicates policy (which includes the PodFitsResources, PodFitsHostPorts, HostName, and MatchNodeSelector policies). Then each node is evaluated according to the set of policies in the priorities array. To fulfill the conditions of our task, we considered such a set of policies to be the optimal solution. Let me remind you that the set of policies, with descriptions, is available in the documentation. To accomplish your own task, you can simply change the set of policies used and assign appropriate weights to them.
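For example, if you wanted the amount of free resources on a node to outweigh spreading even more strongly, you could simply raise the weight of LeastRequestedPriority — a hypothetical variant of the priorities block, not the set we ended up using:

"priorities": [
    { "name": "LeastRequestedPriority",      "weight": 5 },
    { "name": "ServiceSpreadingPriority",    "weight": 1 },
    { "name": "NodePreferAvoidPodsPriority", "weight": 10000 },
    { "name": "NodeAffinityPriority",        "weight": 1 }
  ]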

We will name the manifest of the new kube-scheduler that we created at the beginning of this chapter kube-scheduler-custom.yaml and place it at the path /etc/kubernetes/manifests on the three master nodes. If everything is done correctly, kubelet will launch a pod on each node, and in the logs of our new kube-scheduler we will see information that our policy file has been applied successfully:

Creating scheduler from configuration: {{ } [{GeneralPredicates <nil>}] [{ServiceSpreadingPriority 1 <nil>} {EqualPriority 1 <nil>} {LeastRequestedPriority 1 <nil>} {NodePreferAvoidPodsPriority 10000 <nil>} {NodeAffinityPriority 1 <nil>}] [] 10 false}
Registering predicate: GeneralPredicates
Predicate type GeneralPredicates already registered, reusing.
Registering priority: ServiceSpreadingPriority
Priority type ServiceSpreadingPriority already registered, reusing.
Registering priority: EqualPriority
Priority type EqualPriority already registered, reusing.
Registering priority: LeastRequestedPriority
Priority type LeastRequestedPriority already registered, reusing.
Registering priority: NodePreferAvoidPodsPriority
Priority type NodePreferAvoidPodsPriority already registered, reusing.
Registering priority: NodeAffinityPriority
Priority type NodeAffinityPriority already registered, reusing.
Creating scheduler with fit predicates 'map[GeneralPredicates:{}]' and priority functions 'map[EqualPriority:{} LeastRequestedPriority:{} NodeAffinityPriority:{} NodePreferAvoidPodsPriority:{} ServiceSpreadingPriority:{}]'
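At this point it is also easy to confirm that a replica of the new scheduler is running on every master — kubelet creates a mirror pod for each static pod, suffixed with the node name:

kubectl -n kube-system get pods -o wide | grep kube-scheduler-cron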

It remains only to indicate in the spec of our CronJob that all requests to schedule its pods should be processed by our new kube-scheduler:

...
 jobTemplate:
    spec:
      template:
        spec:
          schedulerName: kube-scheduler-cron
...
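After the next run, you can verify which scheduler actually handled a pod, since the scheduler name is recorded in its spec (the pod name and namespace below are placeholders):

kubectl -n project-stage get pod cronjob-1574828880-mn7m4 -o jsonpath='{.spec.schedulerName}'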

Conclusion

In the end, we got an additional kube-scheduler with a unique set of scheduling policies, whose operation is monitored directly by kubelet. In addition, we set up the election of a new leader among the pods of our kube-scheduler in case the old leader becomes unavailable for some reason.

Regular applications and services continue to be scheduled through the default kube-scheduler, and all cron tasks have been completely moved to the new one. The load created by the cron tasks is now distributed evenly across all nodes. Considering that most of the cron tasks run on the same nodes as the project's main applications, this has significantly reduced the risk of pods being evicted due to a lack of resources. After the additional kube-scheduler was introduced, problems with uneven scheduling of cron tasks no longer arose.


Source: www.habr.com
