How to build your own autoscaler for a cluster

Hello! We train people to work with big data. It is impossible to imagine an educational program on big data without its own cluster, on which all participants work together. For this reason, our program always has one :) We take care of its configuration, tuning and administration, while the participants launch MapReduce jobs directly on it and use Spark.

In this post we will tell you how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

The problem

Our cluster is not used in a typical way. Its utilization is very uneven. For example, there are practical classes, when all 30 participants and the instructor log in to the cluster and start using it. There are also the days before a deadline, when the load rises sharply. The rest of the time the cluster is underloaded.

Solution #1 is to keep a cluster that can withstand peak loads but sits idle the rest of the time.

Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak load.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to the cluster and removes them from it.

In this post we will talk about solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and cloud providers often do not offer one. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler using the MCS API. And since we teach people how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

For nodes to be added and removed quickly, you need a certain distribution of roles among the nodes.

  1. Master node. Nothing much needs explaining here: this is the main node of the cluster, on which, for example, the Spark driver is launched if you use interactive mode.
  2. Data node. This is a node on which you store data in HDFS and on which computations also take place.
  3. Compute node. This is a node on which you store nothing in HDFS, but on which computations take place.

An important point: autoscaling is done with nodes of the third type only. If you start removing and adding nodes of the second type, the response will be very slow, because decommissioning and recommissioning will take hours on your cluster. That, of course, is not what you expect from autoscaling. So we do not touch nodes of the first and second types. They make up the minimum viable cluster that exists for the entire duration of the program.
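
To make the role split concrete, here is a minimal sketch of how the three node types might be recorded and how the autoscaling candidates could be derived from them. The hostnames, the NodeRole enum and both helper functions are hypothetical and are not taken from the autoscaler itself.

```python
from enum import Enum


class NodeRole(Enum):
    MASTER = "master"    # runs cluster services and, e.g., the Spark driver
    DATA = "data"        # stores HDFS blocks and runs computations
    COMPUTE = "compute"  # runs computations only, holds no HDFS data


# Hypothetical inventory; the hostnames are placeholders.
CLUSTER_NODES = {
    "master-1.example.local": NodeRole.MASTER,
    "data-1.example.local": NodeRole.DATA,
    "data-2.example.local": NodeRole.DATA,
    "compute-1.example.local": NodeRole.COMPUTE,
    "compute-2.example.local": NodeRole.COMPUTE,
}


def scalable_hosts(nodes):
    """Return only compute nodes: the ones that are safe to add and remove."""
    return sorted(host for host, role in nodes.items() if role is NodeRole.COMPUTE)


def minimum_cluster(nodes):
    """Return the master and data nodes, which the autoscaler never touches."""
    return sorted(host for host, role in nodes.items() if role is not NodeRole.COMPUTE)
```

Because compute nodes hold no HDFS blocks, removing one requires no data rebalancing, which is what keeps scaling fast.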

So, our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. Module autoscaler.py. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions directly related to the autoscaler logic.
  2. Script observer.py. Essentially, it consists of various rules: when and at what moments to call the autoscaler functions.
  3. Configuration file config.py. It contains, for example, the list of nodes allowed for autoscaling and other parameters that affect, for example, how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the maximum allowed cluster configuration is brought up before a class.
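
As an illustration, a config.py along these lines might look like the sketch below. The names scaling_hosts, scale_up_thresholds and cooldown_period appear in the observer code later in this post; scale_down_thresholds and class_schedule are hypothetical names, and every value is a placeholder rather than a real setting.

```python
# config.py (sketch): every value below is a placeholder, not a real setting.

# Compute nodes that the autoscaler is allowed to start and stop.
scaling_hosts = [
    "compute-1.example.local",
    "compute-2.example.local",
]

# Minutes to wait after a scaling action before the next one is considered.
cooldown_period = 10

# Assumed shape: fraction of YARN memory/CPU in use that triggers a scale-up.
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}

# Hypothetical name and shape: fraction below which a node may be removed.
scale_down_thresholds = {"ram": 0.3, "cpu": 0.3}

# Hypothetical name: class windows as (ISO start, ISO end) pairs;
# all scaling hosts are brought up for these windows.
class_schedule = [
    ("2019-10-01T19:00:00", "2019-10-01T22:00:00"),
]
```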

Now let's look at code fragments from the first two files.

1. Module autoscaler.py

Class Ambari

This is what the code of the Ambari class looks like:

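# Imports used by the classes in autoscaler.py
import json
import logging
import time
from collections import deque

import requests


class Ambari: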
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = list(map(lambda x: x['HostRoles']['component_name'], services))
        data = {
            "RequestInfo": {
                "context":"Stop All Host Components",
                "operation_level": {
                    "level":"HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query":"HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state":"INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above, as an example, you can see the implementation of the function stop_all_services, which stops all services on the given cluster node.

When instantiating the Ambari class you pass:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name, the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'}
  • and auth, your Ambari login and password: auth = ('login', 'password').

The function itself is nothing more than a couple of REST API calls to Ambari. Logically, we first get the list of services running on the node, and then ask Ambari, for the given cluster and the given node, to move the services from that list to the INSTALLED state. The functions for starting all services, for putting nodes into the Maintenance state and so on look similar: they are just a few API requests.
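
As a rough illustration of one of those similar functions, the sketch below builds the request for switching a host's maintenance mode. It only constructs the URL and body and sends nothing. The function name is hypothetical, and the payload shape (Hosts/maintenance_state) is an assumption based on the Ambari REST API that should be checked against your Ambari version.

```python
import json


def build_maintenance_request(ambari_url, cluster_name, hostname, state="ON"):
    """Build the URL and JSON body for switching a host's maintenance mode.

    Assumption: the payload follows the Hosts/maintenance_state form of the
    Ambari REST API; verify it against your Ambari version.
    """
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")
    url = ambari_url + cluster_name + "/hosts/" + hostname
    body = {
        "RequestInfo": {
            "context": "Turn {0} Maintenance Mode for {1}".format(state, hostname)
        },
        "Body": {"Hosts": {"maintenance_state": state}},
    }
    return url, json.dumps(body)
```

The returned pair would then be passed to requests.put(url, data=body, headers=..., auth=...), in the same way as in stop_all_services.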

Class Mcs

This is what the code of the Mcs class looks like:

class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        # Use the id returned by get_vm_id above
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start': None}  # serialised as JSON null
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

When instantiating the Mcs class we pass the user id (id1), the id of the project within the cloud (id2), and the password. In the function vm_turn_on we want to turn on one of the machines. The logic here is a little more involved. At the beginning of the code, three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the machine name in MCS, 3) we need to get the id of that machine. Then we simply make a POST request and start the machine.
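
The counterpart function vm_turn_off is not shown in this post. A minimal sketch of how the start and stop requests could share one builder is given below, assuming the MCS endpoint behaves like the OpenStack Compute API and accepts the 'os-start' and 'os-stop' actions with a JSON null value. The function name build_server_action is hypothetical, and nothing is sent over the network.

```python
import json

MCS_COMPUTE_URL = "https://infra.mail.ru:8774/v2.1"  # same base URL as in the Mcs class


def build_server_action(vm_id, token, action):
    """Return (url, headers, body) for an OpenStack-style server action.

    Assumption: the API accepts the actions 'os-start' and 'os-stop'
    with a JSON null value, as the OpenStack Compute API does.
    """
    if action not in ("os-start", "os-stop"):
        raise ValueError("unsupported action: {0}".format(action))
    url = "{0}/servers/{1}/action".format(MCS_COMPUTE_URL, vm_id)
    headers = {"X-Auth-Token": token, "Content-Type": "application/json"}
    body = json.dumps({action: None})  # None is serialised as JSON null
    return url, headers, body
```

The three returned values map directly onto requests.post(url, data=body, headers=headers).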

This is what the function for obtaining a token looks like:

    def get_mcs_token(self):
        url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
        headers = {'Content-Type': 'application/json'}
        data = {
            'auth': {
                'identity': {
                    'methods': ['password'],
                    'password': {
                        'user': {
                            'id': self.id1,
                            'password': self.password
                        }
                    }
                },
                'scope': {
                    'project': {
                        'id': self.id2
                    }
                }
            }
        }
        # 'nocatalog' is already part of the URL, so no extra params are passed
        req = requests.post(url, data=json.dumps(data), headers=headers)
        self.token = req.headers['X-Subject-Token']
        return self.token

Class Autoscaler

This class contains the functions related to the operating logic itself.

This is what the code of this class looks like:

class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        # Default result, so that `message` is always defined at the return below
        message = 'Failed'
        if hostname in self.scaling_hosts:
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

As input, the class accepts the Ambari and Mcs instances, the list of nodes allowed for scaling, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. We use them to store the values of the current cluster load. If we see that the load has been consistently high over the last 5 minutes, we decide that we need to add one more node to the cluster. The same applies when the cluster is underutilized.

The scale_down function shown above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and shut down the virtual machine in the cloud.
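
Each while True loop in scale_down waits indefinitely, so a request that never succeeds would block the autoscaler. One possible way to bound the wait, which is not part of the code shown here, is a small polling helper. The name wait_until and its parameters are hypothetical.

```python
import time


def wait_until(check, expected, interval=5, timeout=600, sleep=time.sleep):
    """Poll check() until it returns a value in `expected` or time runs out.

    Returns the matching value, or None once more than `timeout` seconds
    of waiting have accumulated.
    """
    waited = 0
    while waited <= timeout:
        result = check()
        if result in expected:
            return result
        sleep(interval)      # injectable, so tests can skip real sleeping
        waited += interval
    return None
```

With such a helper, the second loop of scale_down could be written as wait_until(lambda: self.ambari.check_service(hostname, 'NODEMANAGER'), {'INSTALLED'}), and a None result could be logged and treated as a failed scale-down.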

2. Script observer.py

Sample code from it:

# Scale up when the load thresholds are exceeded and a spare machine exists
if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Notify the team in Slack through an incoming webhook
            text = "{0} has been successfully scaled-up".format(hostname)
            post = {"text": text}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            # Pause scaling decisions for the cooldown period (given in minutes)
            time.sleep(config.cooldown_period * 60)
In it, we check whether the conditions for increasing cluster capacity are met and whether there are any machines in reserve, get the hostname of one of them, add it to the cluster and post a message about it to our team's Slack. After that the cooldown_period starts, during which we neither add nor remove anything from the cluster and simply monitor the load. If the load has stabilized and is within the corridor of optimal values, we just continue monitoring. If one node was not enough, we add another.
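
The load check behind assert_up is not shown in this post. One way such a check could work, assuming one load sample per minute and a fixed-size queue in the spirit of q_ram and q_cpu, is sketched below. The class LoadWindow and its methods are hypothetical.

```python
from collections import deque


class LoadWindow:
    """Keep the most recent load samples and report sustained high or low load."""

    def __init__(self, size):
        # With one sample per minute, size=5 covers the last 5 minutes.
        self.samples = deque(maxlen=size)

    def add(self, value):
        self.samples.append(value)  # the oldest sample is dropped automatically

    def is_full(self):
        return len(self.samples) == self.samples.maxlen

    def sustained_above(self, threshold):
        """True only if the window is full and every sample exceeds the threshold."""
        return self.is_full() and all(v > threshold for v in self.samples)

    def sustained_below(self, threshold):
        """True only if the window is full and every sample is under the threshold."""
        return self.is_full() and all(v < threshold for v in self.samples)
```

Requiring a full window of samples means a single short spike does not add a node, which keeps the cluster from flapping between sizes.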

When a class is coming up, we already know for certain that one node will not be enough, so we start all free nodes at once and keep them running until the end of the class. This is driven by the list of class timestamps.
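
A minimal sketch of such a schedule check follows, assuming the timestamps are stored as (start, end) pairs and that nodes should be brought up some minutes ahead of the start. The function name class_in_progress and the lead_minutes parameter are hypothetical.

```python
from datetime import datetime, timedelta


def class_in_progress(now, schedule, lead_minutes=15):
    """Return True if `now` falls inside a class window or its warm-up lead time.

    `schedule` is a list of (start, end) datetime pairs.
    """
    lead = timedelta(minutes=lead_minutes)
    return any(start - lead <= now <= end for start, end in schedule)
```

While this check returns True, the observer would start every host in config.scaling_hosts and skip scale-down decisions.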

Conclusion

An autoscaler is a good and convenient solution for cases where your cluster load is uneven. You get the cluster configuration you need for peak loads and, at the same time, you do not keep that cluster running while the load is low, which saves money. On top of that, it all happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you definitely need to remember is the division of nodes into 3 types, as described earlier. And you will be happy.

Source: www.habr.com
