How to build your own autoscaler for a cluster

Hello! We teach people to work with big data. It is impossible to imagine an educational program on big data without its own cluster, on which all participants work together. That is why our program always has one.

In this article I will tell you how we solved the problem of uneven cluster load by writing an autoscaler on top of the Mail.ru Cloud Solutions cloud.

The problem

Our cluster is not used in a regular pattern; the load is very uneven. For example, there are practical classes, when all 30 students and the teacher log into the cluster at once and start using it. Or there are the days before a deadline, when the load rises sharply. The rest of the time the cluster is underloaded.

Solution #1 is to keep a cluster that can handle the peak load but sits idle the rest of the time.

Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, through various APIs, adds nodes to and removes nodes from the cluster.

In this article we will talk about solution #3. Such an autoscaler depends heavily on the external environment (the cloud it runs in) rather than on the cluster itself, and providers often do not offer one out of the box. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler against the MCS API. And since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

For nodes to be added and removed quickly, you need a specific split of roles between nodes.

  1. Master node. There is nothing special to explain here: this is the main node of the cluster, where, for example, the Spark driver runs if you use interactive mode.
  2. Data node. This is the node where you store data in HDFS and where computations take place.
  3. Compute node. This is a node where you store nothing in HDFS, but where computations take place.

An important point: autoscaling applies only to nodes of the third type. If you start removing and adding nodes of the second type, the reaction time will be very poor, because their HDFS blocks have to be re-replicated: decommissioning and recommissioning will take hours on your cluster. That is obviously not what you expect from autoscaling. So we do not touch nodes of the first and second types. They make up the minimum viable cluster, which exists for the whole duration of the program.

Our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. The autoscaler.py module. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions implementing the autoscaler logic itself.
  2. The observer.py script. It essentially consists of rules: when, and under which conditions, to call the autoscaler functions.
  3. The config.py configuration file. It contains, for example, the list of nodes allowed for autoscaling and other parameters, such as how long to wait after a new node has been added. It also holds the start times of classes, so that the maximum allowed cluster configuration is launched before each class begins.
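For illustration, config.py could look roughly like the sketch below. The names scaling_hosts, cooldown_period and scale_up_thresholds appear in the observer.py excerpt later in the article, and yarn_ram_per_node / yarn_cpu_per_node in the Autoscaler constructor; scale_down_thresholds, class_starts and every value here are hypothetical placeholders.

```python
# config.py: a hypothetical sketch of the settings described above.
# Host names, values and dates are illustrative assumptions only.
from datetime import datetime

# Only compute nodes (type 3) may be added or removed by the autoscaler.
scaling_hosts = [
    'compute-1.example.local',
    'compute-2.example.local',
    'compute-3.example.local',
]

# Resources YARN gets on each compute node (assumed values).
yarn_ram_per_node = 16384  # MB
yarn_cpu_per_node = 4      # vcores

# Minutes to wait after adding or removing a node before acting again.
cooldown_period = 10

# Load corridor, as a fraction of cluster capacity in use.
scale_up_thresholds = {'ram': 0.8, 'cpu': 0.8}
scale_down_thresholds = {'ram': 0.3, 'cpu': 0.3}

# Class start times: the full cluster is brought up before each one.
class_starts = [
    datetime(2024, 3, 4, 19, 0),
    datetime(2024, 3, 6, 19, 0),
]
```

The two threshold dictionaries define the corridor of acceptable load: the autoscaler acts only when the load stays outside it.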

Now let's look at pieces of code from the first two files.

1. The autoscaler.py module

The Ambari class

This is what a piece of code with the Ambari class looks like:

import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url      # e.g. 'http://localhost:8080/api/v1/clusters/'
        self.cluster_name = cluster_name
        self.headers = headers            # {'X-Requested-By': 'ambari'}
        self.auth = auth                  # ('login', 'password')

    def stop_all_services(self, hostname):
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        # 1. Get the list of components installed on the host.
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = list(map(lambda x: x['HostRoles']['component_name'], services))
        # 2. Ask Ambari to move all of them to INSTALLED, i.e. stopped.
        data = {
            "RequestInfo": {
                "context": "Stop All Host Components",
                "operation_level": {
                    "level": "HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query": "HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state": "INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        # Ambari answers 200/201/202 when it accepts the request.
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above you can see, for example, the implementation of the stop_all_services function, which stops all services on the given cluster node.

To the Ambari class constructor you pass:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name: the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'} (Ambari requires this header on requests that change state),
  • and auth, your Ambari login and password: auth = ('login', 'password').

The function itself is nothing more than a couple of REST API calls to Ambari. Logically, we first get the list of services running on the node, and then ask Ambari, for the given cluster and the given node, to move the services from that list to the INSTALLED state (in Ambari terms: installed but stopped). The functions for starting all services, putting a node into Maintenance mode and so on look similar: each is just a few API requests.
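Since those Ambari calls differ mainly in their request bodies, one way to avoid duplication is to build the bodies with small helpers. The sketch below is hypothetical (the function names are ours, not from the original code): the first helper generalizes the body used in stop_all_services, and the second assumes Ambari's Hosts/maintenance_state field for Maintenance mode.

```python
def host_components_state_body(cluster_name, hostname, components, state):
    """Body for PUT .../hosts/<hostname>/host_components/.

    state='INSTALLED' stops the listed components, 'STARTED' starts them.
    """
    return {
        "RequestInfo": {
            "context": "Set host components to {0}".format(state),
            "operation_level": {
                "level": "HOST",
                "cluster_name": cluster_name,
                "host_names": hostname,
            },
            # Restrict the request to the given component names.
            "query": "HostRoles/component_name.in({0})".format(",".join(components)),
        },
        "Body": {"HostRoles": {"state": state}},
    }


def maintenance_body(state):
    """Body for PUT .../hosts/<hostname>; state is 'ON' or 'OFF'.

    Assumes the Hosts/maintenance_state field of the Ambari REST API.
    """
    if state not in ('ON', 'OFF'):
        raise ValueError('state must be ON or OFF')
    return {
        "RequestInfo": {"context": "Turn {0} Maintenance Mode".format(state)},
        "Body": {"Hosts": {"maintenance_state": state}},
    }
```

Each body would then be sent the same way as in stop_all_services, e.g. requests.put(url, data=json.dumps(body), headers=self.headers, auth=self.auth), with url pointing at .../hosts/<hostname>/host_components/ for the first helper and at .../hosts/<hostname> for the second.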

The Mcs class

This is what a piece of code with the Mcs class looks like:

import json

import requests


class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1              # user id
        self.id2 = id2              # project id
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        # get_mcs_token, hostname_to_vmname and get_vm_id are other
        # methods of this class (not shown here).
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        # Start action; the value must serialize to JSON null.
        data = {'os-start': None}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

To the Mcs class constructor we pass the user id (id1) and the project id inside the cloud (id2), as well as the password. In the vm_turn_on function we want to start one of the machines, and the logic is a bit more involved. At the beginning, three other methods are called: 1) we get a token, 2) we convert the hostname into the machine name in MCS, 3) we get the id of that machine. After that we simply make a POST request and the machine starts.
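The three helper methods are not shown in the article. Below is a hypothetical sketch of their pure parts, written as plain functions: it assumes the MCS machine name is the short hostname, and that the machine list comes from the OpenStack Compute (Nova) GET /servers endpoint, whose JSON response contains a 'servers' list of objects with 'id' and 'name'. The same action endpoint stops a machine with os-stop, which is presumably what vm_turn_off sends.

```python
def hostname_to_vmname(hostname):
    """Hypothetical mapping: assumes the VM name in MCS is the short
    hostname, i.e. the FQDN without its domain part."""
    return hostname.split('.', 1)[0]


def find_vm_id(servers_response, vm_name):
    """Return the id of the VM named vm_name from a Nova-style
    GET /servers response, or None if there is no such VM."""
    for server in servers_response.get('servers', []):
        if server.get('name') == vm_name:
            return server['id']
    return None


def server_action_body(action):
    """Body for POST /servers/<id>/action.

    Nova expects a JSON null value for os-start / os-stop, hence None.
    """
    if action not in ('os-start', 'os-stop'):
        raise ValueError('unsupported action: {0}'.format(action))
    return {action: None}
```

Keeping these parts free of network calls makes them easy to test; only get_mcs_token and the final requests.post actually talk to the cloud.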

This is what the token retrieval function looks like:

def get_mcs_token(self):
    # Keystone v3 password authentication, scoped to the project;
    # ?nocatalog leaves the service catalog out of the response.
    url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
    headers = {'Content-Type': 'application/json'}
    data = {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'id': self.id1,
                        'password': self.password
                    }
                }
            },
            'scope': {
                'project': {
                    'id': self.id2
                }
            }
        }
    }
    req = requests.post(url, data=json.dumps(data), headers=headers)
    # The token is returned in a response header, not in the body.
    self.token = req.headers['X-Subject-Token']
    return self.token

The Autoscaler class

This class contains the functions related to the scaling logic itself.

This is what a piece of code from this class looks like:

import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()  # recent cluster RAM load values
        self.q_cpu = deque()  # recent cluster CPU load values
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        message = 'Failure'  # returned unless every step below succeeds
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        if hostname in self.scaling_hosts:
            # 1. Ask Ambari to decommission the YARN NodeManager.
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            # 2. Wait until the NodeManager is stopped (INSTALLED).
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('NodeManager decommissioned: {0}'.format(flag3))
                    break
            # 3. Turn on Maintenance mode for the host.
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            # 4. Once maintenance is on, stop all services on the host.
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)  # give the services time to stop
            # 5. Shut down the VM and wait until MCS reports SHUTOFF.
            self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

As input we take the Ambari and Mcs classes, the list of nodes allowed for scaling, and the node configuration parameters: the memory and CPU allocated to each node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. In them we store the recent values of the cluster load. If we see that the load has been consistently high over the last 5 minutes, we decide to add +1 node to the cluster. The same applies, in reverse, when the cluster is consistently underutilized.
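A minimal sketch of the "consistently high over the last 5 minutes" rule, assuming one load sample per minute and expressing load as a fraction of cluster capacity (for RAM, capacity could be the number of active nodes times yarn_ram_per_node). The class name LoadWindow and its methods are hypothetical; the bounded deque plays the role of q_ram or q_cpu.

```python
from collections import deque


class LoadWindow:
    """Keeps the last `size` load samples in a bounded deque."""

    def __init__(self, size=5):
        # With one sample per minute, size=5 covers the last 5 minutes.
        self.samples = deque(maxlen=size)

    def add(self, used, capacity):
        # Store utilisation as a fraction of total capacity.
        self.samples.append(used / capacity if capacity else 0.0)

    def _full(self):
        return len(self.samples) == self.samples.maxlen

    def consistently_above(self, threshold):
        # Scale-up signal: the window is full and every sample is high.
        return self._full() and all(s > threshold for s in self.samples)

    def consistently_below(self, threshold):
        # Scale-down signal: the window is full and every sample is low.
        return self._full() and all(s < threshold for s in self.samples)
```

Because the deque has a maxlen, the oldest sample drops out automatically on each append, so every check looks only at the most recent window; a single spike or dip is not enough to trigger scaling.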

The code above is an example of a function that removes a machine from the cluster and shuts it down in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and turn off the virtual machine in the cloud.
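Each step of scale_down is a loop that polls Ambari or MCS every 5 seconds with no upper bound, so a request that never succeeds blocks the autoscaler forever. A hypothetical helper such as poll_until (our name, not part of the original code) factors out that loop and adds a timeout:

```python
import time


def poll_until(check, predicate, interval=5, timeout=600):
    """Call check() every `interval` seconds until predicate(result) is true.

    Returns the last result. Raises TimeoutError after `timeout` seconds,
    so a stuck Ambari or MCS request cannot block the autoscaler forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if predicate(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {0} s'.format(timeout))
        time.sleep(interval)
```

For example, step 2 of scale_down could become poll_until(lambda: self.ambari.check_service(hostname, 'NODEMANAGER'), lambda s: s == 'INSTALLED'), and the caller decides what to do (retry, alert, give up) when TimeoutError is raised.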

2. The observer.py script

An example of the code from it:

import json
import time

import requests

# scaler, cloud, config and webhook (the Slack webhook URL)
# are set up elsewhere in observer.py.
if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Report the new node to the team's Slack channel.
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            req = requests.post(webhook, data=json.dumps(post), headers={'Content-Type': 'application/json'})
            # Cooldown: do not add or remove nodes for a while.
            time.sleep(config.cooldown_period * 60)

Here we check whether the conditions for increasing the cluster capacity are met and whether there are spare machines in reserve. If so, we get the hostname of one of them, add it to the cluster and post a message about it in our team's Slack. Then cooldown_period starts: during it we neither add nor remove anything, but simply monitor the load. If the load has stabilized within the corridor of optimal values, we just keep monitoring. If one node was not enough, we add another.
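The observer.py excerpt implements the cooldown with time.sleep, which pauses the whole script. If you want to keep collecting load samples during the cooldown, a non-blocking alternative is to remember when the last scaling action happened. The Cooldown class below is a hypothetical sketch of that alternative, not the original code:

```python
import time


class Cooldown:
    """Blocks further scaling actions for `minutes` after each action."""

    def __init__(self, minutes, clock=time.monotonic):
        self.seconds = minutes * 60
        self.clock = clock  # injectable, e.g. a fake clock in tests
        self.last_action = None

    def mark(self):
        # Call right after a node has been added or removed.
        self.last_action = self.clock()

    def active(self):
        # True while less than `minutes` have passed since mark().
        return (self.last_action is not None
                and self.clock() - self.last_action < self.seconds)
```

The observer would call mark() after a successful scale-up or scale-down and skip scaling decisions while active() returns True, yet still push new values into the load queues, so the first decision after the cooldown is based on fresh data.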

When a class is coming up, we already know for sure that one extra node will not be enough, so we immediately start all free nodes and keep them running until the end of the class. This is driven by the list of class timestamps.
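A sketch of that schedule check, assuming class start times are stored in config.py as a list of datetime objects, and that nodes should come up 30 minutes before a class and stay up for a 3-hour class; both durations and the function name in_class_window are assumptions:

```python
from datetime import datetime, timedelta


def in_class_window(now, class_starts, lead=timedelta(minutes=30),
                    duration=timedelta(hours=3)):
    """True if `now` lies between `lead` before a class start and the
    end of that class, i.e. when all free compute nodes should run."""
    return any(start - lead <= now < start + duration
               for start in class_starts)
```

When in_class_window returns True, the observer would start every free node in scaling_hosts and skip scale-down until the window closes; outside the window the load-based rules take over again.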

Conclusion

An autoscaler is a good and convenient solution when your cluster load is uneven. You get the cluster configuration you need for peak loads, and you do not pay for that configuration while the cluster is underloaded, which saves money. On top of that, it all happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a particular logic. What you definitely need to remember is the split of nodes into 3 types, as described above. And you will be happy.

Source: www.habr.com
