How to write your own autoscaler for a cluster

Hello! We train people to work with big data. It is impossible to imagine an educational program on big data without its own cluster, on which all participants work together. For this reason, our program always has one :) We handle its configuration, tuning, and administration, while the students launch MapReduce jobs there and use Spark.

In this post we will tell you how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

Problem

Our cluster is not used in a typical mode, and its utilization is highly uneven. For example, there are practical classes, when all 30 students and the instructor come to the cluster and start using it. Or there are days before a deadline when the load spikes sharply. The rest of the time the cluster runs underloaded.

Solution #1 is to keep a cluster that can withstand the peak loads but sits idle the rest of the time.

Solution #2 is to keep a small cluster, to which you manually add nodes before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current load of the cluster and, using various APIs, adds nodes to and removes nodes from the cluster.

In this post we will talk about solution #3. Such an autoscaler depends heavily on external APIs rather than internal ones, and providers rarely offer one out of the box. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote an autoscaler against the MCS API. And since we teach people how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.

Prerequisites

First, you need a Hadoop cluster. For example, we use the HDP distribution.

For your nodes to be added and removed quickly, you need a certain division of roles among the nodes.

  1. Master node. There is nothing special to explain here: it is the main node of the cluster, on which, for example, the Spark driver runs if you use interactive mode.
  2. Data node. This is a node on which you store data in HDFS and on which computations run.
  3. Compute node. This is a node on which you do not store anything in HDFS, but on which computations run.

Important point. Autoscaling happens only with nodes of the third type. If you start removing and adding nodes of the second type, the reaction speed will be very low: decommissioning and recommissioning takes hours on your cluster. That, of course, is not what you expect from autoscaling. In other words, we do not touch nodes of the first and second types. They form a minimally viable cluster that will exist for the whole duration of the program.

So, our autoscaler is written in Python 3, uses the Ambari API to manage the cluster services, and uses the API of Mail.ru Cloud Solutions (MCS) to start and stop machines.

Solution architecture

  1. The autoscaler.py module. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, and 3) functions that relate directly to the autoscaler's logic.
  2. The observer.py script. Essentially it contains the rules: when and at which moments to call the autoscaler functions.
  3. The configuration file config.py. It contains, among other things, the list of nodes allowed for autoscaling and other parameters that affect, for example, how long to wait after a new node has been added. There are also timestamps of class start times, so that before a class the maximum allowed cluster configuration is launched.
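
The post never shows config.py itself, so here is a minimal sketch of what such a file might contain; all names and values below are illustrative assumptions, not the project's actual configuration:

```python
# config.py -- illustrative sketch; every name and value is an assumption
scaling_hosts = ['node05.cluster.local', 'node06.cluster.local']  # nodes allowed for autoscaling

yarn_ram_per_node = 28672   # MB of RAM YARN allocates per node (assumed)
yarn_cpu_per_node = 8       # vCPUs YARN allocates per node (assumed)

cooldown_period = 5         # minutes to wait after adding or removing a node

scale_up_thresholds = {'ram': 0.8, 'cpu': 0.8}     # scale up above 80% utilization
scale_down_thresholds = {'ram': 0.3, 'cpu': 0.3}   # scale down below 30% utilization

# class start times: before each one the maximum allowed configuration is launched
class_timestamps = ['2019-10-01 18:00', '2019-10-03 18:00']
```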

Let's now look at the pieces of code inside the first two files.

1. The autoscaler.py module

The Ambari class

This is what a piece of code containing the Ambari class looks like:

import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = list(map(lambda x: x['HostRoles']['component_name'], services))
        data = {
            "RequestInfo": {
                "context":"Stop All Host Components",
                "operation_level": {
                    "level":"HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query":"HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state":"INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the desired cluster node.

To the constructor of the Ambari class you pass:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name — the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'},
  • and in auth — your login and password for Ambari: auth = ('login', 'password').

The function itself is nothing more than a couple of calls to the Ambari REST API. Logically, we first get the list of services running on the node, and then ask, on the given cluster and the given node, to transfer the services from that list to the INSTALLED state. The functions for launching all services, for transferring nodes to the Maintenance state, etc. look similar: they are just a few requests to the API.
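
Those neighbouring functions are not shown in the post. As a sketch of what one of them might look like, turning Maintenance mode on for a host boils down to building a small payload and issuing one PUT request; the payload shape follows the common Ambari REST convention for maintenance_state, but treat the exact code as our assumption rather than the project's implementation:

```python
def build_maintenance_payload(context):
    # Body of the PUT request that switches a host into Maintenance mode
    # (Ambari-style RequestInfo/Body structure, mirroring stop_all_services above)
    return {
        'RequestInfo': {'context': context},
        'Body': {'Hosts': {'maintenance_state': 'ON'}}
    }

# Inside the Ambari class this could be used roughly like this:
#
# def maintenance_on(self, hostname):
#     url = self.ambari_url + self.cluster_name + '/hosts/' + hostname
#     data = build_maintenance_payload('Turn On Maintenance Mode')
#     req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
#     return 'Request accepted' if req.status_code in [200, 201, 202] else req.status_code
```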

The Mcs class

This is what a piece of code containing the Mcs class looks like:

import json

import requests


class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start': None}  # json.dumps turns this into {"os-start": null}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

To the constructor of the Mcs class we pass the project id in the cloud and the user id, as well as the user's password. In the vm_turn_on function we want to turn on one of the machines. The logic here is a little more complicated. At the beginning of the code, three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the name of the machine in MCS, and 3) get the id of this machine. After that, we simply make a POST request and start this machine.
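
The second and third of those helpers are not shown in the post. A plausible sketch, assuming the VM name in MCS is simply the short hostname and that the id lookup parses an OpenStack-style GET /servers response (both assumptions on our part), might look like this:

```python
def hostname_to_vmname(hostname):
    # Assumption: the VM name in MCS is the short hostname without the domain suffix
    return hostname.split('.')[0]

def extract_vm_id(servers_json, vm_name):
    # Pick the server id out of an OpenStack-style GET /servers response body
    for server in servers_json['servers']:
        if server['name'] == vm_name:
            return server['id']
    return None

# Inside the Mcs class, get_vm_id(host) would issue
# GET self.mcs_host + '/servers' with the X-Auth-Token header
# and feed the parsed JSON to extract_vm_id.
```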

This is what the function for obtaining a token looks like:

def get_mcs_token(self):
    url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
    headers = {'Content-Type': 'application/json'}
    data = {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'id': self.id1,
                        'password': self.password
                    }
                }
            },
            'scope': {
                'project': {
                    'id': self.id2
                }
            }
        }
    }
    req = requests.post(url, data=json.dumps(data), headers=headers)
    self.token = req.headers['X-Subject-Token']
    return self.token

The Autoscaler class

This class contains the functions that relate to the scaling logic itself.

This is what a piece of code of this class looks like:

import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        message = 'Host is not on the scaling list'
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        if hostname in self.scaling_hosts:
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

For input, we accept the Ambari and Mcs classes, the list of nodes allowed for scaling, as well as the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal attributes, q_ram and q_cpu, which are queues. We use them to store the current cluster load values. If we see that over the last 5 minutes the load has been consistently elevated, we decide that we need to add one more node to the cluster. The same applies to the cluster underutilization state.

The code above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and shut down the virtual machine in the cloud.
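
The scale-up decision based on the q_ram and q_cpu queues described above is not shown in the post; the idea can be sketched roughly like this (the threshold, the 5-sample window, and the function name are our assumptions):

```python
from collections import deque

def should_scale_up(q, threshold, window=5):
    # q holds one load sample per minute; scale up only if the whole
    # window is filled and every sample is above the threshold
    return len(q) >= window and all(sample > threshold for sample in q)

q_ram = deque(maxlen=5)          # one RAM-utilization sample per minute
for sample in [0.85, 0.9, 0.88, 0.92, 0.87]:
    q_ram.append(sample)

print(should_scale_up(q_ram, 0.8))  # prints True: every sample in the window is above 80%
```

The maxlen on the deque means old samples fall out automatically, so a single short spike never fills the whole window and cannot trigger scaling on its own.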

2. The observer.py script

A sample of the code from there:

if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            post = {'text': '{0} has been successfully scaled-up'.format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            time.sleep(config.cooldown_period*60)

In it, we check whether the conditions for increasing the cluster capacity are met and whether there are any machines in reserve; if so, we get the hostname of one of them, add it to the cluster, and publish a message about it in our team's Slack. After that the cooldown_period starts, during which we do not add anything to or remove anything from the cluster, but only monitor the load. If it has stabilized and is within the corridor of optimal load values, we simply continue monitoring. If one extra node was not enough, we add another one.

For the cases when a class is coming up, we already know in advance that one node will not be enough, so we immediately start all the free nodes and keep them running until the end of the class. This is done using the list of class timestamps.
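
That check can be sketched as a small function over the timestamp list (the timestamp format, the lead time, and the function name are our assumptions):

```python
from datetime import datetime, timedelta

def class_starts_soon(class_timestamps, now, lead_minutes=30):
    # True if any class starts within the next `lead_minutes`
    # (timestamps in 'YYYY-MM-DD HH:MM' format, an assumed convention)
    for ts in class_timestamps:
        start = datetime.strptime(ts, '%Y-%m-%d %H:%M')
        if timedelta(0) <= start - now <= timedelta(minutes=lead_minutes):
            return True
    return False
```

observer.py could call such a function before the threshold checks and, when it returns True, start every free node at once instead of adding them one by one.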

Conclusion

An autoscaler is a good and convenient solution for cases when you experience uneven cluster load. You get the desired cluster configuration for peak loads and at the same time do not keep that cluster around during underload, which saves money. And all of this happens automatically, without your participation. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you definitely need to remember is the division of nodes into the 3 types we described earlier. And you will be happy.

Source: www.habr.com
